Double dissociation of value computations in orbitofrontal and anterior cingulate neurons

Supplementary Information for: Double dissociation of value computations in orbitofrontal and anterior cingulate neurons

Steven W. Kennerley, Timothy E. J. Behrens & Jonathan D. Wallis

Content list:
Supplementary Figure 1
Supplementary Figure 2
Supplementary Results - reinforcement history analysis
Supplementary Figure 3
Supplementary Results - behavioral adaptation analysis
Supplementary Figure 4

Nature Neuroscience: doi:10.1038/nn.2961

Supplementary Figure 1. PFC recording locations. (a) Schematic showing the frontal boundaries on ventral, medial and lateral views of the macaque brain. Dashed line depicts the unfolded cingulate sulcus. (b) Magnetic resonance scan of a coronal slice through the frontal lobe of subject B. Shaded regions denote the boundaries of the three frontal areas investigated. White lines depict potential electrode paths. The coronal slice is at the approximate center position of the recording chambers along the anterior-posterior axis. (c) Locations of all recorded neurons (open circles) and neurons selective (filled circles) for encoding all three decision variables (All three), positive prediction error (+PE), or both (All three and +PE) in subject A (left) and subject B (right). (d) Same as in (c), but for neurons selective for previous trial overall value (N−1), previous and current trial overall value (N and N−1), and previous trial reward value. The top, middle and bottom panels correspond to areas ACC, LPFC and OFC, respectively. We measured the anterior-posterior position from the interaural line (x-axis) and the dorso-ventral position relative to the lip of the ventral bank of the principal sulcus (0 point on the y-axis). Gray shading indicates unfolded sulci. CSd/CSv = dorsal/ventral bank of the anterior cingulate sulcus; ASd/ASv = dorsal/ventral spur of the arcuate sulcus; PSd/PSv = dorsal/ventral bank of the principal sulcus; LOS/MOS = lateral/medial orbital sulcus.

Supplementary Figure 2. Three single neurons showing activity on probability trials during both the choice and outcome epochs. Conventions as in Figure 2. (a) An LPFC neuron that encoded reward probability with a negative regression coefficient during the choice epoch, which was maintained during the outcome epoch, reflecting a value signal at both choice and outcome. (b) An ACC neuron that encoded reward probability at choice with a negative regression coefficient, and strongly encoded reward probability on both rewarded and unrewarded outcomes, also with a negative regression coefficient, again reflecting a value signal at both choice and outcome. (c) An ACC neuron that encoded a salience signal. This neuron encoded reward probability during both rewarded and unrewarded outcomes but with oppositely signed regression coefficients; that is, its firing rate increased as the absolute value of the prediction error increased, suggesting it encoded the discrepancy between expected and experienced outcome in an unsigned manner.

Reinforcement history analysis

Overall value history analysis

We examined whether our population of PFC neurons encoded the overall value history of choices as well as the reward amount received on the last trial (GLM-3; see Methods). This analysis was performed across all trials with a focus on three task epochs: the fixation period prior to picture onset, the choice epoch, and the outcome epoch of the current trial. The logic here was that we wanted to determine when value information about the previous trial first arose (i.e., whether it carried over from the previous trial) and when it decayed. We first determined the number of neurons in each PFC area that encoded the past trial's overall value during these three epochs (Supplementary Fig. 3a–c). At the time of fixation prior to the current choice, there was no difference across PFC areas in the number of neurons encoding N−1 overall value (Supplementary Fig. 3a; χ² = 3.3, P > 0.1). Yet as soon as the pictures were presented and the current choice could be evaluated, significantly more neurons in OFC encoded N−1 overall value (Supplementary Fig. 3b; χ² = 6.2, P = 0.04). OFC neurons continued to significantly encode N−1 overall value into the outcome epoch of the current choice, but only when compared to LPFC neurons (Supplementary Fig. 3c; χ² = 8.7, P = 0.01). Thus, OFC neurons do not simply encode the previous trial's value as a working memory signal that persists from the time of the previous outcome. Instead, the value history representation emerges only at the time of the current choice, as if this value representation is used as a reference against which to encode the current choice value. In the value history analysis in the main text, we found that the activity of all OFC neurons exhibited a strong correlation between the current choice overall value and the previous history of choice values (Fig. 8g–h).
In other words, as a population, OFC neurons simultaneously encode information about the current and past overall choice value, emerging around 300 ms into the current choice evaluation. This correlation was negative, indicating that firing rates on the current choice increased as the previous trial's value decreased (i.e., as the discrepancy between the current and past choice values increased). Most of our analyses require neurons to reach a specific level of significance before they are classified as encoding a given value parameter (see Methods). This ensures that our Type I error levels were appropriate. However, it is possible that this selectivity criterion also excludes some neurons that encode a particular value parameter but do not quite reach our level of significance (Type II error). This is a particularly important issue for the reinforcement history analysis for two reasons. First, the correlation analyses shown in Figure 8g–h require neurons to reach significance for two independent value representations (i.e., current and past trial value), yet we did not relax our selectivity criterion for this classification. Second, despite relatively few neurons reaching significance for both of these value parameters (Fig. 8f), the correlation between these two value parameters within the entire OFC population was very high. This suggested that the correlation might be at least partially driven by neurons that did not quite reach our selectivity criterion for one of the two value parameters.
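The across-area comparisons in this section rely on chi-square tests of selective versus non-selective neuron counts in LPFC, OFC and ACC. A minimal sketch of such a test, assuming scipy is available; the counts below are illustrative placeholders, not the recorded values:

```python
# Hypothetical sketch of an across-area selectivity comparison.
# Rows: area; columns: [selective, not selective]. Counts are made up.
import numpy as np
from scipy.stats import chi2_contingency

counts = np.array([
    [12, 224],   # LPFC (illustrative)
    [30, 110],   # OFC  (illustrative)
    [14,  96],   # ACC  (illustrative)
])

# Pearson chi-square on the 3x2 contingency table (df = 2).
chi2, p, dof, expected = chi2_contingency(counts, correction=False)
print(f"chi2({dof}) = {chi2:.2f}, P = {p:.4g}")
```

With three areas and a binary selective/non-selective classification, the test has two degrees of freedom, matching the χ² statistics quoted in the text.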

To examine this issue directly, we excluded all of the selective PFC neurons that encoded both the current and past trial overall value (Fig. 8f) and performed this correlation again. This excluded a total of 2 (8%), 23 (6%) and 9 (9%) of LPFC, OFC and ACC neurons, respectively. While this exclusion procedure removed any correlation between current and past overall value coding in the firing rates of LPFC and ACC neurons, the activity of the remaining OFC population exhibited a very strong anti-correlation between the current overall value and the overall choice value from one (Supplementary Fig. 3d) and two (Supplementary Fig. 3e) trials into the past. This implies that, as a population, OFC neurons encode a choice value signal that is dynamically adapted to the recent history of choice values, whereas neurons in both ACC and LPFC encode current choice value in a way that is insensitive to past choice values.

Reward history analysis

We next determined the number of neurons in each PFC area that encoded the past trial's reward value (magnitude) at the time of the current trial's fixation, choice and outcome epochs, respectively (Supplementary Fig. 3f–h). At the time of fixation prior to the current choice, many neurons in ACC encoded the amount of reward received on the previous trial (Supplementary Fig. 3f; χ² = 33, P < 9 × 10⁻⁸). Although this signal could reflect information about the magnitude of liquid reward still present in the mouth, it is noteworthy that the fixation epoch occurs a minimum of 2.5 s (ITI and trial termination delays) after the termination of reward delivery, suggesting that this signal may instead reflect a reward working memory signal. There were also significantly more ACC neurons that encoded the reward value of the previous trial at the time of the current choice (Supplementary Fig. 3g; χ² = 6.0, P = 0.05), and a significant tendency for these ACC neurons to share the same sign of regression coefficient (34/52; z-score = 2.4, P = 0.008, binomial test). No significant trial N−1 reward effects were present in the outcome epoch of the current trial (Supplementary Fig. 3h).
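The exclusion re-analysis above (remove neurons significant for both the current and past value parameter, then correlate the two regression coefficients across the remaining population) can be sketched as follows. The per-neuron coefficients and significance flags here are simulated stand-ins with a built-in anti-correlation, not recorded data:

```python
# Hypothetical sketch of the exclusion re-analysis on simulated coefficients.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_neurons = 140

# Simulated per-neuron regression coefficients for current-trial value and
# N-1 value, constructed to be anti-correlated (stand-in for an OFC-like
# population; not the recorded values).
beta_current = rng.normal(size=n_neurons)
beta_past = -0.6 * beta_current + rng.normal(scale=0.8, size=n_neurons)

# Simulated significance flags for each parameter (stand-ins for the
# selectivity criterion described in Methods).
sig_current = np.abs(beta_current) > 1.5
sig_past = np.abs(beta_past) > 1.5

# Drop neurons significant for BOTH parameters, then correlate the rest.
keep = ~(sig_current & sig_past)
r, p = pearsonr(beta_current[keep], beta_past[keep])
print(f"kept {keep.sum()}/{n_neurons} neurons, r = {r:.2f}, P = {p:.3g}")
```

Because the anti-correlation is a population-level property rather than being carried by a few doubly significant cells, it survives the exclusion in this simulation, mirroring the logic of the argument above.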

Supplementary Figure 3. Neuronal encoding of value history and reward history. (a–c) Prevalence of neurons encoding the overall value of the previous (N−1) trial at the time of fixation (a), choice (b), or outcome (c) of the current trial (N) with a positive (+) or negative (−) regression coefficient. Conventions as in Figure 3. OFC neurons were significantly more likely than ACC and LPFC neurons to encode the overall value of the past trial, but only at the time the current choice was assessed. Thus, this value signal emerged only once the current choice could be evaluated. (d–e) Mean correlation between encoding of the overall value of the current trial and the N−1 (d) or N−2 (e) trial after all of the significant neurons from Figure 8f were removed. Conventions as in Figure 8. The anti-correlation between present and past overall value was evident even in the non-significant population of OFC neurons. (f–h) Prevalence of neurons encoding the reward value of the previous (N−1) trial at the time of fixation (f), choice (g), or outcome (h) of the current trial with a positive or negative regression coefficient. ACC neurons were significantly more likely to encode the reward amount received on the past trial at the time of fixation or choice of the current trial.
Thus, ACC neurons appeared to maintain a representation of the previous reward history in working memory that carried into the next trial.

Behavioral adaptation analysis

To examine adaptive behavior following errors or unrewarded responses, we collapsed across the side of stimulus presentation and examined choice behavior for three decision variables, four choice values and two picture sets (see Fig. 1b). We calculated the percent correct per condition across the 24 and 35 behavioral sessions in which neuronal data were collected in subjects A and B, respectively (Supplementary Fig. 4a). Across all conditions, the two subjects performed a total of 32,3 trials with only 626 errors (≈98% correct). Subjects had error rates of <3% on 9/24 conditions, with 177/626 errors (28% of all errors) occurring on the two conditions in which the 0.7 stimulus from picture set 2 was presented. Apart from these two conditions, subjects made one or fewer errors per condition in >80% of all behavioral sessions. When subjects made at least one error within a condition, we determined the likelihood of making a correct choice the next time the same condition was presented within the behavioral session (i.e., error correction). Across all conditions and sessions, subject A corrected 215/253 (85%; P < 0.01, binomial test) and subject B corrected 24/28 (86%; P < 0.01, binomial test) of their errors. Thus, both subjects rarely made errors and, when they did, they were significantly above chance at correcting the error on the next occurrence of the same condition. The scarcity of error trials across conditions within a behavioral session yielded insufficient power for any neuronal analysis of error-related selectivity.

To explore whether subjects adapted their future choices based on whether current choices were rewarded, we sorted all probability trials within a behavioral session by condition and examined successive pairs of trials of the same condition (trials N and N+1). For example, the 0.1 vs. 0.3 probability condition might occur on trials 9, 28, 8, etc.; our analysis would assign trial 9 as trial N and examine whether trial 28 (trial N+1) was correct, and would then repeat the same procedure for trials 28 and 8, and so forth, until all trials within a condition were examined. The logic of this analysis was that if subjects were basing their choices on the history of probabilistic outcomes (hence adaptive decision-making) rather than on the well-learned associations between stimuli and outcomes, then they should be less likely to choose correctly on trial N+1 if trial N was a correct choice that went unrewarded. We determined performance on probability trial N+1 when each subject chose correctly on probability trial N (Supplementary Fig. 4b). This analysis revealed that subjects were relatively indifferent to the outcome (i.e., reward or no reward) of a correct choice on trial N; regardless of whether a correct choice was rewarded on trial N, subjects chose correctly on over 90% of choices when the same condition was next presented (trial N+1). For example, despite the fact that subjects received no reward on ~70% of trials following a choice of the 0.3 probability stimulus in the 0.1 vs. 0.3 condition (value = 1, red bars), subjects changed their behavior (and hence selected the incorrect 0.1 probability stimulus) on trial N+1 less than 2% of the time. Thus, we found no evidence that subjects were more likely to adapt their behavior on trial N+1 (i.e., select the less optimal stimulus) if a correct trial N choice was not rewarded. Collectively, these results suggest that the choices of both subjects were guided primarily by well-learned predictions between a stimulus and its associated value, rather than by the session history of choice outcomes.
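The successive-pair bookkeeping described above can be sketched as follows; the trial records are fabricated stand-ins (condition id, whether the correct option was chosen, whether reward was delivered), not behavioral data:

```python
# Hypothetical sketch of the trial-pairing analysis: within a session, walk
# successive same-condition trial pairs (N, N+1) and tabulate performance on
# trial N+1 conditioned on whether the correct choice on trial N was rewarded.
from collections import defaultdict

# Each trial: (condition_id, correct_choice_made, rewarded) -- fabricated.
trials = [
    (0, True, True), (1, True, False), (0, True, False),
    (0, True, True), (1, True, True), (1, False, False),
]

pairs = defaultdict(lambda: {'n': 0, 'correct_next': 0})
last_by_cond = {}
for cond, correct, rewarded in trials:
    if cond in last_by_cond:
        prev_correct, prev_rewarded = last_by_cond[cond]
        if prev_correct:  # condition on a correct choice on trial N
            key = 'rewarded' if prev_rewarded else 'unrewarded'
            pairs[key]['n'] += 1
            pairs[key]['correct_next'] += int(correct)
    last_by_cond[cond] = (correct, rewarded)

for key, d in pairs.items():
    print(key, d['correct_next'], '/', d['n'])
```

Comparing the two resulting proportions (correct on N+1 after a rewarded versus an unrewarded correct trial N) is the quantity plotted in Supplementary Fig. 4b.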

Supplementary Figure 4. Percentage of correct choices per decision variable or reward history. (a) Percentage correct per choice condition, defined as the percentage of completed trials on which the more valuable option was chosen. Choice value ranges from 1–4 (see Fig. 1b), where a value of 1 is the lowest value for any decision variable (e.g., the 0.1 vs. 0.3 probability choice) and a value of 4 is the most valuable for any decision variable (e.g., the 0.7 vs. 0.9 probability choice). Filled and unfilled bars indicate performance on choices from picture sets 1 and 2, respectively. (b) Future performance across the different probability conditions as a function of whether past choices were rewarded. For each probability condition in which a correct response was made on trial N, the plot shows the percentage correct the next time the same condition was presented (trial N+1), as a function of choice outcome (reward versus no reward) on trial N. Whether or not a subject was rewarded for a correct response on trial N had virtually no influence on whether they chose correctly on the same condition when it was next presented (trial N+1).

References

5. Lara, A.H., Kennerley, S.W. & Wallis, J.D. Encoding of gustatory working memory by orbitofrontal neurons. J. Neurosci. 29, 765-774 (2009).