Supplementary Methods and Results

Subjects and drug conditions

The study was approved by the National Hospital for Neurology and Neurosurgery and Institute of Neurology Joint Ethics Committee. Subjects were contacted by email and screened for the following exclusion criteria: left-handedness, age below 18 or above 39, regular use of drugs or medication, history of psychiatric or neurological disease, and contra-indications to MRI scanning (pregnancy, claustrophobia, metallic implants). All included subjects gave informed consent prior to taking part. A total of 39 healthy subjects were studied (19-37 years, 23 males), randomly split into 3 groups of 13 subjects.

The placebo group was studied first, to establish that our experimental procedure produces both clear-cut behavioural learning effects and robust striatal activity in association with the prediction errors generated by the computational model. This group underwent the same protocol as the drug groups, believing that they could receive either placebo (lactose), haloperidol or levodopa. However, the experimenters were aware of the treatment, so this part of the experiment was effectively performed under single-blind conditions, which places some restriction on including the placebo group in statistical comparisons. Following this, the remaining subjects were treated with either Haldol (haloperidol 1 mg) or Madopar (levodopa 100 mg + benserazide 25 mg), under double-blind conditions. These subjects were informed in the same manner as the placebo group, believing they could receive either placebo, haloperidol or levodopa.

We used relatively low doses of haloperidol and levodopa to minimise the risk of side-effects. These doses are comparable to the introductory doses prescribed clinically to alleviate akineto-rigid symptoms in Parkinson's disease (levodopa) and psychotic symptoms in schizophrenia (haloperidol). Maintenance doses may be 5-10 times higher.
After consenting, the subjects took two capsules, Haldol (or placebo) 3 hours and placebo (or Madopar) 1 hour before scanning, to ensure that the peak plasma level was reached during scan acquisition for both drugs [1,2]. The subjects were then asked to read the instructions about the task (reproduced in the Supplementary note), which could be rephrased orally if necessary. They were then trained on a practice version outside the scanner, before performing the 3 test sessions in the scanner.

Behavioural task and analysis

The task was a first-order probabilistic instrumental learning task with monetary outcomes. Each of the 3 test sessions was an independent task, containing new stimuli to be learned. Each session lasted 10.5 minutes, contained 90 trials and employed 3 different pairs of abstract visual stimuli, which were letters taken from the Agathodaimon font. Each pair was presented on a visual display monitor, and the subject was required to select between the two stimuli, to try to maximise their financial reward, which was given to them at the end of the experiment (see Fig. 1a).

Each of the pairs of stimuli (gain, loss, neutral) was associated with a pair of outcomes (gain £1/nil, lose £1/nil, look at £1/nil). For the gain pair, the probabilities of winning £1/nil were 0.8/0.2 for one stimulus and 0.2/0.8 for the other. Similarly, in the loss pair, the probabilities of losing £1/nil were 0.8/0.2 for one stimulus and 0.2/0.8 for the other. There was no financial outcome in the neutral pair: subjects could only look at an image of a £1 coin in one outcome, or look at nothing in the other. The probabilities of the two outcomes were 0.8/0.2 for one stimulus and 0.2/0.8 for the other, in a similar manner to the gain and loss trials.

On each trial, one pair was randomly presented and the two stimuli were displayed on the screen, above and below a central fixation cross, their relative position being counterbalanced across trials. The subject was required to choose the upper stimulus by pressing a button (Go response), or the lower stimulus by doing nothing (NoGo response). After 4 seconds the choice was circled in red and the outcome (either 'Nothing', 'Gain', 'Loss' or 'Look') was written on the screen, accompanied by a £1 coin in the event of gain, loss and look outcomes. Therefore, to win money the subjects had to learn, by trial and error, the stimulus-outcome associations. Winnings were rounded up to a fixed amount such that all subjects actually left with the same amount (£45).

To assess drug side-effects, subjects were asked immediately after the task to rate their subjective feelings using visual analogue scales [3] (see Supp. Table 2). One-tailed t-tests showed no significant difference between drug groups, either on the original dimensions of the questionnaire or on the first two dimensions derived from a principal component analysis, sorted by amount of explained variance (see Supp. Fig. 1).
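The outcome contingencies of the task described above can be sketched in a few lines of Python. This is only an illustration, not the original task code: the stimulus labels "high" and "low", the function names, and the assumption that each pair appears equally often (30 trials per pair in a 90-trial session) are ours.

```python
import random

# Money change (in pounds) attached to the non-nil outcome of each pair.
OUTCOME_VALUE = {"gain": 1, "loss": -1, "neutral": 0}
# Probability of the non-nil outcome for the two stimuli of a pair.
# "high" and "low" are hypothetical labels, not names used in the study.
P_OUTCOME = {"high": 0.8, "low": 0.2}


def draw_outcome(pair, stimulus, rng):
    """Return the money change for choosing `stimulus` within `pair`."""
    happened = rng.random() < P_OUTCOME[stimulus]
    return OUTCOME_VALUE[pair] if happened else 0


def simulate_session(policy, n_trials=90, seed=0):
    """Play one session; `policy(pair, rng)` returns "high" or "low"."""
    rng = random.Random(seed)
    # Assumption: the three pairs occur equally often (30 trials each).
    pairs = ["gain", "loss", "neutral"] * (n_trials // 3)
    rng.shuffle(pairs)
    return sum(draw_outcome(pair, policy(pair, rng), rng) for pair in pairs)


def ideal_policy(pair, rng):
    """Pick the likely-win stimulus in gain trials, the unlikely-loss one otherwise."""
    return "high" if pair == "gain" else "low"


def random_policy(pair, rng):
    """Choose between the two stimuli at chance, as before any learning."""
    return rng.choice(["high", "low"])
```

Under the equal-frequency assumption, a subject who always chose optimally would expect to win 30 x 0.8 = £24 and lose 30 x 0.2 = £6 per session, whereas a subject choosing at random would expect to break even.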
We additionally performed a Bayesian model comparison [4] to determine how many Gaussians are needed to best fit the ratings. We found that the likelihood of the ratings exceeded 0.99 with one Gaussian (and was close to 0 with more Gaussians), suggesting that these data are unlikely to cluster into groups corresponding to drug conditions.

To assess learning, we examined the total amount of money won by the subjects. This is reflected in Figure 1, which shows the frequencies with which subjects chose the stimulus predominantly associated with the gain or loss, that is, with p=0.8 of winning £1 in the gain condition, or with p=0.8 of losing £1 in the loss condition. As can be seen in Figure 1b, subjects learnt to choose the advantageous cues over the course of the experiment.

To assess the effects of drugs on choice performance, we calculated the total amount earned by each subject and compared this with one-tailed t-tests between the levodopa and haloperidol groups, for the gain and loss conditions. Levodopa- vs haloperidol-treated subjects won £66.7±1.0 vs £61.0±2.1, and lost £26.7±1.5 vs £28.9±1.4 respectively. An alternative way to describe performance in the loss condition, which illustrates the symmetry with the gain condition, is to consider how much money the subjects avoided losing (that is, £90 minus the amount lost): £63.3±3.0 vs £61.1±2.8 respectively.

In t-contrasts, subjects earned more money with levodopa than with haloperidol (t(24)=2.24, P=0.017), but did not lose less money (t(24)=1.15, P=0.13). The drug x condition interaction fell marginally short of significance (t(24)=1.79, P=0.060). For completeness, we also report the results of an ANOVA that takes into account the placebo group, although strictly speaking it was not tested in the same conditions as the drug groups (see above). We saw a significant effect in the gain condition (F(2,38)=3.36, P=0.041), but not in the loss condition (F(2,38)=0.52, P=0.60). If we include the two conditions (gain and loss) as a crossed factor, we find a significant effect of the drug factor (F(2,77)=4.35, P=0.017), no effect of the condition factor (F(1,77)=0.84, P=0.37) and no interaction between drug and condition (F(2,77)=2.34, P=0.10). Other aspects of behaviour were considered, and are detailed in Supp. Table 1.

Computational model

We then fit a standard reinforcement learning algorithm to each subject's sequence of choices [5]. We used a basic Q-learning algorithm, which has been shown previously to offer a good account of instrumental choice in both humans and primates [6,7]. For each pair of stimuli A and B, the model estimates the expected values of choosing A ($Q_a$) and choosing B ($Q_b$), on the basis of individual sequences of choices and outcomes. This value, termed a Q value, is essentially the expected reward obtained by taking that particular action. The Q values were set at zero before learning, and after every trial t>0 the value of the chosen stimulus (say A) was updated according to the rule $Q_a(t+1) = Q_a(t) + \alpha\,\delta(t)$. The prediction error was $\delta(t) = R(t) - Q_a(t)$, with $R(t)$ defined as the reinforcement obtained as an outcome of choosing A at trial t. In other words, the prediction error $\delta(t)$ is the difference between the actual outcome, $R(t)$, and the expected outcome, $Q_a(t)$.
The reinforcement magnitude R was +1 and -1 for winning and losing £1 respectively, and 0 for 'nothing' outcomes. Given the Q values, the associated probability of selecting each action was estimated by implementing the softmax rule, e.g. for choosing A: $P_a(t) = \exp(Q_a(t)/\beta) / (\exp(Q_a(t)/\beta) + \exp(Q_b(t)/\beta))$. This is a standard stochastic decision rule that calculates the probability of taking one of a set of actions according to their associated values [8]. The constants α (learning rate) and β (temperature) were adjusted to maximise the probability (or likelihood) of the actual choices under the model. For the gain / loss conditions respectively, we found α = 0.29 / 0.46 and β = 0.18 / 0.33, with 95% confidence intervals of 0.24-0.31 / 0.40-0.52 for α and 0.17-0.20 / 0.31-0.35 for β. To compare the accuracy of fit between drugs and conditions, we used the negative log likelihood, which can be summed across trials, sessions and subjects. The learning model was fitted with a single set of parameters across all subjects in all groups, since for our imaging analysis we test the null hypothesis that there is no difference between groups. It was then used to create a statistical regressor corresponding to the modelled outcome prediction error in the imaging data. Note that we do not model temporal difference errors.
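As a minimal sketch of the model described above (not the authors' code), the update rule, the softmax rule and the negative log likelihood can be written as follows. The function names, the 0/1 coding of choices and the grid search are illustrative assumptions; the text does not say which optimiser was used to adjust α and β.

```python
import math


def q_learning_nll(choices, rewards, alpha, beta):
    """Return (negative log likelihood, prediction errors) for one stimulus pair.

    choices: sequence of 0/1, where 0 = stimulus A and 1 = stimulus B was chosen
    rewards: sequence of R(t) in {-1, 0, +1} obtained on each trial
    """
    q = [0.0, 0.0]  # Q values are zero before learning
    nll = 0.0
    deltas = []
    for choice, r in zip(choices, rewards):
        # Two-option softmax: P(choice) = 1 / (1 + exp((Q_other - Q_choice) / beta)).
        x = (q[1 - choice] - q[choice]) / beta
        # -log P(choice) = log(1 + exp(x)), evaluated in an overflow-safe way.
        if x > 0:
            nll += x + math.log1p(math.exp(-x))
        else:
            nll += math.log1p(math.exp(x))
        delta = r - q[choice]        # prediction error: delta(t) = R(t) - Q(t)
        q[choice] += alpha * delta   # only the chosen stimulus is updated
        deltas.append(delta)
    return nll, deltas


def fit_grid(choices, rewards, alphas, betas):
    """Exhaustive search (an assumed optimiser) for the lowest-NLL (alpha, beta).

    Returns a tuple (nll, alpha, beta).
    """
    return min((q_learning_nll(choices, rewards, a, b)[0], a, b)
               for a in alphas for b in betas)
```

Because the negative log likelihood is additive, values returned for different pairs, sessions and subjects can simply be summed before minimising, which is how a single parameter set can be fitted across all subjects. The returned prediction errors are the quantities that would feed the imaging regressor.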

Image acquisition and analysis

T2*-weighted echo planar images (EPI) were acquired with blood oxygenation level-dependent (BOLD) contrast on a 3.0 Tesla Siemens Allegra magnetic resonance scanner. We employed a tilted plane acquisition sequence [9] designed to optimize functional sensitivity in the orbitofrontal cortex and medial temporal lobes. To cover the whole brain with a short TR (1.95 s), we used the following parameters: 30 slices; 2 mm slice thickness; 2 mm inter-slice gap. T1-weighted structural images were also acquired, coregistered with mean EPI images and averaged across subjects to allow group-level anatomical localization.

EPI images were analysed in an event-related manner, using the statistical parametric mapping software SPM5 (Wellcome Department of Imaging Neuroscience, London, UK). Pre-processing consisted of spatial realignment, normalization to a standard EPI template, and spatial smoothing using a Gaussian kernel with a full-width at half-maximum of 6 mm. To correct for motion artefacts, subject-specific realignment parameters were modelled as covariates of no interest.

We used a single statistical linear regression model for all our analyses, as follows. Each trial was modelled as having 2 time points, corresponding to stimulus and outcome display. Separate regressors were created for the 6 stimulus conditions (3 pairs times 2 positions) and the 6 types of outcome (3 pairs times 2 outcomes). Prediction errors generated by the Q-learning model were then used as parametric modulators of additional regressors modelled at outcome time points, separately for the gain and loss trials. All regressors of interest were convolved with a canonical haemodynamic response function (HRF). Linear contrasts of regression coefficients were computed at the individual subject level and then taken to a group-level random effects analysis of variance.
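To make the parametric modulation step concrete, the sketch below builds one prediction-error regressor by placing a scaled stick function at each outcome onset and convolving it with an HRF. It is an approximation for illustration and is not SPM5 code: the difference-of-gammas shape (peak gamma of shape 6, undershoot gamma of shape 16 scaled by 1/6), the 32 s kernel length, the mean-centring of the modulator and all function names are assumptions.

```python
import math

TR = 1.95  # repetition time in seconds, as stated in the text


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(tr=TR, duration=32.0):
    """Assumed difference-of-gammas HRF, sampled once per scan."""
    n = int(duration / tr) + 1
    return [gamma_pdf(i * tr, 6.0) - gamma_pdf(i * tr, 16.0) / 6.0
            for i in range(n)]


def parametric_regressor(onsets_s, deltas, n_scans, tr=TR):
    """Prediction-error regressor: modulated sticks convolved with the HRF."""
    mean_delta = sum(deltas) / len(deltas)
    sticks = [0.0] * n_scans
    for onset, d in zip(onsets_s, deltas):
        scan = int(round(onset / tr))        # nearest scan to the outcome onset
        if 0 <= scan < n_scans:
            sticks[scan] += d - mean_delta   # mean-centred modulator (assumption)
    hrf = double_gamma_hrf(tr)
    out = [0.0] * n_scans
    for i, s in enumerate(sticks):
        if s == 0.0:
            continue
        for j, h in enumerate(hrf):          # discrete convolution, truncated
            if i + j < n_scans:
                out[i + j] += s * h
    return out
```

In a full design matrix this column would sit alongside the unmodulated outcome regressors, so that its coefficient reflects activity scaling with the prediction error over and above the mean outcome response.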
In our first analysis, we looked at the representation of the prediction error across all groups (placebo, haloperidol and levodopa). We observed a similar pattern of activity in the striatum (our main area of interest) in both gain and loss trials. The data presented in Figure 2a (left panels) correspond to a true conjunction analysis [10] across gain and loss conditions, which identifies areas significantly active in both gain and loss trials. The maxima shown had the following MNI coordinates / statistical t-scores: left posterior putamen [-30 -10 -2] / t(38)=9.43 (far left panel) and left ventral striatum [-10 12 -8] / t(38)=10.10 (middle panel). We also noted significant activity in the right anterior insula ([40 24 -8] / t(38)=8.76), negatively correlated with the (appetitive) prediction error and restricted to the loss condition. This represents an aversive prediction error, and is displayed in the right panel of Figure 2a.

For the analysis of activity occurring at the time of stimulus display (Fig. 2b), we performed simple t-contrasts between the associated regressors. Thus, for the Go / NoGo contrast, we subtracted activity correlated with the 2 NoGo regressors (NoGo-gain and NoGo-loss) from that correlated with the 2 Go regressors (Go-gain and Go-loss). For the analysis of gain versus neutral, we subtracted the 2 neutral conditions (Go-neutral and NoGo-neutral) from the 2 gain conditions (Go-gain and NoGo-gain). A similar contrast was performed for the analysis of loss versus neutral. This second analysis showed that the same brain areas that reflect outcome prediction errors are also able to discriminate between conditions at the time of stimulus display (compare Fig. 2a and Fig. 2b). The left posterior putamen was activated in the Go / NoGo contrast ([-30 -6 0] / t(38)=6.63), the left ventral striatum in the gain / neutral contrast ([-12 10 -10] / t(38)=7.72) and the right anterior insula in the loss / neutral contrast ([30 28 -6] / t(38)=8.57). There was no interaction between the gain/loss trials and Go/NoGo trials.

To isolate brain areas related to prediction errors we used very conservative thresholds. All group-level SPMs are shown with a threshold of P<0.05 after family-wise error correction for the entire brain. For further comparison between drug treatments, localisation selectivity was enhanced by requiring a spatial extent of at least 64 voxels per cluster [11]. With this extent threshold, all significant clusters were located either in the striatum or the anterior insula, except for the Go / NoGo contrast (see Supp. Table 3).

To look specifically at the effects of the drugs, we compared the time courses of brain responses reflecting prediction errors between the haloperidol and levodopa groups. To avoid making any assumption about how the drugs affect the haemodynamic response, the time courses were estimated by fitting a flexible basis set of finite impulse responses (FIRs), each separated from the next by one scan (1.95 s). The time courses we report in Figure 3 were averaged across trials, sessions and subjects. We report an increased magnitude of both positive and negative responses with levodopa compared to haloperidol, apparent in the striatum for the gain condition alone.
We then calculated the area between the positive and negative time courses, within the duration of the response (3-9 seconds post outcome). This measure was significantly different between drug groups in the gain condition (t(24)=1.98, P=0.029) but not in the loss condition (t(24)=0.20, P=0.59), with a significant drug x condition interaction (t(24)=1.64, P=0.041). As prediction errors increase linearly with the reinforcement magnitude R, we used the area under the time courses to estimate the effective parameter R for the levodopa and haloperidol groups, by comparison with the placebo group, where R was set at 1. With these adjusted reward magnitudes, and with all other parameters held constant, the fit of our original Q-learning model was significantly improved, as shown by a t-test of the log likelihoods (t(24)=2.43, P=0.012). To test whether the reward magnitudes estimated from the striatal BOLD responses were close to the optimal values, we calculated the likelihood of the observed behavioural choices for a large range of reward magnitudes, and plotted them in Supp. Figure 2. We found for the gain condition that the 95% confidence interval of the maximal likelihood was 1.13-1.33 in the levodopa group and 0.67-0.82 in the haloperidol group; these intervals include our estimates of effective reward magnitude (1.29 and 0.71 respectively).
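The area measure and the rescaling to an effective reward magnitude can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' procedure: we assume trapezoidal integration over the FIR samples that fall inside the 3-9 s window, and we assume that the effective R is the simple ratio of a drug group's area to the placebo area (for which R = 1). The function names are hypothetical.

```python
def area_between(pos, neg, tr=1.95, window=(3.0, 9.0)):
    """Trapezoidal area between two time courses sampled every `tr` seconds.

    pos, neg: responses after positive and negative prediction errors,
              one value per post-outcome scan, starting at time 0.
    Only samples whose time lies inside `window` (in seconds) are used.
    """
    points = [(i * tr, p - n)
              for i, (p, n) in enumerate(zip(pos, neg))
              if window[0] <= i * tr <= window[1]]
    return sum(0.5 * (d0 + d1) * (t1 - t0)
               for (t0, d0), (t1, d1) in zip(points, points[1:]))


def effective_magnitude(area_drug, area_placebo):
    """Assumed scaling: effective R relative to placebo, where R = 1."""
    return area_drug / area_placebo
```

With a TR of 1.95 s, the samples at 3.9, 5.85 and 7.8 s are the ones that fall within the 3-9 s window. Under the ratio assumption, a group whose area is larger than the placebo area receives an effective R above 1, and a group with a smaller area receives an R below 1.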

Supplementary References

1. Crevoisier, C., Hoevels, B., Zurcher, G. & Da Prada, M. Bioavailability of L-dopa after Madopar HBS administration in healthy volunteers. Eur. Neurol. 27 Suppl 1, 36-46 (1987).
2. McClelland, G.R., Cooper, S.M. & Pilgrim, A.J. A comparison of the central nervous system effects of haloperidol, chlorpromazine and sulpiride in normal volunteers. Br. J. Clin. Pharmacol. 30, 795-803 (1990).
3. Bentley, P., Husain, M. & Dolan, R.J. Effects of cholinergic enhancement on visual stimulation, spatial attention, and spatial working memory. Neuron 41, 969-982 (2004).
4. Noppeney, U., Penny, W.D., Price, C.J., Flandin, G. & Friston, K.J. Identification of degenerate neuronal systems based on intersubject variability. Neuroimage 30, 885-890 (2006).
5. Sutton, R.S. & Barto, A.G. Reinforcement learning. The MIT Press, Cambridge, MA (1998).
6. O'Doherty, J. et al. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452-454 (2004).
7. Samejima, K., Ueda, Y., Doya, K. & Kimura, M. Representation of action-specific reward values in the striatum. Science 310, 1337-1340 (2005).
8. Luce, R.D. Response times. Oxford University Press, New York (2003).
9. Deichmann, R., Gottfried, J.A., Hutton, C. & Turner, R. Optimized EPI for fMRI studies of the orbitofrontal cortex. Neuroimage 19, 430-441 (2003).
10. Friston, K.J., Penny, W.D. & Glaser, D.E. Conjunction revisited. Neuroimage 25, 661-667 (2005).
11. Poline, J.B., Worsley, K.J., Evans, A.C. & Friston, K.J. Combining spatial extent and peak intensity to test for activations in functional imaging. Neuroimage 5, 83-96 (1997).

Supplementary note: instructions given to subjects about the behavioural task

The game is divided into 4 sessions, each comprising 90 trials and lasting 10 minutes. The first session will be used as a practice session before scanning. The three other sessions will be performed in the scanner. The word 'ready' will be displayed 10 seconds before the game starts.

In each trial you have to choose between the two symbols displayed on the screen, above and below the central cross. Your choice will be circled in red after a 4 seconds delay.
- to choose the lower symbol, do nothing
- to choose the upper symbol, press the button and hold it.
If you change your mind you can still release or press the button until the delay has elapsed.

As an outcome of your choice you may
- get nothing
- gain a £1 coin
- lose a £1 coin
- look at a £1 coin (but get nothing).

The two symbols displayed on a same screen are not equivalent in terms of outcome: with one you are more likely to get nothing than the other. Each symbol has its own meaning, regardless of where and when it is displayed.

The aim of the game is to win as much money as possible. Except for the practice session, you will be allowed to keep the money you have won.

Supplementary Table 1. Effects of drugs on behavioural measures

Measure                Condition   Placebo       Levodopa      Haloperidol
Response time (ms)     gain        1036 ± 41     1035 ± 70     1093 ± 32
                       loss        1503 ± 59     1383 ± 83     1541 ± 46
Go response (%)        gain        49.5 ± 1.1    49.7 ± 0.9    51.3 ± 1.5
                       loss        49.8 ± 1.3    49.7 ± 1.5    47.7 ± 1.6
Correct response (%)   gain*       85.0 ± 3.2    91.7 ± 1.4    78.9 ± 5.4
                       loss        83.1 ± 2.6    86.7 ± 2.1    80.6 ± 2.8
Consistency (%)        gain        82.7 ± 1.0    81.9 ± 1.3    80.0 ± 2.8
                       loss        67.3 ± 3.3    72.9 ± 2.7    66.7 ± 2.9

Response times were taken between stimuli onset and button press, for the Go responses only. Percentage of Go responses is the number of trials with button press per 100 trials. Correct response corresponds to the stimulus with either high probability of gain or low probability of loss, depending on the condition. For comparison, percentage of correct responses was on average 59.2% in the neutral condition, with no significant difference between groups. Consistency is the percentage of choices identical to the preceding one. Data are expressed as mean ± inter-subject SEM.
* Condition showing significant difference between levodopa and haloperidol groups (one-tailed t-test, P<0.05).

Supplementary Table 2. Effects of drugs on subjective ratings

Dimension                      Placebo       Levodopa      Haloperidol
Alert / Drowsy                 57.6 ± 6.6    53.2 ± 6.0    56.4 ± 7.6
Calm / Excited                 56.4 ± 6.4    72.3 ± 4.6    62.4 ± 6.1
Strong / Feeble                55.9 ± 6.3    51.9 ± 5.6    52.7 ± 6.4
Clear-Headed / Muzzy           54.1 ± 6.7    44.6 ± 5.6    46.7 ± 7.4
Well-coordinated / Clumsy      58.5 ± 5.9    54.8 ± 6.1    56.5 ± 5.7
Energetic / Lethargic          52.4 ± 6.0    35.7 ± 6.1    49.0 ± 7.0
Contented / Discontented       71.3 ± 5.8    78.4 ± 3.2    69.3 ± 4.8
Tranquil / Troubled            65.0 ± 6.1    72.6 ± 3.3    63.5 ± 3.2
Quick-witted / Mentally slow   57.8 ± 5.5    55.5 ± 5.6    51.0 ± 4.7
Relaxed / Tense                59.0 ± 5.5    71.2 ± 3.4    61.3 ± 5.0
Attentive / Dreamy             56.6 ± 6.4    50.4 ± 6.0    48.1 ± 7.3
Proficient / Incompetent       66.1 ± 4.6    61.4 ± 4.3    59.7 ± 5.4
Happy / Sad                    70.7 ± 5.4    77.9 ± 3.1    70.6 ± 3.9
Amicable / Antagonistic        76.4 ± 5.0    68.6 ± 3.7    68.0 ± 4.2
Interested / Bored             76.4 ± 5.4    74.8 ± 4.1    65.8 ± 3.8
Gregarious / Withdrawn         63.4 ± 5.0    57.5 ± 4.3    53.5 ± 3.8

At the end of the experiment, subjects had to mark their feelings across lines, each representing the full range of a given dimension, for example from alert to drowsy. Measures are expressed in percentage ± inter-subject SEM. No significant difference was found between haloperidol and levodopa groups.

Supplementary Table 3. Additional significant clusters observed in the Go / NoGo contrast

Region                 Laterality   X     Y     Z     t-score
Cerebellum             R            26    -48   30    13.44
Thalamus               L            -12   -22   6     11.77
Lateral motor cortex   L            -38   -26   56    11.24
Medial motor cortex    L            -6    -10   56    8.82
Mid insula             L            -42   -4    14    8.36
Mid insula             R            44    0     10    7.44

Only clusters with more than 64 voxels are reported. [x y z] are MNI coordinates; t-scores were calculated from simple contrast of regression coefficients over all subjects (n=39).

Supplementary Figure 1. Principal component analysis of subjective ratings. [Scatter plot: x-axis, first principal component; y-axis, second principal component.] The circles represent individual subjects. The colours correspond to the three groups: grey for placebo, green for levodopa and red for haloperidol. All 16 dimensions of subjective ratings were included in the analysis of variance. All 39 subjects were then plotted in the space of the first two components, which explain 87.7% and 6.2% of the variance, respectively.

Supplementary Figure 2. Likelihood of the observed choices as a function of reward magnitude. [Two panels, gain condition and loss condition: x-axis, reward magnitude (£); y-axis, likelihood.] The colours correspond to the two drug groups: green for levodopa and red for haloperidol. The likelihoods were computed with the action-value learning model across reward magnitudes, and then normalised. The parameters alpha and beta are kept constant, having been optimised previously across all subjects, with a reward value kept to 1. Maximal likelihoods were obtained with reward values of 0.75 / 1.23 in the gain condition and 0.98 / 1.03 in the loss condition, respectively for haloperidol / levodopa group. Vertical bars represent the reward magnitudes estimated from striatal BOLD responses: 0.71 / 1.29 for the gain condition and 1.03 / 0.95 for the loss condition, respectively for haloperidol / levodopa group. Rectangles illustrate SEM of these estimates.