There are, in total, four free parameters. The learning rate a controls how sharply the model

Similar documents
Supplementary Information

Resistance to forgetting associated with hippocampus-mediated. reactivation during new learning

Nature Neuroscience: doi: /nn Supplementary Figure 1. Task timeline for Solo and Info trials.

Distinct Value Signals in Anterior and Posterior Ventromedial Prefrontal Cortex

Supporting online material for: Predicting Persuasion-Induced Behavior Change from the Brain

Supplementary Material. Functional connectivity in multiple cortical networks is associated with performance. across cognitive domains in older adults

Distinct valuation subsystems in the human brain for effort and delay

Supplemental Information. Triangulating the Neural, Psychological, and Economic Bases of Guilt Aversion

Is model fitting necessary for model-based fmri?

Supporting Online Material for

Supplementary Online Material Supplementary Table S1 to S5 Supplementary Figure S1 to S4

HHS Public Access Author manuscript Nat Neurosci. Author manuscript; available in PMC 2015 November 01.

Title of file for HTML: Supplementary Information Description: Supplementary Figures, Supplementary Tables and Supplementary References

Neurobiological Foundations of Reward and Risk

Neural activity to positive expressions predicts daily experience of schizophrenia-spectrum symptoms in adults with high social anhedonia

Text to brain: predicting the spatial distribution of neuroimaging observations from text reports (submitted to MICCAI 2018)

Computational approaches for understanding the human brain.

Cortical and Hippocampal Correlates of Deliberation during Model-Based Decisions for Rewards in Humans

Theory of mind skills are related to gray matter volume in the ventromedial prefrontal cortex in schizophrenia

Supplemental Data. Inclusion/exclusion criteria for major depressive disorder group and healthy control group

Generalization of value in reinforcement learning by humans

Hippocampal brain-network coordination during volitionally controlled exploratory behavior enhances learning

SUPPLEMENTAL INFORMATION

Supplementary Results: Age Differences in Participants Matched on Performance

Supplementary Material S3 Further Seed Regions

Nature Neuroscience: doi: /nn Supplementary Figure 1. Blame judgment task.

SUPPLEMENTARY MATERIALS: Appetitive and aversive goal values are encoded in the medial orbitofrontal cortex at the time of decision-making

Activity in Inferior Parietal and Medial Prefrontal Cortex Signals the Accumulation of Evidence in a Probability Learning Task

correlates with social context behavioral adaptation.

Role of the ventral striatum in developing anorexia nervosa

Supplementary Materials for

Overt vs. Covert Responding. Prior to conduct of the fmri experiment, a separate

Title of file for HTML: Peer Review File Description:

Supplementary Methods

Prefrontal connections express individual differences in intrinsic resistance to trading off honesty values against economic benefits

Supplemental Information. Differential Representations. of Prior and Likelihood Uncertainty. in the Human Brain. Current Biology, Volume 22

How Self-Determined Choice Facilitates Performance: A Key Role of the Ventromedial Prefrontal Cortex

Dissociating hippocampal and striatal contributions to sequential prediction learning

Supplementary Methods and Results

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

Lateral Intraparietal Cortex and Reinforcement Learning during a Mixed-Strategy Game

doi: /brain/awr059 Brain 2011: 134; Expected value and prediction error abnormalities in depression and schizophrenia

Double dissociation of value computations in orbitofrontal and anterior cingulate neurons

Human Dorsal Striatum Encodes Prediction Errors during Observational Learning of Instrumental Actions

On optimal decision-making in brains and social insect colonies Marshall, Bogacz, Dornhaus, Planque, Kovacs,Franks

QUANTIFYING CEREBRAL CONTRIBUTIONS TO PAIN 1

Dopaminergic Drugs Modulate Learning Rates and Perseveration in Parkinson s Patients in a Dynamic Foraging Task

Simple Linear Regression the model, estimation and testing

Supplementary Figure 1

Dissociating the Role of the Orbitofrontal Cortex and the Striatum in the Computation of Goal Values and Prediction Errors

Reinforcement learning and the brain: the problems we face all day. Reinforcement Learning in the brain

Prospection, Perseverance, and Insight in Sequential Behavior

Supplemental information online for

Social and monetary reward learning engage overlapping neural substrates

Model-Based fmri Analysis. Will Alexander Dept. of Experimental Psychology Ghent University

The association of children s mathematic abilities with both adults cognitive abilities

1.4 - Linear Regression and MS Excel

Supplementary Online Content

The Neural Basis of Following Advice

Intelligence moderates reinforcement learning: a mini-review of the neural evidence

A Model of Dopamine and Uncertainty Using Temporal Difference

Investigating directed influences between activated brain areas in a motor-response task using fmri

The Inherent Reward of Choice. Lauren A. Leotti & Mauricio R. Delgado. Supplementary Methods

Neuro-cognitive systems underpinning antisocial behavior and the impact of maltreatment and substance abuse on their development

The involvement of model-based but not model-free learning signals during observational reward learning in the absence of choice

Power-Based Connectivity. JL Sanguinetti

Value of freedom to choose encoded by the human brain

The Neural Signature of Social Norm Compliance Manfred Spitzer, Urs Fischbacher, Bärbel Herrnberger, Georg Grön, Ernst Fehr

Estimation of Statistical Power in a Multicentre MRI study.

Figure 1. Excerpt of stimulus presentation paradigm for Study I.

SUPPLEMENTAL MATERIAL

11/24/2017. Do not imply a cause-and-effect relationship

Correlation and regression

Predictive Neural Coding of Reward Preference Involves Dissociable Responses in Human Ventral Midbrain and Ventral Striatum

Supplementary Online Content

Axiomatic methods, dopamine and reward prediction error Andrew Caplin and Mark Dean

Business Statistics Probability

Dissociating Valuation and Saliency Signals during Decision-Making

NeuroImage 76 (2013) Contents lists available at SciVerse ScienceDirect. NeuroImage. journal homepage:

The Role of Working Memory in Visual Selective Attention

Supplementary Information Methods Subjects The study was comprised of 84 chronic pain patients with either chronic back pain (CBP) or osteoarthritis

12/30/2017. PSY 5102: Advanced Statistics for Psychological and Behavioral Research 2

Decision neuroscience seeks neural models for how we identify, evaluate and choose

Cognitive Strategies Regulate Fictive, but not Reward Prediction Error Signals in a Sequential Investment Task

Supplementary Digital Content

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Title:Atypical language organization in temporal lobe epilepsy revealed by a passive semantic paradigm

Dissociating Valence of Outcome from Behavioral Control in Human Orbital and Ventral Prefrontal Cortices

IAPT: Regression. Regression analyses

Brain Imaging studies in substance abuse. Jody Tanabe, MD University of Colorado Denver

HHS Public Access Author manuscript Eur J Neurosci. Author manuscript; available in PMC 2017 August 10.

Supplemental Information

NeuroImage 134 (2016) Contents lists available at ScienceDirect. NeuroImage. journal homepage:

Axiomatic Methods, Dopamine and Reward Prediction. Error

NORTH SOUTH UNIVERSITY TUTORIAL 2

Introduction. The Journal of Neuroscience, January 1, (1):

Supplementary Online Content

Supplementary Information. Gauge size. midline. arcuate 10 < n < 15 5 < n < 10 1 < n < < n < 15 5 < n < 10 1 < n < 5. principal principal

investigate. educate. inform.

Transcription:

Supplemental esults The full model equations are: Initialization: V i (0) = 1 (for all actions i) c i (0) = 0 (for all actions i) earning: V i (t) = V i (t - 1) + a * (r(t) - V i (t 1)) ((for chosen action i and obtained reward r(t)) V j (t) = V j (t - 1) (for non-chosen actions j) c i (t) = 1 (for chosen action i) c j (t) = d * c j (t - 1) (for non-chosen actions j) Choice: P i (t) = exp( β * (V i (t - 1) + b * c i (t - 1))) / j exp( β * (V j (t - 1) + b * c j (t-1))) where t indexes trials There are, in total, four free parameters. The learning rate a controls how sharply the model updates the expectation V i (t - 1) toward the observed reward r(t). The perseveration bias weight b controls the strength of the model's tendency to choose (for b > 0) or avoid (for b < 0) recently chosen options; the decay factor d controls over how many trials this bias persists after a choice. Finally, the softmax inverse temperature β controls the randomness of the choices: for large β, the model is more likely to choose the action believed to have the maximum value. Thus, larger β will result in a lower randomness of choices We fit the model parameters to behavior in two ways: individually (one parameter set per subject) and groupwise (one parameter set each for all earners and all Non-learners; we refer to this as a fixed effects model since it treats each parameter as fixed within the group).

As we have noted previously (Daw et al., 2006) individual parametric fits in tasks and models of this sort tend to be noisy, probably owing in part to the fact that maximum likelihood estimates have no regularization; in our experience, regularization of the parameter estimates over the population tends therefore to improve a model's subsequent fit to fmi data. Fitting the parameters as fixed within the groups is a simple form of regularization, and we therefore (following previous work; Daw et al., 2006; O'Doherty et al. 2004) used the fixed effects estimates to generate regressors for fmi. For completeness, these parameters are shown in Supplemental Table 1S, together with confidence intervals derived from an asymptotic covariance estimator (inverse Hessian of log data likelihood). To investigate behavioral differences between groups and across individuals, we use the individual fits, whose summary statistics are shown in Supplemental Table 2S, instead of the fixed effects fits. A further complication with using TD fits to compare groups is that the parametric estimates are not independent but instead tend to covary over subjects. In particular, because each trial's feedback is multiplied by the learning rate to compute the value, and this value itself is multiplied by the softmax temperature to compute logit choice probability, these two parameters tend to be inversely coupled. Therefore, as seen in the tables, the parameters viewed separately can have improbable means and large estimation errors (and neither differed significantly between groups; Wilcoxon rank sum test, p >.15). However, their product β *a tends to be more reliably estimated. Because, multiplied together, these parameters control how strongly a particular reward impacts subsequent choice preferences (i.e., logit choice probability), their product is also a more appropriate measure of trial-to-trial sensitivity to reward than either parameter individually. It is also comparable to the weight on the most recent reward in a logistic regression analysis of choices (e.g., au & Glimcher, 2005, c.f. ohrenz et al. 2007).

We therefore tested whether these trial-to-trial reward sensitivity estimates, β*a from the individual parametric fits, reflected the observed behavioral differences between the groups. Indeed, learners were significantly more sensitive to rewards on this metric than Non-learners (Wilcoxon rank sum test, p < 0.0005). We additionally examined whether these estimates correlated across subjects with the measure of behavioral performance used to classify earners and Non-learners (i.e., the number of choices on the high probability decks, 75% and 60%, in the last 40 trials of the task). Indeed, as shown in Supplemental Figure 1S, the reward sensitivity estimate correlated with the learning criterion (linear regression, r 2 =0.43, p < 0.0005; this correlation and also the between-group comparison discussed above remain significant when the outlier at the top right corner of the figure is eliminated). Of the remaining parameters, the perseveration bias decay d did not differ significantly between subjects (Wilcoxon rank sum test, p = 1), while the perseveration bias weight b did (p <.02; in fact, since b is also multiplied by β in the softmax, the parameters are similarly coupled and the difference was more reliably detected in the product β *b; p <.002). This result indicates that behaviorally earners and Non-learners differ not just in their sensitivity to rewards, but also in their tendency to revisit or avoid previously chosen options regardless of their reward history. The fit parameters suggest a tendency of earners to stick with previous choices (b > 0); Non-learners the opposite (This may simply reflect the former group's defining strategy of finding the best option and then sticking with it). Finally, we investigated whether we could detect sensitivity to rewards even in Non-learners, by re-fitting the model to this group with the learning rate(s) restricted to zero. With this restriction, rewards cannot impact choices, which can instead only be explained using choice

autocorrelation (through parameters b and d). We fit this restricted model in two ways, with the remaining free parameters either fit individually or as fixed effects over the Non-learners group. In both cases, the null hypothesis of zero learning rate(s) was rejected compared to the corresponding full model with nonzero learning rate(s) (likelihood ratio test, fixed effects: 1 d.f., p<.005, individual fits summed over group: 12 d.f., p<0.000000001). These results add additional support to the conclusion that Non-learners were sensitive to the rewards. eferences Daw ND, O'Doherty JP, Dayan P, Seymour B, Dolan J (2006) Cortical substrates for exploratory decisions in humans, Nature 441:876-879. au B, Glimcher PW (2005) Dynamic response-by-response models of matching behavior in rhesus monkeys, J Exp Anal Behav 84:555-579. ohrenz T, McCabe K, Camerer CF, Montague P (2007) Neural signature of fictive learning signals in a sequential investment task, Proc Natl Acad Sci USA 104:9493:9498. O'Doherty J, Dayan P, Schultz J, Deichmann, Friston K, Dolan J (2004) Dissociable roles of ventral and dorsal striatum in instrumental conditioning, Science 304:452-454.

earners Non-learners learning rate a.33 +/-.03.37 +/-.48 softmax inv. temp. β 2.2 +/-.15.49 +/-.26 choice weighting b 1.1 +/-.10-1.7 +/-.92 choice decay multiplier d.096 +/- 4.2e-3.70 +/-.05 β * a.75 +/-.055.18 +/-.23 Supplemental Table 1S: Parameters fit as fixed effects to group behavior (shown +/- one standard deviation from asymptotic covariance estimator). Also shown are the moments for the product of the softmax temperature and learning rate, taking into account their covariation.

earners Non-learners learning rate a.28 +/-.05.25 +/-.11 softmax inv. temp. β 7.6 +/- 2.4 1.0e+5 +/- 7.0e+4 choice weighting b.12 +/-.14-10 +/- 5.0 choice decay multiplier d.52 +/-.10.55 +/-.10 β * a 1.2 +/-.28.21 +/-.06 Supplemental Table 2S: Mean (+/- 1 SEM) over subjects of parametric fits to individual behavior. Also shown are the moments for the product of the softmax temperature and learning rate.

Supplemental Figure 1S: Scatter plot comparing two measures of behavior over subjects: the number of choices on the high probability (HP) decks (75% and 60%) in the last 40 trials of the task vs. the product of learning rate and softmax temperature (a measure of trial-to-trial sensitivity to reward) from individual parametric fits of the TD model.

Parameter estimates from Ventral Striatum for the PE regressor 7 6 5 4 3 2 1 0-1 ight Ventral Striatum earners Non-earners eft Ventral Striatum Supplemental Figure 2S: Parameter estimates from left and right ventral striatum for the PE regressor reported separately for the earner and Non-learner groups. The parameter estimates were extracted separately from the left and right striatum at the MNI coordinates of (-9, 12, -12) and (+9,12,-12) respectively.

[-3-90 -6] [-3-90 -6] Supplemental Figure 3S: Activity in visual cortical areas related to the trial onset during task performance in both earner and Non-learner groups. Highly significant responses were found in this region in both groups (A) earners; (B) Non-learners, at p<0.0001 (uncorrected). Furthermore, no significant differences in activity were found between the groups in this area in a direct contrast at p<0.001 uncorrected. These findings support the claim that subjects in both groups were processing the visual components of the task and argues against the possibility that Non-learners were simply disengaged from the task.

Brain egion aterality X Y Z Z-Score Cingulate Gyrus -18-24 39 4.31 Parahippocampus -33-18 -30 4.21 Anterior Cingulate Gyrus 18 30 24 3.97 Medial Orbital Gyrus -3 60-15 3.79 Cingulate Gyrus 12-3 36 3.66 Putamen 30-9 9 3.65 Cingulate Gyrus 18-30 39 3.62 Hippocampus 33-21 -12 3.52 Insula 36 3-12 3.48 Superior Frontal Gyrus -12 63-6 3.4 Inferior Frontal Gyrus -48 3 24 3.34 Inferior Temporal Gyrus -27-9 -30 3.32 Medial Frontal Gyrus / 0 45 24 3.27 All peaks are of clusters with an extent of min. 5 voxel s All peaks survive a significance threshold of p < 0.001 Supplemental Table 3S: egions outside of our striatal regions of interest correlating with prediction error in the contrast of earners minus Non-learners (p<0.001, uncorrected)