Intelligence moderates reinforcement learning: a mini-review of the neural evidence

Articles in PresS. J Neurophysiol (September 3, 2014). doi:10.1152/jn.00600.2014

Neuro Forum

Chong Chen
Department of Psychiatry, Hokkaido University Graduate School of Medicine, Sapporo 060-8638, Japan. Tel: (81)11-706-5973; Fax: (81)11-706-5081; E-mail: cchen@med.hokudai.ac.jp

Keywords: Intelligence; Reinforcement learning; Prediction error; Model-based

Acknowledgements: I thank Peter Dayan and Nathaniel Daw for their shared ideas and discussion, and Atsuhito Toyomaki for his comments on the previous manuscript.

Copyright 2014 by the American Physiological Society.

Abstract: Our understanding of the neural basis of reinforcement learning and of intelligence, two key factors contributing to human strivings, has progressed significantly in recent years. However, the overlap of these two lines of research, namely how intelligence affects neural responses during reinforcement learning, remains largely uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence.

The computational theory of reinforcement learning (RL) has been one of the most frequently used instruments for explaining decision making and learning (Daw et al. 2011; Smittenaar et al. 2013). RL is the process of optimizing reward under the reinforcement of prediction error (PE), the difference between received and expected value, and it constitutes a major form of experience-based learning essential for all human strivings. Dopamine neurons were found to signal PE: their activity is enhanced by positive PE and suppressed by negative PE. Subjects use these PE signals to update value functions and guide their future behavior. This type of RL is termed model-free or habitual/automatic RL, since subjects rely entirely on PE, i.e. on their history of trial-and-error experience, to guide learning. In contrast, in model-based or goal-directed/controlled RL, subjects gate PE signals and use higher-order cognitive maps or learned models to predict future outcomes, and can thus update value functions more flexibly (Daw et al. 2011; Smittenaar et al. 2013).
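The PE-driven value update described above can be illustrated with a minimal sketch. It assumes a simple delta-rule update; the learning rate of 0.1 and the function names are illustrative assumptions and are not taken from the reviewed studies.

```python
def prediction_error(received, expected):
    """PE is the received value minus the expected value."""
    return received - expected


def update_value(expected, received, learning_rate=0.1):
    """Shift the expected value toward the received value by a fraction of the PE.

    The learning rate is a hypothetical free parameter chosen for illustration.
    """
    return expected + learning_rate * prediction_error(received, expected)


# A positive PE (outcome better than expected) raises the value estimate;
# a negative PE (outcome worse than expected) lowers it.
value = 0.5
value_after_reward = update_value(value, received=1.0)
value_after_omission = update_value(value, received=0.0)
```

In this sketch a model-free learner would rely on such updates alone, whereas a model-based learner would additionally consult a learned model of the task.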
Recent neuroimaging studies have greatly elucidated the neural basis of RL: model-free RL typically involves the dopamine-rich striatum, whereas model-based RL also extends to dorsolateral prefrontal cortex (PFC) (Smittenaar et al. 2013), medial PFC (Daw et al. 2011), anterior cingulate cortex (ACC) and orbitofrontal cortex (Rushworth et al. 2012), among other areas. Notably, most of these areas have also been linked to intelligence, another key factor contributing to human strivings. The volume of left striatum is positively correlated with IQ (MacDonald et al. 2014), while dorsolateral and medial PFC and ACC, together with several other parieto-frontal areas, constitute the structural and functional substrates of intelligence (Deary et al. 2010).

Indeed, further investigation reveals a promising association between intelligence and RL. It has been suggested that RL may reflect stable individual differences as a trait (Cohen 2007). Intelligence most likely accounts for this stable trait, given that intelligence consistently predicts performance in almost all fields of human strivings, especially learning and education (Nisbett 2009). Further, intelligence is commonly perceived as consisting of crystallized intelligence (the whole stored knowledge which can be used to solve problems) and fluid intelligence (the acquired ability to solve novel problems that depend little on stored knowledge) (Nisbett 2009), both of which could be used to construct cognitive maps and learned models, thus contributing to model-based RL. In light of the above reasoning, the purpose of this article is to review three functional magnetic resonance imaging studies (Van den Bos et al. 2012; Schlagenhauf et al. 2013; Hawes et al. 2014) that have examined the link between intelligence and RL.

General findings

Van den Bos et al. (2012) studied a sample of adolescents (age 13-16 years, mean=14.39; 23 male, 22 female; IQ 70-130, as measured by the Similarities and Block Design subscales of the Wechsler Intelligence Scale for Children) using a probabilistic learning task. The task included 50 AB and 50 CD trials, and the feedback was probabilistic: choosing A led to positive feedback in 80% of AB trials, while choosing B led to positive feedback in only 20%. Similarly, choosing C and D led to positive feedback in 70% and 30% of CD trials, respectively. Since choosing A or C led to positive feedback more often, receiving negative feedback after choosing A or C would be unexpected and would generate a negative PE. As choosing B or D led to negative feedback more often, receiving positive feedback after choosing B or D would generate a positive PE. The authors then examined the blood oxygen level-dependent (BOLD) signals after positive and negative PEs. They found that higher IQ was related to accentuated activation in right dorsolateral PFC and dorsal ACC following positive PEs; however, this was not the case for negative PEs.
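The way feedback in this task maps onto positive and negative PEs can be sketched as follows. The sketch assumes that a subject has already learned the true feedback probabilities given above; the dictionary, function name and labels are hypothetical and serve only to illustrate the classification described in the text.

```python
# Probability of positive feedback for each stimulus, as stated in the text.
POSITIVE_FEEDBACK_PROB = {"A": 0.8, "B": 0.2, "C": 0.7, "D": 0.3}


def label_feedback(choice, positive):
    """Label a trial as an unexpected (PE-generating) or expected outcome.

    Assumes the subject expects positive feedback for stimuli rewarded
    more than half the time and negative feedback otherwise.
    """
    expects_positive = POSITIVE_FEEDBACK_PROB[choice] > 0.5
    if positive and not expects_positive:
        return "positive PE"
    if not positive and expects_positive:
        return "negative PE"
    return "expected"


# Positive feedback after choosing B is unexpected (positive PE),
# negative feedback after choosing A is unexpected (negative PE).
labels = [label_feedback("B", True), label_feedback("A", False)]
```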
Further, educational level moderated the correlation, such that dorsolateral PFC activity was correlated with IQ in both prevocational (mean IQ 91.3±2.4; r=0.54) and pre-university (mean IQ 107±2.0; r=0.59) subjects, but dorsal ACC activity was only correlated with IQ in pre-university subjects (r=0.44). Behaviorally, the authors observed that IQ was positively correlated with the number of correct choices in the last 60 trials (r=0.48). In addition, IQ was positively correlated with a win-stay choice strategy after expected positive feedback (r=0.54) and negatively correlated with it after unexpected positive feedback (r=-0.43). Clearly, once the rules of the task have been recognized, repeating the same choice after expected positive feedback and switching the choice after unexpected positive feedback is the more optimal strategy.

Schlagenhauf et al. (2013) used a reversal learning task, which is frequently used to generate PEs. Subjects (age 22-61, mean 36.9±12.4; 28 male) were asked to choose one of two abstract targets for 200 trials. Reward and loss were determined by three types of rules. In rule type 1, a reward was delivered if less than 80% of the chosen target had been rewarded, and a punishment occurred otherwise. In rule type 2 the probability reversed to 20%, while in rule type 3 it switched to 50%. The rule type changed after 16 trials, or after 10 trials if subjects had made 70% correct choices. A temporal sequence of PEs was calculated for each subject, and BOLD activity was modeled with the trial-by-trial PE as the modulator. In this way the authors extracted the mean neural PE signals from bilateral ventral striatum. Fluid IQ was derived from a factor analysis of nine tests targeting cognitive speed, attention and executive function, working memory, episodic memory and reasoning. Crystallized IQ was measured with a verbal knowledge test. The hidden rules of the complex reversal learning task may have prevented those with higher IQ from fulfilling their potential and achieving better performance, as the authors failed to find associations between IQ scores and correct responses. However, they did find a significant positive correlation between fluid IQ and the mean BOLD PE signal in bilateral ventral striatum, even after controlling for age. Further analysis revealed that attention and reasoning underlay this correlation.
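A trial-by-trial PE sequence of the kind used as a parametric modulator can be sketched as below. This review does not state which learning model generated the PEs, so the sketch assumes a plain delta rule with a fixed learning rate; the function name, the learning rate of 0.3 and the coding of reward as +1 and loss as -1 are all illustrative assumptions.

```python
def pe_sequence(choices, outcomes, learning_rate=0.3):
    """Return one PE per trial under an assumed delta-rule learner.

    choices  -- chosen target on each trial (any hashable label)
    outcomes -- +1 for reward, -1 for loss (an assumed coding)
    """
    values = {}  # expected value per target, starting at 0
    errors = []
    for choice, outcome in zip(choices, outcomes):
        expected = values.get(choice, 0.0)
        error = outcome - expected
        values[choice] = expected + learning_rate * error
        errors.append(error)
    return errors


# Three rewards followed by a loss after a hypothetical rule reversal:
# PEs shrink while the target keeps paying off, then turn strongly negative.
errors = pe_sequence(["left"] * 4, [1, 1, 1, -1])
```

Averaging the signed sequence, as opposed to splitting it by sign, is what makes the direction of any correlation with IQ hard to attribute to positive or negative PEs separately.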
Since the authors extracted the mean PE signals without differentiating positive and negative PEs, this finding might arise from a positive correlation between IQ and striatal signals of positive PE, and/or a negative correlation between IQ and striatal signals of negative PE.

Unlike the previous two studies, Hawes et al. (2014) employed a manipulated task so that each subject (18-38 years old, mean 22; 94 males; IQ 95.5-148.0, measured by the Wechsler Abbreviated Scale of Intelligence) underwent the same reinforcement history. Subjects had to guess whether an upcoming number would be low (1, 2, 3) or high (4, 5, 6), and received a $2 reward for each correct guess and a -$1 punishment for each incorrect guess. Unknown to subjects, however, the computer responded in such a way that each subject received the same sequence of 20 rewards and 20 losses.
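How a fixed reinforcement history can be imposed regardless of what a subject guesses is sketched below. The payoffs and the 20 rewards and 20 losses come from the text; the order of the schedule, the function names and the seeding are hypothetical, since the actual sequence is not given here.

```python
import random

LOW, HIGH = (1, 2, 3), (4, 5, 6)


def rigged_number(guess, should_win, rng):
    """Pick a number that makes the guess right or wrong, as the schedule demands."""
    matching, other = (LOW, HIGH) if guess == "low" else (HIGH, LOW)
    return rng.choice(matching if should_win else other)


def run_session(guesses, schedule, seed=0):
    """Return net earnings and the numbers shown for a predetermined schedule."""
    rng = random.Random(seed)
    earnings = 0
    numbers = []
    for guess, should_win in zip(guesses, schedule):
        numbers.append(rigged_number(guess, should_win, rng))
        earnings += 2 if should_win else -1  # +$2 per reward, -$1 per loss
    return earnings, numbers


# Placeholder order: 20 rewards and 20 losses, alternating.
schedule = [True, False] * 20
earnings_low, _ = run_session(["low"] * 40, schedule)
earnings_mixed, _ = run_session(["low", "high"] * 20, schedule)
```

Under these assumptions every guessing strategy ends with the same net earnings of $20 (20 × $2 minus 20 × $1), so differences in neural response cannot be attributed to differences in reinforcement history.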

The authors found that %BOLD change in striatum (caudate) after loss was positively predicted by IQ (beta=0.87) (less negative %BOLD change with higher IQ), a result that did not change significantly after controlling for caudate volume, age and working memory performance. Further, %BOLD change following loss was correlated with IQ in posterior cingulate cortex (r=0.20, p<0.051, which should be taken with caution owing to statistical issues), whereas %BOLD changes following reward and loss were both correlated with IQ in ventromedial PFC and left inferior frontal cortex (r ranges from 0.30 to 0.36). Given the pseudorandom nature of the task, we cannot know what predictions subjects made. Nevertheless, since subjects would have been expecting reward before receiving feedback, reward feedback would generate a positive PE, and loss feedback would generate a negative PE. The findings of Hawes et al. (2014) could thus be interpreted as showing that more intelligent subjects had enhanced neural signals of positive PE in ventromedial PFC and left inferior frontal cortex, and lessened neural signals of negative PE in ventromedial PFC, left inferior frontal cortex, striatum and posterior cingulate cortex. Alternatively, these BOLD signals might reflect subjects' emotional responses. Both ventromedial PFC and striatum are involved in the receipt of reward and loss, and their activity may reflect the perceived value magnitude as well as the resulting subjective experience (Diekhof et al. 2012). Moreover, posterior cingulate cortex is also implicated in happy and sad mood (Nielsen et al. 2005). It is likely that intelligence enhances positive emotions following reward and buffers negative emotions following loss.
The latter is consistent with the observation that lower IQ exposes individuals to a higher risk of developing posttraumatic stress disorder (Bomyea et al. 2012). However, this interpretation in terms of emotion is not in conflict with that of PE, as emotion and PE may coexist. Behaviorally, Hawes et al. (2014) demonstrated that higher IQ subjects considered more historical information. Specifically, more intelligent subjects were influenced by feedback one and two periods back, i.e. they tended to guess three-trial combinations (high-high-high, high-high-low, etc.) over three trials in a row as the potential rule of the task, while less intelligent subjects were primarily influenced by feedback only one period back, i.e. they tended to guess two-trial combinations (high-high, high-low, low-high, low-low) over two trials in a row as the rule.

IQ & RL: a summary of the neural findings

Despite employing different tasks in different populations, the three studies provided pioneering insights into the role of intelligence in RL. Specifically, Van den Bos et al. (2012) demonstrated that higher IQ was associated with accentuated activation in right dorsolateral PFC and dorsal ACC following positive PEs, especially in more intelligent subjects, while Schlagenhauf et al. (2013) and Hawes et al. (2014) suggested that higher IQ may be associated with enhanced neural signals following positive PE in striatum, ventromedial PFC, and left inferior frontal cortex, and with lessened neural signals of negative PE (less reduced activation) in striatum, ventromedial PFC, left inferior frontal cortex, and posterior cingulate cortex.

Explanation & implication

Given the tasks used by these studies, we cannot differentiate model-free from model-based RL in the above findings (Daw et al. 2011). Further, though it was originally proposed that striatal PE signals reflect exclusively model-free RL, more recent research has revealed that the striatum encodes model-based RL as well (Daw et al. 2011). In contrast, signals in dorsolateral and medial PFC and dorsal ACC may indicate model-based RL (Daw et al. 2011; Rushworth et al. 2012; Smittenaar et al. 2013). Thus the findings reviewed above suggest that intelligence enhances model-based RL, though it may also improve model-free RL, confirming the observation that RL reflects stable individual differences as a trait (Cohen 2007). In other words, the enhanced brain activation following positive PEs and the less reduced activation following negative PEs may reflect the fact that higher IQ subjects, especially those with higher fluid IQ, were actively processing information to construct cognitive maps using model-based RL.
This is especially true in the context of the following literature. Dorsolateral PFC contributes to working memory, while ACC monitors conflict (Deary et al. 2010; Van den Bos et al. 2012). Left inferior frontal cortex is related to semantic search and selection among competing representations (Bookheimer 2002), whereas posterior cingulate cortex encodes and retrieves episodic memory (Nielsen et al. 2005). This explanation fits well with the fact that, when facing complex and difficult problems, people with higher IQ generally make more effort and show higher brain activation (Deary et al. 2010). It is also in line with, and may well explain, the behaviors observed by Van den Bos et al. (2012), in which more intelligent subjects achieved better performance and showed more optimal shifting behavior after receiving positive feedback, and by Hawes et al. (2014), in which more intelligent subjects considered more historical information. Moreover, based on the association between IQ and striatal signals, it is also likely that intelligence amplifies model-free RL and thus both positive and negative PEs, but that the enhanced negative PEs were buffered or even overridden by model-based RL, resulting in the distinct findings regarding positive and negative PEs. Finally, a recent study (Lee et al. 2014) showed that inferior frontal cortex encodes the reliability of PE signals and may act as an arbitrator determining whether model-free or model-based RL takes control. Consequently, the accentuated activation of left inferior frontal cortex in more intelligent subjects found by Hawes et al. (2014) suggests the possibility that intelligence enhances this arbitration process.

Outlook

Though limited, converging evidence supports a promising role of intelligence in RL. Future studies should confirm and further elucidate this role. As a major concern, to clarify and differentiate the effect of intelligence on model-free and model-based RL, future studies should use tasks that dissociate these two types of RL (Daw et al. 2011). Further, since intelligence may affect positive and negative PEs differently, it is also preferable to analyze them separately.
Finally, as the reviewed studies were correlational in nature, future research could establish causality by employing experimental manipulation of neural signals or intelligence training (Nisbett 2009).

References

Bomyea J, Risbrough V, Lang AJ. A consideration of select pre-trauma factors as key vulnerabilities in PTSD. Clin Psychol Rev 32(7):630-41, 2012.
Bookheimer S. Functional MRI of language: new approaches to understanding the cortical organization of semantic processing. Annu Rev Neurosci 25:151-88, 2002.
Cohen MX. Individual differences and the neural representations of reward expectation and reward prediction error. Soc Cogn Affect Neurosci 2(1):20-30, 2007.
Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-based influences on humans' choices and striatal prediction errors. Neuron 69(6):1204-15, 2011.
Deary IJ, Penke L, Johnson W. The neuroscience of human intelligence differences. Nat Rev Neurosci 11(3):201-11, 2010.

Diekhof EK, Kaps L, Falkai P, Gruber O. The role of the human ventral striatum and the medial orbitofrontal cortex in the representation of reward magnitude - an activation likelihood estimation meta-analysis of neuroimaging studies of passive reward expectancy and outcome processing. Neuropsychologia 50(7):1252-66, 2012.
Hawes DR, DeYoung CG, Gray JR, Rustichini A. Intelligence moderates neural responses to monetary reward and punishment. J Neurophysiol 111(9):1823-32, 2014.
Lee SW, Shimojo S, O'Doherty JP. Neural computations underlying arbitration between model-based and model-free learning. Neuron 81(3):687-99, 2014.
MacDonald PA, Ganjavi H, Collins DL, Evans AC, Karama S. Investigating the relation between striatal volume and IQ. Brain Imaging Behav 8(1):52-9, 2014.
Nielsen FA, Balslev D, Hansen LK. Mining the posterior cingulate: segregation between memory and pain components. Neuroimage 27(3):520-32, 2005.
Nisbett RE. Intelligence and how to get it: why schools and cultures count. New York: W.W. Norton & Co., 2009.
Rushworth MF, Kolling N, Sallet J, Mars RB. Valuation and decision-making in frontal cortex: one or many serial or parallel systems? Curr Opin Neurobiol 22(6):946-55, 2012.
Schlagenhauf F, Rapp MA, Huys QJ, Beck A, Wüstenberg T, Deserno L, Buchholz HG, Kalbitzer J, Buchert R, Bauer M, Kienast T, Cumming P, Plotkin M, Kumakura Y, Grace AA, Dolan RJ, Heinz A. Ventral striatal prediction error signaling is associated with dopamine synthesis capacity and fluid intelligence. Hum Brain Mapp 34(6):1490-9, 2013.
Smittenaar P, FitzGerald TH, Romei V, Wright ND, Dolan RJ. Disruption of dorsolateral prefrontal cortex decreases model-based in favor of model-free control in humans. Neuron 80(4):914-9, 2013.
Van den Bos W, Crone EA, Guroglu B. Brain function during probabilistic learning in relation to IQ and level of education. Developmental Cognitive Neuroscience 2(Suppl.), S78-S89, 2012.