Reference Dependence In the Brain

Mark Dean
Behavioral Economics G6943
Fall 2015

Introduction
- So far we have presented some evidence that sensory coding may be reference dependent
- In this lecture we will present evidence that important value systems in the brain are also reference dependent
- We will also show how neuroscientific measurement and economic theory can be merged to answer interesting questions

The Reward Prediction Error Hypothesis
- Dopamine is a neurotransmitter
  - Transmits information between brain cells
- Until relatively recently, assumed to be the brain's happy place
  - Turns out to be more complicated than that
- Now hypothesized to encode Reward Prediction Error (RPE)
  - Difference between experienced and predicted rewards
- If true, lots of interesting implications
  - RPE used in AI learning
  - Explicit reference dependence
  - Makes reference point observable

Early Evidence for RPE - Monkeys
- Schultz et al. [1997]
- Dopamine neurons fire only on receipt of unpredicted rewards
- Otherwise they fire at the first predictor of reward
- If an expected reward is not received, dopamine firing pauses

Early Evidence for RPE - Humans
- O'Doherty et al. [2003]
- Thirsty human subjects placed in fMRI scanner
- Shown novel visual symbols, which signalled neutral and tasty juice rewards
- Assumptions made to operationalize RPE
  - Reward: values of juice
  - Learning: through the temporal difference (TD) algorithm
- Resulting RPE signal then correlated with brain activity
- Positive correlation with activity in Ventral Striatum taken as supporting RPE hypothesis
  - Ventral Striatum rich in dopaminergic neurons
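As a rough illustration of how a learning rule can turn a trial sequence into an RPE regressor, here is a minimal Python sketch. It uses a one-step simplification of TD learning (equivalent to a Rescorla-Wagner update), not the full TD algorithm. The learning rate, the cue name, and the reward values assigned to the juices are assumptions made for the example, not values from O'Doherty et al.

```python
def td_prediction_errors(trials, reward_value, alpha=0.2):
    """Return the outcome-time RPE on each trial of a cue -> outcome task.

    trials: list of (cue, outcome) pairs
    reward_value: dict mapping outcome -> assumed scalar reward
    alpha: assumed learning rate
    """
    value = {}   # learned predicted reward for each cue, starting at 0
    errors = []
    for cue, outcome in trials:
        predicted = value.get(cue, 0.0)
        experienced = reward_value[outcome]
        rpe = experienced - predicted          # experienced minus predicted reward
        value[cue] = predicted + alpha * rpe   # move the prediction toward the outcome
        errors.append(rpe)
    return errors


# Hypothetical session: cue "A" is always followed by tasty juice.
rewards = {"juice": 1.0, "neutral": 0.0}
trials = [("A", "juice")] * 20
errors = td_prediction_errors(trials, rewards)
```

In this sketch the RPE starts at the full reward value and shrinks as the cue comes to predict the juice. The need to pick `alpha` and the reward values by hand is exactly the kind of auxiliary assumption such tests depend on.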

Problems with the Current Tests
- Several other theories for the role of dopamine
  - Salience hypothesis (e.g. Zink et al. 2003)
  - Incentive salience hypothesis (Berridge and Robinson, 1998)
  - Agency hypothesis (Redgrave and Gurney, 2006)
- These theories have been hard to differentiate
- Couched in terms of latent variables
  - Rewards, Beliefs, Salience, Valence not directly observable
- Tests rely on auxiliary assumptions that are not central to the underlying theory
  - Experiments test both underlying theory and auxiliary assumptions
- Also, different models tend to lead to very highly correlated predictions

An Axiomatic Approach
- Alternative: take an axiomatic approach to testing RPE hypothesis
- A set of necessary and sufficient conditions on dopamine activity
  - Equivalent to the RPE model
  - Easily testable
- Similar to Samuelson's approach to testing utility maximization
  - Utility maximization is equivalent to the Weak Axiom of Revealed Preference
- Has several advantages
  - Provides a complete list of testable predictions of the RPE model
  - Non-parametric
  - Failure of particular axioms will aid model development

The Data Set
- Subjects receive prizes from lotteries:
  - $Z$: A space of prizes
  - $\Lambda$: Set of all lotteries on $Z$
  - $\Lambda(z)$: Set of all lotteries whose support includes $z$
  - $e_z$: Lottery that gives prize $z$ with certainty

The Data Set
- Observable data is a Dopamine Release Function $\delta : M \to \mathbb{R}$
- $M = \{(z, p) \mid z \in Z,\ p \in \Lambda(z)\}$
- $\delta(z, p)$ is dopamine activity when prize $z$ is obtained from a lottery $p$

A Graphical Representation
[Figure: dopamine release against the probability of prize 1, with p = 0.2 marked. Labels: dopamine released when prize 1 is obtained; dopamine released when prize 2 is obtained.]

A Formal Model of RPE
- The difference between how good an event was expected to be and how good it turned out to be
- Under what conditions can we find
  - A Predicted reward function: $b : \Lambda \to \mathbb{R}$
  - An Experienced reward function: $r : Z \to \mathbb{R}$
- Such that there is an aggregator function $E$ that
  - Represents the dopamine release function: $\delta(z, p) = E(r(z), b(p))$
  - Is increasing in experienced and decreasing in predicted reward
  - Obeys basic consistency: $r(z) = b(e_z)$
  - Treats no surprise consistently: $E(x, x) = E(y, y)$
- Special case: $\delta(z, p) = r(z) - b(p)$
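To make the additive special case concrete, the following Python sketch builds a dopamine release function from assumed rewards. Taking the predicted reward b(p) to be the expected reward of the lottery is an assumption of the example (the model itself only requires b(e_z) = r(z)), and the prize names and reward values are hypothetical.

```python
def additive_rpe(prize_reward, lottery_probs):
    """Build delta(z, p) = r(z) - b(p) for lotteries over two prizes.

    prize_reward: dict prize -> experienced reward r(z) (assumed values)
    lottery_probs: iterable of probabilities of the first prize
    b(p) is taken to be the expected reward of the lottery; this is an
    assumption of the sketch, not a requirement of the RPE model.
    """
    first, second = list(prize_reward)
    delta = {}
    for p in lottery_probs:
        predicted = p * prize_reward[first] + (1 - p) * prize_reward[second]
        # A prize is only observed from a lottery whose support includes it.
        if p > 0:
            delta[(first, p)] = prize_reward[first] - predicted
        if p < 1:
            delta[(second, p)] = prize_reward[second] - predicted
    return delta


# Hypothetical rewards for winning and losing; p is the probability of "win".
delta = additive_rpe({"win": 5.0, "lose": -5.0}, [0, 0.25, 0.5, 0.75, 1])
```

With these numbers, winning from a lottery that rarely pays gives a large positive value, while a prize received for sure gives zero.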

Necessary Condition 1: Consistent Prize Ordering

Necessary Condition 1: Consistent Prize Ordering
- Consider two prizes, $z$ and $w$
- Say that, when $z$ is received from some lottery $p$, more dopamine is released than when $w$ is received from $p$
- Implies higher reward for $z$ than $w$
- Implies that $z$ should give more dopamine than $w$ when received from any lottery $q$
- Axiom A1: Coherent Prize Dominance
  For all $(z, p), (w, p), (z, q), (w, q) \in M$: $\delta(z, p) > \delta(w, p) \Rightarrow \delta(z, q) > \delta(w, q)$
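A check of A1 on a finite table of estimates could look like the Python sketch below. It compares point estimates exactly, whereas a test on real data would use statistical comparisons. The dictionary layout and the numbers are hypothetical.

```python
from itertools import combinations


def satisfies_a1(delta):
    """Check Coherent Prize Dominance on a dict {(prize, lottery): activity}."""
    prizes = sorted({z for z, _ in delta})
    lotteries = sorted({p for _, p in delta})
    for z, w in combinations(prizes, 2):
        # Lotteries from which both prizes can be observed.
        shared = [p for p in lotteries if (z, p) in delta and (w, p) in delta]
        signs = set()
        for p in shared:
            diff = delta[(z, p)] - delta[(w, p)]
            signs.add((diff > 0) - (diff < 0))   # +1, 0 or -1
        # The axiom fails if the ordering of z and w changes across lotteries.
        if len(signs) > 1:
            return False
    return True


# Hypothetical estimates: "win" beats "lose" at both lotteries.
ok = {("win", 0.25): 0.9, ("lose", 0.25): -0.1,
      ("win", 0.75): 0.3, ("lose", 0.75): -0.6}
# Same table, but "lose" now beats "win" at the 0.75 lottery.
bad = dict(ok)
bad[("lose", 0.75)] = 0.5
```

Here `satisfies_a1(ok)` holds and `satisfies_a1(bad)` does not, because the ranking of the two prizes flips between lotteries.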

Necessary Condition 2: Coherent Lottery Dominance

Necessary Condition 2: Consistent Lottery Ordering
- Consider two lotteries $p$ and $q$ and a prize $z$ which is in the support of $p$ and $q$
- Say that more dopamine is released when $z$ is obtained from $p$ than when it is obtained from $q$
- Implies that predicted reward of $p$ must be lower than that of $q$
- Implies that whenever the same prize is obtained from $p$ and $q$, the dopamine released should be higher from lottery $p$ than from lottery $q$
- Axiom A2: Coherent Lottery Dominance
  For all $(z, p), (w, p), (z, q), (w, q) \in M$: $\delta(z, p) > \delta(z, q) \Rightarrow \delta(w, p) > \delta(w, q)$
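A2 can be checked in the same spirit, this time holding the prize fixed and comparing lotteries. The Python sketch below again uses exact comparisons of hypothetical point estimates. It is an illustration under those assumptions, not the statistical procedure used in the study.

```python
from itertools import combinations


def satisfies_a2(delta):
    """Check Coherent Lottery Dominance on a dict {(prize, lottery): activity}."""
    prizes = sorted({z for z, _ in delta})
    lotteries = sorted({p for _, p in delta})
    for p, q in combinations(lotteries, 2):
        # Prizes that can be obtained from both lotteries.
        shared = [z for z in prizes if (z, p) in delta and (z, q) in delta]
        signs = set()
        for z in shared:
            diff = delta[(z, p)] - delta[(z, q)]
            signs.add((diff > 0) - (diff < 0))   # +1, 0 or -1
        # The axiom fails if the ordering of p and q depends on the prize.
        if len(signs) > 1:
            return False
    return True


# Hypothetical estimates: the 0.25 lottery gives more activity for both prizes.
ok = {("win", 0.25): 0.9, ("lose", 0.25): -0.1,
      ("win", 0.75): 0.3, ("lose", 0.75): -0.6}
# Same table, but for "lose" the 0.75 lottery now gives more activity.
bad = dict(ok)
bad[("lose", 0.75)] = 0.5
```

In the `bad` table the lottery ranking implied by "win" contradicts the one implied by "lose", so no single predicted reward function could rationalize it.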

Necessary Condition 3: No Surprise Equivalence

Necessary Condition 3: Equivalence of Certainty
- Reward Prediction Error is a comparison between predicted reward and actual reward
- If you know exactly what you are going to get, then there is no reward prediction error
- This is true whatever the prize we are talking about
- Thus, the reward prediction error of any prize should be zero when received for sure:
- Axiom A3: No Surprise Equivalence
  $\delta(z, e_z) = \delta(w, e_w) \quad \forall z, w \in Z$
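A3 only involves the degenerate lotteries, so a check is short. In the Python sketch below, the `degenerate` mapping, the tolerance parameter, and the numbers are all assumptions introduced for illustration.

```python
def satisfies_a3(delta, degenerate, tol=0.0):
    """Check No Surprise Equivalence.

    delta: dict {(prize, lottery): activity}
    degenerate: dict prize -> the lottery that yields that prize for sure
    tol: allowed absolute difference (an assumed tolerance for noisy estimates)
    """
    sure = [delta[(z, e_z)] for z, e_z in degenerate.items()]
    return max(sure) - min(sure) <= tol


# Hypothetical estimates; lotteries are indexed by the probability of "win".
degenerate = {"win": 1.0, "lose": 0.0}
equal = {("win", 1.0): 0.02, ("lose", 0.0): 0.02}
unequal = {("win", 1.0): 0.30, ("lose", 0.0): 0.02}
```

The axiom as stated requires the sure-thing responses to be equal to one another. It does not by itself fix their common level.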

A Representation Theorem
- In general, these conditions are necessary, but not sufficient, for an RPE representation
- However, in the special case where we look only at lotteries with two prizes, they are
- Theorem 1: If $|Z| = 2$, a dopamine release function $\delta$ satisfies axioms A1-A3 if and only if it admits an RPE representation
- Thus, in order to test RPE in the case of two prizes, we need only test A1-A3
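To see why the axioms are necessary, it may help to verify them in the additive special case $\delta(z, p) = r(z) - b(p)$ with $r(z) = b(e_z)$. The following is a sketch for that special case only, not a proof of Theorem 1.

```latex
\begin{align*}
\text{A1:}\quad & \delta(z,p) > \delta(w,p)
  \iff r(z) - b(p) > r(w) - b(p)
  \iff r(z) > r(w) \\
 & \implies \delta(z,q) = r(z) - b(q) > r(w) - b(q) = \delta(w,q), \\
\text{A2:}\quad & \delta(z,p) > \delta(z,q)
  \iff r(z) - b(p) > r(z) - b(q)
  \iff b(p) < b(q) \\
 & \implies \delta(w,p) = r(w) - b(p) > r(w) - b(q) = \delta(w,q), \\
\text{A3:}\quad & \delta(z,e_z) = r(z) - b(e_z) = 0 = r(w) - b(e_w) = \delta(w,e_w).
\end{align*}
```

The same logic goes through for a general aggregator $E$ that is strictly increasing in its first argument and strictly decreasing in its second.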

Aim
- Generate observations of $\delta$ in order to test axioms
- Use a data set containing:
  - Two prizes: win \$5, lose \$5
  - Five lotteries: $p \in \{0, 0.25, 0.5, 0.75, 1\}$
- Do not observe dopamine directly
  - Use fMRI to observe activity in the Nucleus Accumbens
  - Brain area rich in dopaminergic neurons

Experimental Design

Experimental Details
- 14 subjects (2 dropped for excess movement)
- Practice Session (outside scanner) of 4 blocks of 16 trials
- 2 Scanner Sessions of 8 blocks of 16 trials
- For Scanner Sessions, subjects paid \$35 show-up fee + \$100 endowment + outcome of each trial
- In each trial, subject offered one option from Observation Set and one from a Decoy Set

Constructing Delta: Defining Regions of Interest
- Need to determine which area of the brain is the Nucleus Accumbens
- Two ways of doing so:
  - Anatomical ROIs: Defined by location
  - Functional ROIs: Defined by response to a particular stimulus
- We concentrate on anatomical ROI, but use functional ROIs to test results

Constructing Delta: Anatomical Regions of Interest [Neto et al. 2008]

Constructing Delta: Estimating delta
- We now need to estimate the function $\delta$ using the data
- Use a between-subject design
  - Treat all data as coming from a single subject
- Create a single time series for an ROI
  - Average across voxels
  - Convert to percentage change from session baseline
- Regress time series on dummies for the revelation of each prize/lottery pair
- $\delta(x, p)$ is the estimated coefficient on the dummy which takes the value 1 when prize $x$ is obtained from lottery $p$
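The estimation steps can be sketched in Python as follows. Several simplifications are assumed here and are not taken from the study: the session baseline is taken to be the session mean, each dummy marks single scans with no hemodynamic lag, and the dummies are mutually exclusive with no other regressors. Under that last assumption the OLS coefficient on each dummy equals the mean signal over the scans it marks, so the sketch computes those means directly instead of running a regression. All numbers are made up.

```python
from collections import defaultdict


def percent_change(series):
    """Convert a raw ROI time series to percent change from its session mean."""
    baseline = sum(series) / len(series)   # assumed definition of the baseline
    return [100.0 * (x - baseline) / baseline for x in series]


def estimate_delta(signal, events):
    """Estimate delta(x, p) from a percent-change signal.

    signal: list of floats, one per scan
    events: dict scan index -> (prize, lottery) revealed at that scan
    With mutually exclusive 0/1 dummies and no other regressors, the OLS
    coefficient on each dummy is the mean signal over the scans it marks.
    """
    grouped = defaultdict(list)
    for t, condition in events.items():
        grouped[condition].append(signal[t])
    return {cond: sum(vals) / len(vals) for cond, vals in grouped.items()}


# Hypothetical voxel-averaged ROI signal and outcome-revelation times.
raw = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0]
events = {1: ("win", 0.25), 3: ("win", 0.25),
          2: ("lose", 0.25), 4: ("lose", 0.25)}
delta_hat = estimate_delta(percent_change(raw), events)
```

A real analysis would model the delayed hemodynamic response and would typically include nuisance regressors, in which case the coefficients have to come from an actual regression.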

Results

Results
- Axioms hold
- Nucleus Accumbens activity in line with RPE model
- Experienced and predicted reward sensible

Time Paths

Early Period

Late Period

Two Different Signals?

Key Results
- fMRI activity in Nucleus Accumbens does satisfy the necessary conditions for an RPE encoder
- However, this aggregate result may be the amalgamation of two separate signals
  - Vary in temporal lag
  - Vary in magnitude

Where Now?
- Observing beliefs and rewards?
- Axioms + experimental results tell us we can assign numbers to events such that NAcc activity encodes RPE according to those numbers
- Can we use these numbers to make inferences about beliefs and rewards?
- Are they beliefs and rewards in the sense that people usually use the words?
- Can we find any external validity with respect to other observables?
  - Behavior?
  - Obviously rewarding events?
- Can we then generalize to other situations?

Economic Applications
- New way of observing beliefs
- Makes surprise directly observable
- Insights into mechanisms underlying learning
- Building blocks of utility

Conclusion
- We provide evidence that NAcc activity encodes RPE
- Can recover consistent dopaminergic beliefs and rewards
- Potential for important new insights into human behavior and state of mind

What about other brain areas?
- Anterior Insula seems to record magnitude of RPE