Instrumental Conditioning VI: There is more than one kind of learning


Instrumental Conditioning VI: There is more than one kind of learning
PSY/NEU338: Animal learning and decision making: Psychological, computational and neural perspectives

Outline: what goes into instrumental associations? goal-directed versus habitual behavior; neural dissociations between habitual and goal-directed behavior; how does all this fit in with reinforcement learning?

What is associated with what? Thorndike: S-R. Skinner: what is the S? Tolman: S, a cognitive map, and the reinforcer.

Tolman: "The stimuli are not connected by just simple one-to-one switches to the outgoing responses. Rather, the incoming impulses are usually worked over and elaborated in the central control room into a tentative, cognitive-like map of the environment. And it is this tentative map, indicating routes and paths and environmental relationships, which finally determines what responses, if any, the animal will finally release."

Maze task: train rats to find food in a maze; a second group is exposed to the maze but without food; then compare the groups in a subsequent test with food. What do you think will happen? What does this demonstrate?

Maze task: latent learning. [Figure: learning curves from Blodgett (1929), comparing a continuously rewarded group with groups for which reward was introduced only partway through training.]

Another example: shortcuts. [Figure: training maze, test configuration, and results; Tolman et al. (1946).]

Summary so far: even the humble rat can learn and internally represent spatial structure, and use it to plan flexibly (Tolman relates this to all of society). Note that spatial tasks are really complicated and hard to control. Next: the search for modern versions of these effects. Key question: is the S-R model ever relevant, and what is there beyond it? (Especially important given what we know about RL.)

The modern debate: S-R vs R-O. S-R theory is parsimonious: the same theory covers Pavlovian conditioning (CS associated with US) and instrumental conditioning (stimulus associated with response). But the critical contingency in instrumental conditioning is between the response and the outcome. The alternative is R-O theory (also called A-O); among its proponents: Rescorla, Dickinson. It is in the same spirit as Tolman: the animal knows a map of contingencies and its desires, and can put 2 + 2 together. How would you test this?

Outcome devaluation:
1. Training: the animal learns an instrumental response for a food outcome.
2. Devaluation: either pairing the outcome with illness, or a motivational shift (hungry to sated); controls remain non-devalued/unshifted.
3. Test, in extinction: will animals work for food they don't want?
Q1: why test without rewards? Q2: what do you think will happen? Q3: what would Tolman/Thorndike guess?

Devaluation: results. After moderate training, responding is outcome-sensitive ("goal-directed", like R-O): devalued animals press less than non-devalued ones. After extensive training, responding is outcome-insensitive ("habitual", like S-R): animals will sometimes work for food they don't want! In daily life, too, actions become automatic (habitual) with repetition. [Figure: presses per minute for non-devalued vs devalued groups after moderate and extensive training; Holland (2004).]

Outline: what goes into instrumental associations? goal-directed versus habitual behavior; neural dissociations between habitual and goal-directed behavior; how does all this fit in with reinforcement learning?

Devaluation: results from lesions I. Overtrained rats with lesions to the dorsolateral striatum (DLS) never develop habits, despite extensive training; the same holds for treatments depleting dopamine in the DLS, and for lesions to the infralimbic division of PFC (same corticostriatal loop). [Figure: DLS lesion vs sham-lesion control; Yin et al. (2004).]

Devaluation: results from lesions II. In overtrained rats, after habits have been formed, devaluation sensitivity can be reinstated by temporary inactivation of infralimbic (IL) PFC with muscimol. [Figure: IL PFC inactivation vs control; Coutureau & Killcross (2003).]

Devaluation: results from lesions III. Lesions of the posterior dorsomedial striatum (pDMS) cause animals to leverpress habitually even after only moderate training (Yin, Ostlund, Knowlton & Balleine, 2005).

Devaluation: results from lesions IV. Prelimbic (PL) PFC lesions cause animals to leverpress habitually even after only moderate training; the same holds for dorsomedial PFC and mediodorsal thalamus (same loop). [Figure: control vs devalued groups after moderate training; Killcross & Coutureau (2003).]

A complex picture of behavioral control. [Figure: corticostriatal circuit diagram, inspired by Balleine (2005), showing sensorimotor, infralimbic (IL), prelimbic (PL) and orbitofrontal (OFC) cortex; VA and MD thalamus; amygdala (CeN, BLA); hippocampus; DLS and DMS of dorsal striatum; core and shell of nucleus accumbens; SNc and VTA; and brainstem/hypothalamic targets (habenula, SC, PPTN, etc.), connected via GP. The circuit shows a neural dissociation between goal-directed and habitual controllers.]

What does all this mean? The same action (leverpressing) can arise from two psychologically and neurally dissociable pathways:
1. Moderately trained behavior is "goal-directed": dependent on an outcome representation, like a cognitive map (also associated with the hippocampus, a literal or abstract map of the environment).
2. Overtrained behavior is "habitual": apparently not dependent on an outcome representation, like S-R.
S-R habits really do exist; they just don't describe all of animal behavior. Lesions suggest two parallel systems, in that the intact one can apparently support behavior at any stage.

Devaluation: one more result. Behavior is not always consistent: leverpressing is habitual and continues for unwanted food, while at the same time nosepoking into the food magazine is reduced. Explanations? [Figure: lever presses and magazine behavior per minute after moderate training, extensive training with one outcome, and extensive training with two outcomes; Killcross & Coutureau (2003).]

Why are nosepokes always sensitive to devaluation? Balleine & Dickinson: a third system, Pavlovian behavior, is directly sensitive to outcome value. But this doesn't quite make sense: the Pavlovian system has information that it is withholding from the instrumental system? The effect also holds for a purely instrumental chain. And anyway, it seems that all the information is around all the time, so why is behavior not always goal-directed?

Outline: what goes into instrumental associations? goal-directed versus habitual behavior; neural dissociations between habitual and goal-directed behavior; how does all this fit in with reinforcement learning?

Back to the RL framework for decisions. The leverpressing task can be written as a small decision process with 3 states (no food, food in magazine, eating) and 2 actions (press lever, poke nose): in "no food", pressing the lever leads to "food in magazine"; in "food in magazine", poking the nose leads to "eating". Immediate reward is r = 1 in state "eating" and 0 otherwise. To choose the best action we need to know the long-term consequences of actions, Q(S, a). How can these be learned?
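The three-state leverpressing task described above can be written down directly. This is a minimal sketch, assuming the transitions stated on the slide; the state and action names, and the convention that an ineffective action leaves the state unchanged, are illustrative choices, not from the original.

```python
# Sketch of the leverpressing task as a tiny decision process.
# States, actions, and the r = 1 reward in "eating" follow the slide;
# the self-transitions for ineffective actions are an assumption.
STATES = ["no_food", "food_in_mag", "eating"]
ACTIONS = ["press_lever", "poke_nose"]

# transition[state][action] -> next state
TRANSITION = {
    "no_food":     {"press_lever": "food_in_mag", "poke_nose": "no_food"},
    "food_in_mag": {"press_lever": "food_in_mag", "poke_nose": "eating"},
    "eating":      {"press_lever": "eating",      "poke_nose": "eating"},
}

def reward(state):
    """Immediate reward: 1 in the 'eating' state, 0 otherwise."""
    return 1.0 if state == "eating" else 0.0
```

Both strategies discussed next answer the same question against this task description: what is Q(S, a), the long-term value of each action in each state?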

Strategy 1: model-based RL. Learn a model of the task through experience (i.e., a cognitive map), then compute Q values by looking ahead in the map. This is computationally costly, but also flexible: immediately sensitive to change. [Figure: a two-step decision tree with states S0, S1, S2 and actions L and R; S1's actions yield rewards 4 and 0, S2's actions yield 1 and 2. Looking ahead gives Q(S0, L) = 4 and Q(S0, R) = 2.]

Strategy 1: model-based RL, applied to the leverpressing task. [Figure: the no food / food in magazine / eating state diagram, with press-lever and poke-nose transitions and r = 1 in the eating state.]
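The "look ahead in the map" computation on the slide's two-step tree can be sketched as a recursive search. The tree structure and the rewards 4, 0, 1, 2 follow the slide; the code itself is an illustrative assumption.

```python
# Model-based sketch: compute Q values by searching forward through a
# learned model, rather than retrieving cached values.
# model: (state, action) -> either a next state or a terminal reward
MODEL = {
    ("S0", "L"): "S1", ("S0", "R"): "S2",
    ("S1", "L"): 4.0,  ("S1", "R"): 0.0,
    ("S2", "L"): 1.0,  ("S2", "R"): 2.0,
}

def q_lookahead(state, action):
    """Q(state, action) via forward search: a terminal outcome is its own
    value; otherwise a state is worth its best action's value."""
    outcome = MODEL[(state, action)]
    if isinstance(outcome, float):  # reached a terminal reward
        return outcome
    return max(q_lookahead(outcome, a) for a in ("L", "R"))

print(q_lookahead("S0", "L"))  # 4.0: best path runs through S1
print(q_lookahead("S0", "R"))  # 2.0: best path runs through S2
```

Note where the flexibility comes from: if one terminal reward in `MODEL` changes (as in devaluation), the very next lookahead reflects it, with no relearning of values.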

Strategy II: model-free RL. Shortcut: store the long-term values, then simply retrieve them to choose an action. Stored: Q(S0, L) = 4, Q(S0, R) = 2, Q(S1, L) = 4, Q(S1, R) = 0, Q(S2, L) = 1, Q(S2, R) = 2. These can be learned from experience, without building or searching a model, incrementally through prediction errors; this learning is dopamine-dependent (SARSA/Q-learning, or Actor/Critic).

Strategy II: model-free RL, continued. Choosing actions is easy, so behavior is quick and reflexive (S-R); but the system needs a lot of experience to learn, and it is inflexible: any change requires relearning (habitual).
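The incremental, prediction-error-driven learning described above can be sketched with a Q-learning update on the same two-step tree. The tree and rewards follow the slide; the learning rate, episode count, and purely random exploration are arbitrary assumptions made for illustration.

```python
import random

# Model-free sketch: learn cached Q values from raw experience via
# temporal-difference prediction errors, with no model of the task.
NEXT = {("S0", "L"): "S1", ("S0", "R"): "S2"}
REWARD = {("S1", "L"): 4.0, ("S1", "R"): 0.0,
          ("S2", "L"): 1.0, ("S2", "R"): 2.0}

Q = {(s, a): 0.0 for s in ("S0", "S1", "S2") for a in ("L", "R")}
alpha = 0.1  # learning rate (assumed)

random.seed(0)
for _ in range(2000):
    # first step: explore randomly from S0, observe the next state
    a0 = random.choice(("L", "R"))
    s1 = NEXT[("S0", a0)]
    # prediction error: best cached value of the next state minus the cache
    delta0 = max(Q[(s1, "L")], Q[(s1, "R")]) - Q[("S0", a0)]
    Q[("S0", a0)] += alpha * delta0
    # second step: observe the terminal reward, update on its error
    a1 = random.choice(("L", "R"))
    delta1 = REWARD[(s1, a1)] - Q[(s1, a1)]
    Q[(s1, a1)] += alpha * delta1

print(round(Q[("S0", "L")], 1), round(Q[("S0", "R")], 1))  # converges near 4.0 and 2.0
```

This makes the habit-like inflexibility concrete: if a terminal reward changes, the cached Q values at S0 stay wrong until many further prediction errors slowly propagate the change back, which is exactly the relearning the slide describes.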

Two big questions. Why should the brain use two different strategies/controllers in parallel? And if it uses two, how can it arbitrate between them when they disagree (a new decision-making problem)?

Answers. Each system is best in different situations, so use each one when it is most suitable/most accurate. The goal-directed system (forward search) is good with limited training and close to the reward (no need to search ahead too far); the habitual system (cached values) is good after much experience, and the distance from the reward matters less. Arbitration: trust the system that is more confident in its recommendation; there are different sources of uncertainty in the two systems. Compare this to simply always choosing the action with the highest value. [Figure: estimated action value under the model-free and model-based systems.]
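The arbitration rule above ("trust the system that is more confident") can be sketched as follows, in the spirit of uncertainty-based competition between the two controllers (Daw, Niv & Dayan, 2005). The function and all numbers below are hypothetical placeholders for illustration, not part of the slide.

```python
# Illustrative sketch: each controller reports a value estimate plus an
# uncertainty, and control goes to the less uncertain (more confident) one.
def arbitrate(mb_value, mb_uncertainty, mf_value, mf_uncertainty):
    """Return (winning system, its value estimate), preferring lower uncertainty."""
    if mb_uncertainty <= mf_uncertainty:
        return ("model-based", mb_value)
    return ("model-free", mf_value)

# early in training: the model-free cache is still noisy, so the
# model-based (goal-directed) system wins
print(arbitrate(3.8, 0.5, 1.2, 2.0))  # ('model-based', 3.8)

# after overtraining: cached values are precise, so the habit wins
print(arbitrate(3.8, 0.5, 4.0, 0.1))  # ('model-free', 4.0)
```

The contrast with "always choose the highest estimated value" is the point: a noisy but high model-free estimate should not outvote a precise model-based one, and vice versa after overtraining.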

Back to the animals pressing a lever for a devalued food, but not nose-poking to get it: can you explain this?

Summary. Instrumental behavior is not a simple unitary phenomenon: the same behavior can result from different neural and computational origins. Different neural mechanisms work in parallel to support behavior, with both cooperation and competition. Useful tests: outcome devaluation, contingency degradation.