Computational cognitive neuroscience: 8. Motor Control and Reinforcement Learning

Lubica Beňušková, Centre for Cognitive Science, FMFI, Comenius University in Bratislava

Sensory-motor loop
Cognition is built upon the so-called sensory-motor loop, i.e., processing sensory inputs to determine which motor action to perform next. This is the most basic function of any nervous system, from worms to humans: getting from sensation to action. The human brain has a number of such loops, from the most primitive reflexes in the peripheral nervous system up to the most abstract plans, such as the decision to apply to and attend university.

Sensory-motor loop: the block scheme
Sensory events are relayed through the spinal cord and brainstem, and then through the thalamus, to the cerebral cortex, where they are processed. Commands for action are then relayed from the cortex to the basal ganglia and cerebellum, and from there back to the brainstem and spinal cord, where they produce concrete muscle actions (movements).

Sensory-motor loop: the brain scheme
The same relay, shown on the brain: sensory events travel through the spinal cord and brainstem and through the thalamus to the cerebral cortex; commands for action travel from the cortex to the basal ganglia and cerebellum, and from there back to the brainstem and spinal cord.

Basal Ganglia and the Cerebellum
The most important motor output and control systems at the subcortical level are the cerebellum and the basal ganglia, each of which has specially adapted learning mechanisms. The basal ganglia are specialized for learning from reward and punishment signals, compared against expectations of reward and punishment. This learning then shapes the action selection that the organism makes under different circumstances (selecting the most rewarding actions and avoiding punishing ones). This form of learning is called reinforcement learning. The cerebellum is specialized for learning from error, specifically the error between the actual sensory outcomes of motor actions and the expected sensory outcomes of those actions. Thus, the cerebellum can refine the implementation of a given motor plan to make it more accurate, efficient, and well-coordinated.

Basal ganglia, cerebellum and the cortex
In short, the basal ganglia select one out of many possible actions to perform, and the cerebellum then makes sure the selected action is performed well. Thus, parietal representations (i.e., the "where" pathway) drive motor action execution as coordinated by the cerebellum, and the cerebellum is also densely interconnected with parietal cortex. In contrast, the basal ganglia are driven to a much greater extent by the ventral pathway "what" information, which indicates the kinds of rewarding objects that might be present in the environment. Interestingly, there are no direct connections between the basal ganglia and the cerebellum. Instead, each operates in interaction with various areas in the cortex, where the action plans are formulated and coordinated.

Basal ganglia, cerebellum and the cortex
Both the cerebellum and the basal ganglia have a complex disinhibitory output dynamic, which produces a gating-like effect on the brain areas they control. For example, the basal ganglia can disinhibit neurons in specific nuclei of the thalamus, which have bidirectional excitatory circuits through frontal and prefrontal cortical areas. The net effect of this disinhibition is to enable an action to proceed, without needing to specify any of the details of how to perform that action. This is what is meant by a gate: something that broadly modulates the flow of other forms of activation. The cerebellum similarly disinhibits parietal and frontal neurons to effect its form of precise control over the shape of motor actions. It also projects directly to motor outputs in the brainstem, something that is not true of most basal ganglia areas.

Parts of the basal ganglia (BG)
Structures labelled in the figure:
Striatum, which comprises the caudate nucleus and putamen
Globus pallidus: internal segment (GPi) and external segment (GPe)
Thalamus and subthalamic nucleus
Substantia nigra pars compacta (SNc)
Cerebellum

Striatum
The striatum, which consists of the caudate and putamen, is the major input region of the BG. It is anatomically subdivided into many small clusters of neurons, of two major types: patch (striosomes) and matrix (matrisomes). The matrix clusters contain direct (Go) and indirect (NoGo) pathway medium spiny neurons, which together make up 95% of striatal cells. Both types receive excitatory inputs from all over the cortex but are inhibitory on their downstream targets in the globus pallidus, as described next. The patch cells project to the dopaminergic system, and thus appear to play a more indirect role in modulating learning signals. There are also relatively few, widely spaced tonically active neurons (TANs), which release acetylcholine and appear to play a modulatory role, as well as inhibitory interneurons, which likely perform the same kind of dynamic gain control that they perform in the cortex.

Globus pallidus: internal segment (GPi)
The GPi contains neurons that are constantly (tonically) active even with no additional input. These neurons send inhibition to specific nuclei in the thalamus. When the direct (Go) pathway striatal neurons fire, they inhibit these GPi neurons and thus disinhibit the thalamus, resulting ultimately in the initiation of a specific motor or cognitive action (such as a thought). In other fronto-basal ganglia circuits, the role of the GPi is taken up by the substantia nigra pars reticulata (SNr), which receives input from other areas of the striatum and projects to outputs regulating other actions (e.g., eye movements, via the superior colliculus).

Globus pallidus: external segment (GPe)
The GPe contains tonically active neurons that send focused inhibitory projections to corresponding GPi neurons. When the indirect (NoGo) pathway neurons in the striatum fire, they inhibit the GPe neurons and thus disinhibit the GPi neurons, causing them to provide even greater inhibition onto the thalamus. This blocks the initiation of the specific actions coded by the population of active NoGo neurons.

Thalamus
The relevant thalamic nuclei are the medial dorsal (MD), ventral anterior (VA), and ventrolateral (VL) nuclei. When the thalamic neurons are disinhibited by Go pathway firing, they can fire, but only when driven by top-down excitatory input from the frontal cortex. In this way, the basal ganglia serve as a gate on the thalamocortical circuit: Go firing opens the gate, while NoGo firing closes it.

The subthalamic nucleus (STN)
The STN forms the so-called hyperdirect pathway, because it receives input directly from frontal cortex and sends excitatory projections directly to the BG output (GPi), bypassing the striatum altogether. A single STN neuron projects broadly to many GPi neurons, and as such the STN is thought to provide a global NoGo function that prevents gating of any motor or cognitive action (technically, it raises the threshold for gating).
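The gating logic described on the preceding slides (Go, NoGo, and hyperdirect pathways converging on the GPi and thalamus) can be sketched as a toy rate model. This is only an illustration: the function names, the weights (1.5, 0.5), the clipping to [0, 1], and the gating threshold are assumptions chosen to make the arithmetic transparent, not values from the lecture.

```python
def clip(value):
    """Keep a firing rate inside the illustrative range [0, 1]."""
    return max(0.0, min(1.0, value))


def thalamic_activity(go, nogo, stn=0.0, cortex_drive=1.0):
    """Toy rate model of basal ganglia gating (all weights are assumptions)."""
    # GPe is tonically active; NoGo striatal firing inhibits it.
    gpe = clip(1.0 - nogo)
    # GPi is tonically active; Go and GPe inhibit it, the STN excites it.
    gpi = clip(1.5 - go - 0.5 * gpe + stn)
    # The thalamus fires only when cortex drives it and GPi inhibition is lifted.
    return clip(cortex_drive * (1.0 - gpi))


def gate_open(go, nogo, stn=0.0, cortex_drive=1.0, threshold=0.5):
    """The gate counts as open when thalamic activity exceeds the threshold."""
    return thalamic_activity(go, nogo, stn, cortex_drive) > threshold


cases = [
    ("rest", dict(go=0.0, nogo=0.0)),
    ("Go only", dict(go=1.0, nogo=0.0)),
    ("Go and NoGo", dict(go=1.0, nogo=1.0)),
    ("Go with STN active", dict(go=1.0, nogo=0.0, stn=0.5)),
    ("Go without cortical drive", dict(go=1.0, nogo=0.0, cortex_drive=0.0)),
]
for label, kwargs in cases:
    print(label, "->", "open" if gate_open(**kwargs) else "closed")
```

With these illustrative weights, only Go firing combined with top-down cortical drive opens the gate; NoGo or STN activity raises GPi output enough to keep it closed.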

Substantia nigra pars compacta (SNc)
The SNc has neurons that release the neuromodulator dopamine and innervate the striatum. There are two different kinds of dopamine receptors in the striatum. D1 receptors are prevalent in Go pathway neurons, and dopamine has an excitatory effect on neurons with D1 receptors. In contrast, D2 receptors are prevalent in NoGo pathway neurons, and dopamine has an inhibitory effect via the D2 receptors.

The basal ganglia-cortex loop

DA burst activity drives the direct "Go" pathway neurons in the striatum, which then inhibit the tonic activation in the globus pallidus internal segment (GPi). This releases specific nuclei in the thalamus from inhibition, allowing them to complete a bidirectional excitatory circuit with the frontal cortex, resulting in the initiation of a motor action. The increased Go activity during DA bursts results in potentiation of corticostriatal synapses, and hence learning to select actions that tend to result in positive outcomes.

A DA dip (a decrease in tonic DA neuron firing in the SNc) leads to preferential activity of indirect "NoGo" pathway neurons in the striatum, which inhibit the globus pallidus external segment (GPe) neurons, which in turn normally inhibit the GPi. Increased NoGo activity thus results in disinhibition of the GPi, making it more active and thus inhibiting the thalamus, preventing initiation of the corresponding motor action. The DA dip results in potentiation of corticostriatal NoGo synapses, and hence learning to avoid selecting actions that tend to result in negative outcomes.
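The opposite effects of dopamine bursts and dips on the two pathways can be sketched as a pair of weight updates. This is a minimal sketch under stated assumptions: `dopamine_update`, the action names, the initial weights, and the learning rate are all hypothetical, and dopamine is expressed as a signed deviation from its tonic baseline.

```python
def dopamine_update(w_go, w_nogo, dopamine, lrate=0.1):
    """Update corticostriatal weights for one action.

    dopamine > 0 stands for a burst, dopamine < 0 for a dip
    (deviation from tonic baseline; the scale is an assumption).
    """
    if dopamine > 0.0:
        # Burst: potentiate the Go (D1) synapses.
        w_go += lrate * dopamine
    elif dopamine < 0.0:
        # Dip: potentiate the NoGo (D2) synapses.
        w_nogo += lrate * -dopamine
    return w_go, w_nogo


def net_preference(w_go, w_nogo):
    """Positive values favour gating the action, negative values oppose it."""
    return w_go - w_nogo


# Hypothetical actions with consistently good or bad outcomes.
weights = {"press_lever": (0.5, 0.5), "touch_flame": (0.5, 0.5)}
outcomes = {"press_lever": 1.0, "touch_flame": -1.0}

for _ in range(10):
    for action, dopamine in outcomes.items():
        weights[action] = dopamine_update(*weights[action], dopamine)

for action, (w_go, w_nogo) in weights.items():
    print(action, round(net_preference(w_go, w_nogo), 2))
```

After repeated outcomes, the rewarded action ends up with a stronger Go than NoGo weight, and the punished action with the reverse.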

Dopamine (DA) and Reinforcement Learning
The DA neurons in the SNc and VTA encode the difference between the reward received and the expectation of reward. Prior to conditioning, when a reward is delivered, the dopamine neurons fire a burst of activity. After the animal has learned to associate a conditioned stimulus (CS) (e.g., a tone) with the reward, the dopamine neurons fire at the onset of the CS, and not at the reward itself. If a reward is withheld after the CS, there is a dip or pause in DA firing, indicating that the reward was predicted and that its failure to arrive produced a negative prediction error. This pattern of firing is consistent with reinforcement learning applied to cortico-striatal synapses based on reward prediction error.

Reinforcement Learning
Computationally, the simplest model of reward prediction error is the Rescorla-Wagner conditioning model, which is mathematically identical to the delta rule for perceptron learning. The prediction error is simply the difference between the actual reward $r$ and the expected reward $R$:

$$\Delta w = \delta\, x, \qquad \delta = r - R = r - \sum x w$$

where $\Delta w$ is the change of the weight, $x$ is the input, and $w$ is the weight. Thus $\delta$ ("delta") is the prediction error and $R = \sum x w$ is the amount of expected reward, computed as a weighted sum of the stimuli $x$ with weights $w$.

Reinforcement Learning
The weights adapt to try to accurately predict the actual reward values, and in fact the delta value specifies the direction in which the weights should change:

$$\Delta w = \delta\, x$$

When the reward prediction is correct, i.e. $r = R$, the actual reward value is cancelled out by the prediction, so $\delta = 0$ and $\Delta w = 0$. If the actual reward $r$ is smaller than $R$, then $\delta < 0$ and the weights from cortex to striatum decrease in strength. If the actual reward $r$ is bigger than $R$, then $\delta > 0$ and the weights from cortex to striatum increase in strength.
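The Rescorla-Wagner update can be traced numerically with a short simulation. This is a sketch, not the lecture's implementation: the learning rate `lrate` is an assumption (the slide's rule has no explicit rate), and the function and variable names are illustrative.

```python
def rescorla_wagner(trials, weights, lrate=0.2):
    """Apply the delta rule over a sequence of (stimuli, reward) trials.

    Returns the final weights and the prediction error on every trial.
    """
    weights = list(weights)
    deltas = []
    for stimuli, reward in trials:
        # Expected reward R is the weighted sum of the stimuli.
        expected = sum(x * w for x, w in zip(stimuli, weights))
        # Prediction error: actual reward minus expected reward.
        delta = reward - expected
        # Each weight changes in proportion to delta times its input.
        weights = [w + lrate * delta * x for x, w in zip(stimuli, weights)]
        deltas.append(delta)
    return weights, deltas


# Acquisition: a single CS (x = 1) is always followed by a reward of 1.
learned, acquisition = rescorla_wagner([([1.0], 1.0)] * 50, [0.0])
print("first delta:", acquisition[0], "last delta:", acquisition[-1])

# Omission: the CS appears but the reward is withheld.
_, omission = rescorla_wagner([([1.0], 0.0)], learned)
print("delta on omitted reward:", omission[0])
```

The error starts large and positive, shrinks towards zero as the weight comes to predict the reward, and turns negative when a predicted reward is omitted.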

Temporal Difference (TD) Reinforcement Learning
Because the reward may occur later in time, TD adds one term to the Rescorla-Wagner delta equation, representing the future reward values that might come later in time:

$$\delta = (r + f) - R$$

where $f$ represents the future rewards, so that the reward expectation $R$ now has to anticipate both the current reward $r$ and this future reward $f$. In a simple conditioning task, where the CS reliably predicts a subsequent reward, the onset of the CS results in an increase in $f$, because once the CS arrives there is a high probability of reward in the near future. Furthermore, this $f$ itself is not predictable, because the onset of the CS is not predicted by any earlier cue (and if it were, that earlier cue would be the real CS and would drive the dopamine burst). Therefore, the expectation $R$ cannot cancel out the $f$ value, and a dopamine burst results.

Temporal Difference (TD) Reinforcement Learning
Although the $f$ value explains CS-onset dopamine firing, it raises the question of how the system can know what kind of rewards are coming in the future. As with anything to do with the future, you fundamentally have to use the past as your guide. TD does this by specifying a value function $v(t)$, which is a sum of all present and future rewards, with the future rewards discounted by a factor $\gamma$ ("gamma"). This captures the intuitive notion that rewards further in the future are worth less than those that will occur sooner. Thus, if $0 < \gamma < 1$, then

$$v(t) = r(t) + \gamma^{1} r(t+1) + \gamma^{2} r(t+2) + \gamma^{3} r(t+3) + \dots$$

Temporal Difference (TD) Reinforcement Learning
The recursive relationship reads

$$v(t) = r(t) + \gamma\, v(t+1)$$

All of these value terms are really estimates, thus

$$V(t) = r(t) + \gamma\, V(t+1)$$

Next we subtract the estimate $V(t)$ from both sides, which expresses the same equality as a difference that should be equal to zero:

$$r(t) + \gamma\, V(t+1) - V(t) = 0$$

However, in reality this difference will not be zero, and the extent to which it is not zero is the extent to which there is a reward prediction error:

$$\delta = r(t) + \gamma\, V(t+1) - V(t)$$
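The equivalence between the discounted sum and the recursive form, and the fact that the TD error vanishes when the value estimates are exact, can be checked with a few lines of code. The reward sequence and the value of `gamma` below are illustrative assumptions.

```python
def discounted_value(rewards, t, gamma):
    """v(t) as the explicit discounted sum of present and future rewards."""
    return sum(gamma ** k * r for k, r in enumerate(rewards[t:]))


def recursive_values(rewards, gamma):
    """v(t) = r(t) + gamma * v(t+1), computed backwards from the end.

    The returned list has one extra entry, v(T) = 0, after the last step.
    """
    values = [0.0] * (len(rewards) + 1)
    for t in reversed(range(len(rewards))):
        values[t] = rewards[t] + gamma * values[t + 1]
    return values


def td_errors(rewards, values, gamma):
    """delta(t) = r(t) + gamma * V(t+1) - V(t) for every time step."""
    return [
        rewards[t] + gamma * values[t + 1] - values[t]
        for t in range(len(rewards))
    ]


rewards = [0.0, 0.0, 1.0]  # a single reward two steps in the future
gamma = 0.9
exact = recursive_values(rewards, gamma)
print("values:", exact[:-1])
print("TD errors with exact values:", td_errors(rewards, exact, gamma))
print("TD errors with V = 0:", td_errors(rewards, [0.0] * 4, gamma))
```

With exact values every TD error is zero; with an uninformed estimate (all zeros) the error appears at the time of the reward.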

TD Reinforcement Learning
Computationally, with the actual reward $r$ and the expected reward $R$:

$$\Delta w = \delta\, x, \qquad \delta = (r + f) - R, \qquad \delta = r(t) + \gamma\, V(t+1) - V(t)$$

Thus $f = \gamma\, V(t+1)$, and our reward expectation is now a "value expectation" instead (replacing $R$ with $V$). As with the Rescorla-Wagner rule, the TD delta value here drives learning of the value expectations of the cortico-striatal synapses. This rule has been very successful in modeling brain function.
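The shift of the prediction error from the reward to the CS onset can be reproduced with a small tabular TD simulation. Several modelling choices here are assumptions made for illustration: each time step after CS onset has its own value estimate (equivalent to one-hot inputs $x$ in $\Delta w = \delta\, x$), the state before the CS has a fixed value of zero because CS onset is taken to be unpredictable, and `gamma`, `lrate`, and the trial length are arbitrary.

```python
def run_trial(values, reward, gamma=0.9, lrate=0.3):
    """Run one conditioning trial and update `values` in place.

    values[t] is V at step t after CS onset; the reward arrives on the
    last step. Returns [delta at CS onset, delta at step 0, ..., delta
    at the reward step].
    """
    n_steps = len(values)
    # Transition from the pre-CS state (V fixed at 0, no reward) into the CS.
    deltas = [0.0 + gamma * values[0] - 0.0]
    for t in range(n_steps):
        r = reward if t == n_steps - 1 else 0.0
        next_value = values[t + 1] if t + 1 < n_steps else 0.0
        delta = r + gamma * next_value - values[t]
        values[t] += lrate * delta
        deltas.append(delta)
    return deltas


values = [0.0] * 5
first = run_trial(values, reward=1.0)
for _ in range(500):
    last = run_trial(values, reward=1.0)
omitted = run_trial(values, reward=0.0)

print("before learning: CS", first[0], "reward", first[-1])
print("after learning:  CS", round(last[0], 3), "reward", round(last[-1], 3))
print("reward omitted:  reward step", round(omitted[-1], 3))
```

Before learning the error occurs at the reward; after learning it occurs at CS onset and is near zero at the reward; omitting the reward then produces a negative error at the expected reward time, matching the burst and dip pattern described for dopamine neurons.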

The Actor-Critic Idea of Motor Learning
The basal ganglia are the actor in this case, and the dopamine signal is the output of the critic, which then serves as a training signal for the actor (and for the critic too, as we saw earlier). The reward prediction error signal produced by the dopamine system is a good training signal because it drives stronger learning early in skill acquisition, when rewards are less predictable, and reduces learning as the skill is perfected and rewards become more predictable.
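The actor-critic arrangement can be sketched on a two-action choice task. This is a simplified illustration, not the lecture's model: the task, the softmax choice rule, the single scalar critic, and all parameter values are assumptions, and the names are hypothetical.

```python
import math
import random


def train_actor_critic(rewards=(1.0, 0.0), n_trials=200, lrate=0.1, seed=0):
    """Two-action actor-critic; `rewards` gives the payoff of each action."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]  # actor: preference for each action
    value = 0.0         # critic: expected reward
    deltas = []
    for _ in range(n_trials):
        # Actor chooses by a softmax over its two preferences.
        p_first = 1.0 / (1.0 + math.exp(prefs[1] - prefs[0]))
        action = 0 if rng.random() < p_first else 1
        # Critic computes the prediction error (the dopamine-like signal).
        delta = rewards[action] - value
        # The same error trains both the critic and the actor.
        value += lrate * delta
        prefs[action] += lrate * delta
        deltas.append(delta)
    return prefs, value, deltas


prefs, value, deltas = train_actor_critic()
print("actor preferences:", [round(p, 2) for p in prefs])
print("critic value:", round(value, 2))
```

The actor comes to prefer the rewarded action, and the critic's expectation moves towards the reward that the actor's current policy obtains, so the training signal for the preferred action shrinks as it becomes predictable.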

The Cerebellum: inputs
Once the basal ganglia select an action to perform based on TD reinforcement learning and the action has been initiated, the cerebellum takes over and uses error-driven learning to shape the performance of the action so that it is accurate and well-coordinated. The cerebellum only receives inputs from cortical areas directly involved in motor production, including the parietal cortex and the motor areas of frontal cortex. Unlike the basal ganglia, it does not receive inputs from prefrontal cortex or temporal cortex.

The Cerebellum: anatomy
The cerebellum has a very characteristic anatomy, with the same basic circuit replicated millions of times. There are several neuron types, among which the Purkinje cells, with their extremely large dendritic trees, are the most prominent. These cells were discovered by the Czech scientist Jan Evangelista Purkyně in the 19th century.

The Cerebellum: circuitry
There are approximately 15 million Purkinje cells (PCs), and these cells produce the output of the cerebellum. There are approximately 40 billion granule cells (GCs). Each PC receives as many as 200,000 inputs from GCs but only one climbing fiber (CF).
Figure legend: MF = mossy fiber input axons. GC = granule cell. GcC = Golgi cell. PF = parallel fiber. PC = Purkinje cell. BC = basket cell. SC = stellate cell. CF = climbing fiber. DCN = deep cerebellar nuclei. IO = inferior olive.

The Cerebellum: learning
It is thought that climbing fiber inputs relay a training or error signal to the Purkinje cells, which then drives synaptic plasticity in the associated granule cell inputs. One prominent idea is that this plasticity tends to produce LTD (a weight decrease) at synapses whose granule cells are active, which makes those granule cells less likely to fire the Purkinje cell in the future. This makes sense given that the Purkinje cells are inhibitory on the deep cerebellar nuclei neurons, so to produce an output from those neurons the Purkinje cell needs to be turned off. David Marr and James Albus became famous for developing this theory of the cerebellum.

The Cerebellum: learning
The goal of this machinery is to associate stimulus inputs with motor output commands, under the command of the climbing fiber inputs. One important principle of cerebellar function is the projection of inputs into a very high-dimensional space over the granule cells. Computationally, this achieves pattern separation, where each combination of inputs activates a unique pattern of granule cells. This unique pattern can then be associated with a different output signal from the cerebellum, producing something approximating a lookup table of input/output values (for each input pattern x there is only one output f(x)). A lookup table provides a very robust solution for learning very complex, arbitrary functions: it will always be able to encode any kind of function. The drawback is that it does not generalize to novel input patterns very well.
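The lookup-table idea, and its weakness on novel inputs, can be illustrated with a deliberately idealized sketch. The assumptions are substantial and should be read as such: the granule layer is reduced to a one-hot code with one unit per input combination, the climbing-fiber signal is modelled as a signed error in a delta rule (rather than LTD alone), and XOR stands in for an arbitrary input/output function. All names are hypothetical.

```python
from itertools import product


def granule_code(inputs):
    """Expand binary inputs into a one-hot code, one unit per combination."""
    index = int("".join(str(bit) for bit in inputs), 2)
    code = [0.0] * (2 ** len(inputs))
    code[index] = 1.0
    return code


def output(weights, inputs):
    """Output as the weighted sum over the active granule units."""
    return sum(w * g for w, g in zip(weights, granule_code(inputs)))


def train(examples, n_inputs, lrate=1.0):
    """Delta-rule learning driven by an error (climbing-fiber-like) signal."""
    weights = [0.0] * (2 ** n_inputs)
    for inputs, target in examples:
        code = granule_code(inputs)
        error = target - sum(w * g for w, g in zip(weights, code))
        weights = [w + lrate * error * g for w, g in zip(weights, code)]
    return weights


# XOR is an arbitrary function that a single layer on raw inputs cannot learn.
xor_table = {bits: float(bits[0] != bits[1]) for bits in product((0, 1), repeat=2)}

full = train(xor_table.items(), n_inputs=2)
partial = train([(b, t) for b, t in xor_table.items() if b != (1, 0)], n_inputs=2)

for bits, target in xor_table.items():
    print(bits, "target", target, "full", output(full, bits),
          "partial", output(partial, bits))
```

Trained on every pattern, the table reproduces the function exactly; with the pattern (1, 0) held out, its output stays at the untrained default, because no other pattern shares its granule unit.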