Learning of sequential movements by neural network model with dopamine-like reinforcement signal


Exp Brain Res (1998) 121:350–354 · Springer-Verlag 1998

RESEARCH NOTE

Roland E. Suri · Wolfram Schultz

Learning of sequential movements by neural network model with dopamine-like reinforcement signal

Received: 24 November 1997 / Accepted: 30 April 1998

Abstract Dopamine neurons appear to code an error in the prediction of reward. They are activated by unpredicted rewards, are not influenced by predicted rewards, and are depressed when a predicted reward is omitted. After conditioning, they respond to reward-predicting stimuli in a similar manner. With these characteristics, the dopamine response strongly resembles the predictive reinforcement teaching signal of neural network models implementing the temporal difference learning algorithm. This study explored a neural network model that used a reward-prediction error signal strongly resembling dopamine responses for learning movement sequences. A different stimulus was presented in each step of the sequence and required a different movement reaction, and reward occurred at the end of the correctly performed sequence. The dopamine-like predictive reinforcement signal efficiently allowed the model to learn long sequences. By contrast, learning with an unconditional reinforcement signal required synaptic eligibility traces of longer and biologically less-plausible durations for obtaining satisfactory performance. Thus, dopamine-like neuronal signals constitute excellent teaching signals for learning sequential behavior.

Key words Basal ganglia · Teaching signal · Temporal difference · Synaptic plasticity · Eligibility

Matlab programs of this model are available at ftp://ftp.usc.edu/pub/bsl/suri/suri_schultz

R.E. Suri · W. Schultz
Institute of Physiology, University of Fribourg, CH-1700 Fribourg, Switzerland
e-mail: Wolfram.Schultz@unifr.ch

R.E. Suri
USC Brain Project, University of Southern California, Hedco Neurosciences Building, 3614 Watt Way, Los Angeles, CA, USA

Introduction

A large body of evidence suggests that learning is crucially dependent on the degree of unpredictability of reinforcers (Rescorla and Wagner 1972; Dickinson 1980). Only reinforcers occurring at least to some degree unpredictably will sustain learning, and learning curves reach asymptotes when all reinforcers are fully predicted. The discrepancy between the occurrence of reinforcement and its prediction is referred to as an error in the prediction of reinforcement. Error-driven learning is employed in a large variety of neural models and has been particularly elaborated in the temporal-difference (TD) algorithm, which computes the prediction error continuously in real time and establishes reinforcer predictions (Sutton and Barto 1990). The TD algorithm can be implemented in an explicit critic-actor architecture (Barto et al. 1983; Sutton and Barto 1998). The prediction error is computed and emitted as a teaching signal by the critic in order to adapt synaptic weights in the actor, which directs behavioral output (Montague et al. 1993; Friston et al. 1994). This architecture resembles the anatomical structure of the basal ganglia: the critic taking the place of the nigrostriatal dopamine neurons, and the actor corresponding to the striatum (Barto 1995; Houk et al. 1995). The prediction-error signal of the critic is strikingly similar to the activities of midbrain dopamine neurons (Montague et al. 1996; Schultz et al. 1997). Both signals are increased by unpredicted rewards, are not influenced by fully predicted rewards, and are decreased by omission of predicted rewards.
They are transferred to the earliest reward-predicting events through experience and, thus, predict rewards before they occur rather than reporting them only after the behavior. Neurophysiological and inactivation studies suggest an important involvement of the basal ganglia in movement sequences (Kermadi and Joseph 1995; Mushiake and Strick 1995; Miyachi et al. 1997). Disorders of dopamine neurotransmission in the striatum impair serially ordered movements in human patients (Benecke et al. 1987; Phillips et al. 1993). As reinforcement-learning tasks may constitute sequential Markov decision processes (Sutton and Barto 1998), we aimed at investigating whether TD prediction-error signals with characteristics similar to dopamine responses could be used for learning sequential movements.

Methods and algorithms

The task consisted of a defined sequence of seven specific stimulus-action pairs. Presentation of one stimulus (A, B, C, D, E, F, or G) elicited one of seven actions (Q, R, S, T, U, V, or W). The stimulus-action pairs followed each other at intervals of 300 ms. Reward was delivered at the end of the sequence when all individual actions had been correctly chosen (A→Q, B→R, C→S, D→T, E→U, F→V, G→W, reward). The sequence was learned backwards, by associating each stimulus with a particular action by trial and error, in a total of seven blocks of 100 trials. In the first block, stimulus G required action W in order to lead to reward (G→W→reward). Sequence length increased by one stimulus-action pair in each training block of 100 trials. Correct action in each step resulted in the appearance of the previously learned stimulus of the subsequent step, which predicted the terminal reward and thus constituted a conditioned reinforcer (e.g., F→V, G→W, reward). Any incorrect action terminated the trial. Learning of the sequence was simulated 10 times, and average learning curves were computed.

Modelling the dopamine prediction-error signal

In the critic component of the model (Fig. 1, right), a stimulus l was represented as a series of signals x_lm(t) of varying durations (Fig. 2A, lines 3–5) in order to reproduce the timing mechanisms involved in the depression of dopamine activity at the time of omitted reward. A number of such signals sufficient for covering the duration of the interstimulus intervals was chosen (m = 1, 2, 3 for the 300-ms interstimulus interval; m = 1, 2, …, 10 for a 1-s interstimulus interval). The reward prediction P_l(t) was computed as the weighted sum over the stimulus representations x_lm(t):

    P_l(t) = \sum_m w_{lm} x_{lm}(t)

The adaptive weights w_lm were initialized with value zero. The reward prediction P(t) was the sum over the reward predictions computed from the stimuli l:

    P(t) = \sum_l P_l(t)

The desired prediction signal increased successively from one time step to the next by a factor 1/γ until the reward λ(t) occurred, and decreased to baseline value zero after its presentation (Fig. 2A, line 6). Therefore, the prediction-error signal was

    r(t) = \lambda(t) + \gamma P(t) - P(t-1)

One time step corresponded to 100 ms. The discount factor γ, reflecting the decreased impact of more distant reinforcement, was estimated for dopamine neurons as γ = 0.98 (see Fig. 2B). The learning rule for the weights w_lm was

    w_{lm}(t) = w_{lm}(t-1) + \eta_c \, r(t) \, x_{lm}(t-1)

with learning rate η_c = 0.1.

Modelling sequential movements

Fig. 1 Model architecture consisting of an actor (left) and a critic (right). The prediction-error signal r(t) serves to modify the synaptic weights w_lm of the critic itself and the synaptic weights v_nl of the actor (heavy dots). Critic: every stimulus l is represented over time as a series of signals x_lm(t) of different durations. Each signal x_lm(t) is multiplied with the corresponding adaptive weight w_lm in order to compute the prediction P_l(t). The temporal differences in these predictions are computed, summed over all stimuli, and added to the reward signal λ(t) in order to compute the prediction-error signal r(t). Actor: the actor learns stimulus-action pairs under the influence of the prediction-error signal of the critic. Every actor neuron n (large circles) represents a specific action. A winner-take-all rule prevents the actor from performing two actions at the same time and can be implemented as lateral inhibition between neurons (Grossberg and Levine 1987).
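Before turning to the actor, the critic just specified can be summarized in a few lines of code. The following is a minimal Python sketch under the paper's assumptions (one array index per 100-ms time step); the function and variable names are ours, not the authors', whose original Matlab programs are at the FTP address given above.

```python
import numpy as np

GAMMA = 0.98   # discount factor estimated from dopamine responses
ETA_C = 0.1    # critic learning rate

def stimulus_representation(onset, n_steps, n_components):
    """Represent one stimulus as a series of sustained signals x_m(t)
    of increasing duration (components m = 1..n_components)."""
    x = np.zeros((n_components, n_steps))
    for m in range(n_components):
        x[m, onset:onset + m + 1] = 1.0  # component m+1 lasts (m+1)*100 ms
    return x

def critic_trial(x, reward, w):
    """One trial: compute the prediction P(t) and the prediction error
    r(t) = lambda(t) + gamma*P(t) - P(t-1), updating w in place."""
    n_steps = x.shape[1]
    P = np.zeros(n_steps)
    r = np.zeros(n_steps)
    for t in range(1, n_steps):
        P[t] = w @ x[:, t]
        r[t] = reward[t] + GAMMA * P[t] - P[t - 1]
        w += ETA_C * r[t] * x[:, t - 1]  # w(t) = w(t-1) + eta_c r(t) x(t-1)
    return P, r
```

Repeated calls with a stimulus at step 0 and reward three steps (300 ms) later should reproduce the transfer of the error response from the reward to the stimulus shown in Fig. 2B.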
The actor component of the model (Fig. 1, left) used the reward-prediction error as teaching signal for adapting the weights v_nl (initialized with zero values; n = 1, …, 7; l = 1, …, 7). Activations a_n(t) were computed from the stimuli e_l(t) with

    a_n(t) = \sum_l v_{nl} e_l(t) + s_n(t)

where s_n(t) was a small random perturbation (normally distributed with mean 0.0 and variance 0.1). Actions were only allowed in response to stimuli. In addition, the model performed only one action at a time, which was implemented with a winner-take-all rule between actor neurons: the neuron n with the largest activation elicited action n. Synapses activated when stimulus l elicited action n were considered eligible for modification through conjoint pre- and postsynaptic activation. In order to extend the eligibility for synaptic modification beyond the immediate activation, we used an eligibility trace e_nl(t), which was initially set to one and decreased during subsequent time steps of 100 ms by a factor δ:

    e_{nl}(t) = (1 - \delta) \, e_{nl}(t-1)

The factor δ determined the rate of trace decay and was set to 0.4, resulting in a relatively short eligibility trace (8% remaining after 500 ms, <1% after 1000 ms). For one particular simulation, we used a smaller factor (δ = 0.2), resulting in a much longer eligibility trace (33% remaining after 500 ms, 11% after 1000 ms). The weights v_nl were adapted with learning rate η_a = 1 according to

    v_{nl}(t) = v_{nl}(t-1) + \eta_a \, r(t) \, e_{nl}(t)
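A corresponding sketch of the actor, under the same assumptions and again with our own names: weights v and eligibility traces e are arrays of shape (n_actions, n_stimuli).

```python
import numpy as np

rng = np.random.default_rng(0)
ETA_A = 1.0   # actor learning rate
DELTA = 0.4   # trace decay per 100-ms step (0.2 in the long-trace variant)

def select_action(v, stimulus):
    """Winner-take-all selection: activations are the weighted stimulus
    input plus Gaussian noise (mean 0.0, variance 0.1); the neuron with
    the largest activation elicits its action."""
    a = v @ stimulus + rng.normal(0.0, np.sqrt(0.1), size=v.shape[0])
    return int(np.argmax(a))

def actor_step(v, e, r_t, action=None, stim=None):
    """Mark the just-used synapse as eligible, apply the three-factor
    rule v += eta_a * r(t) * e(t), then let all traces decay."""
    if action is not None:
        e[action, stim] = 1.0          # conjoint pre/post activation
    v += ETA_A * r_t * e
    e *= 1.0 - DELTA                   # e(t) = (1 - delta) * e(t-1)

# Trace-decay check: 0.6**5 = 0.078 (~8% left after 500 ms) and
# 0.6**10 < 0.01 (<1% after 1 s); with delta = 0.2, 0.8**5 = 0.33 and
# 0.8**10 = 0.11, matching the durations quoted above.
```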

Results

The prediction-error signal of the critic component was phasically increased by unpredicted reward (Fig. 2B). The increase was successively transferred to the earliest reward-predicting stimulus at each length of sequence. The signal was decreased when a predicted reward was omitted. The reinforcement signal of the critic closely resembled the dopamine response in all these characteristics.

Fig. 2A–D Internal states and signals of the model. A Signals in the critic during the learning of stimulus-reward associations. Binary signals code for the stimulus (line 1) and the reward (line 2). The stimulus is internally represented as a series of sustained signals with different durations (lines 3–5). The prediction signal (line 6) and the prediction-error signal (bottom) are shown before (solid lines) and after learning (dashed lines). (All signals have baseline zero.) B Prediction-error signal of the critic (left) compared with activity of dopamine neurons (right). Occurrence of unpredicted reward results in increased prediction-error signal and dopamine activity after the reward (line 1). After repeated pairings between stimulus G and reward, both prediction-error signal and dopamine activity are already increased after stimulus G, but are at baseline at the time of reward (line 2). After training with an additional stimulus (F) preceding stimulus G, both prediction-error signal and dopamine activity are increased after stimulus F and unaffected by stimulus G and reward (line 3). The response magnitude of dopamine neurons decreases for increased stimulus-reward intervals (lines 1–3); a decrease of 2% per 100 ms of added stimulus-reward interval is estimated (γ = 0.98). When the reward predicted by conditioned stimulus G is omitted, both prediction-error signal and dopamine activity are decreased below baseline at the predicted time of reward (line 4) (data from Schultz et al. 1993; Mirenowicz and Schultz 1994). C Learning actions with prediction-error signals. Stimulus F elicits the correct action V, which leaves an eligibility trace (line 1). The prediction-error signal (line 2) is increased by predictive stimulus G. The weight associating stimulus F with action V (line 3) is adapted according to the product (eligibility trace × prediction-error signal). D Reduced weight changes induced by an unconditional reinforcement signal unrelated to reward prediction. In the same situation as in C, modification of the weight F→V using an unconditional reinforcement signal (line 1) results in a later and smaller increase (bottom).

Fig. 3A–C Learning curves for a sequence of seven stimulus-action pairs. An additional stimulus-action pair is added to the sequence after every block of 100 trials. A Use of a prediction-error signal results in stable learning with minimal numbers of incorrect trials at all sequence lengths tested. Thus, the pair G→W is learned during the first block of 100 trials (left), the sequence F→V, G→W is learned during the second block, the sequence E→U, F→V, G→W is learned during the third block, etc., until the whole sequence of seven steps is learned during the last block (right). B When trained with an unconditional reinforcement signal, only a sequence length of three stimulus-action associations is learned. C Prolongation of the synaptic eligibility trace ameliorates sequence learning with an unconditional reinforcement signal. However, at the present state of knowledge, longer eligibility traces are biologically less plausible. Percentages are means of 10 repetitions.
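As an aside, the 2%-per-100-ms estimate in the Fig. 2B caption is the discount factor at work: a stimulus k time steps (k × 100 ms) before reward predicts γ^k of the reward value, so each added 100 ms scales the response by γ = 0.98. A quick numerical check (our code, not part of the model):

```python
GAMMA = 0.98
for extra_ms in (0, 100, 500, 1000):
    k = extra_ms // 100   # number of 100-ms time steps to reward
    print(f"{extra_ms:4d} ms earlier -> {GAMMA ** k:.3f} of the response")
# prints 1.000, 0.980, 0.904 and 0.817 respectively
```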
Before learning, the actor component of the model randomly reacted upon presentation of stimulus G with any action from Q–W, including the correct action W. The correct pairing G→W was followed by reward (G→W→reward) and was learned in <100 trials in the first training block (Fig. 3A, left block). Then stimulus F was presented at the beginning of the second training block of 100 trials, and correct action V resulted in the appearance of stimulus G in the same trial. Thus, the model learned the sequence F→V, G→W, reward (Fig. 3A, 2nd block from left). The full sequence of seven steps was learned in subsequent learning blocks without decrement (A→Q, …, G→W, reward).

In order to further assess the efficacy of a dopamine-like prediction-error signal for learning, we used an unconditional, non-adaptive reinforcement signal. This signal increased after every predicted or unpredicted reward, did not code a reward-prediction error, and did not increase with reward-predicting stimuli. This is analogous to reward responses in fully established tasks frequently found in neurophysiological experiments (Niki and Watanabe 1979; Nishijo et al. 1988; Watanabe 1989; Hikosaka et al. 1989; Apicella et al. 1991). Increases of synaptic weights in the actor following correct stimulus-action pairs were considerably lower than with a prediction-error signal (compare weights F→V in Fig. 2D vs. C). This resulted in increasingly impaired learning with longer sequences (Fig. 3B). Already the second learning block was impaired (F→V), the third block showed further impairments (E→U), and sequences of more than three pairs were not learned within blocks of 100 trials. In order to maintain some degree of learning, the synaptic eligibility trace was prolonged by reducing its decay rate to 20% per 100 ms (δ = 0.2) instead of the usual 40%. Although this permitted learning of intermediate sequences, it still resulted in impairments with longer sequences (Fig. 3C).
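A hypothetical driver for this comparison might look as follows; it ties together the critic and actor sketches given in the Methods section. `run_trial` is an assumed helper that plays one trial from stimulus index `first_step` onward and returns True if every action in the sequence was correct; only the backward-chaining block schedule and the unconditional control signal are taken from the paper.

```python
def unconditional_signal(reward):
    """Control condition: a non-adaptive signal that follows every
    predicted or unpredicted reward and ignores predictive stimuli,
    i.e. r(t) = lambda(t), with no gamma*P(t) - P(t-1) terms."""
    return reward.copy()

def train_backward_chained(run_trial, n_pairs=7, trials_per_block=100):
    """Backward-chaining schedule: block 0 trains G->W alone, and each
    later block prepends one stimulus-action pair to the sequence."""
    curves = []
    for block in range(n_pairs):
        first_step = n_pairs - 1 - block
        correct = sum(run_trial(first_step) for _ in range(trials_per_block))
        curves.append(correct / trials_per_block)
    return curves
```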

Discussion

This study shows that a reinforcement signal with the essential characteristics of dopamine responses was very efficient for learning movement sequences of considerable length. The coding of prediction error restricted signal increases to unpredictable reinforcement during the learning phase. This prevented runaway synaptic strength (Montague and Sejnowski 1994) without requiring additional algorithms, which were used in a different basal-ganglia model of sequential movements (Dominey et al. 1995). In addition, behavioral errors induced a decreased signal and, thus, reduced the strength of synapses involved in erroneous reactions. Whereas the present model concerned sequences of individual stimulus-action pairs, comparable results were obtained with TD models learning ocular foveation through serial, small-step eye movements (Montague et al. 1993; Friston et al. 1994).

The transfer of the signal back to the earliest reward-predicting stimulus helped to bridge the long gap between the stimulus-action pairs and the terminal reward. The predictive nature of the reinforcement signal allowed the model to strengthen synaptic weights with relatively short eligibility traces in the actor. The traces only covered the 300-ms interval between stimulus-action pairs and subsequent predictive reinforcement, but decayed entirely during a trial of 2.4 s. Such short traces are biologically much more plausible than longer ones. Their physiological substrates may consist in sustained neuronal activity frequently found in the striatum (Schultz et al. 1995) or, possibly, prolonged changes in calcium concentration (Wickens and Kötter 1995) or formation of calmodulin-dependent protein kinase II (Houk et al. 1995). In addition, the model assumed dopamine-dependent long-term changes in synaptic transmission, depending on presynaptic activity, postsynaptic activity, and the reinforcement signal. This form of plasticity was reported in striatal slice preparations (Calabresi et al. 1992, 1997; Wickens et al. 1996) and could provide a biological basis for such a three-factor learning rule.

The unconditional reinforcement signal presented severe disadvantages for learning long sequences. Occurring at the time of terminal reward, this signal was unable to strengthen synapses that were used earlier during the sequence. A similar result was obtained with ocular foveation movements (Friston et al. 1994). In the present model, this deficit was to some extent compensated by increasing the duration of synaptic eligibility traces. However, longer eligibility traces become increasingly hypothetical and may not be a good basis for biologically plausible neural models.

Acknowledgements This study was supported by the James S. McDonnell Foundation (grant 94-39).

References

Apicella P, Ljungberg T, Scarnati E, Schultz W (1991) Responses to reward in monkey dorsal and ventral striatum. Exp Brain Res 85:491–500
Barto AG (1995) Adaptive critics and the basal ganglia. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, Mass., pp 215–232
Barto AG, Sutton RS, Anderson CW (1983) Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybern SMC-13:834–846
Benecke R, Rothwell JC, Dick JPR, Day BL, Marsden CD (1987) Disturbance of sequential movements in Parkinson's disease. Brain 110:361–379
Calabresi P, Pisani A, Mercuri NB, Bernardi G (1992) Long-term potentiation in the striatum is unmasked by removing the voltage-dependent magnesium block of NMDA receptor channels. Eur J Neurosci 4:929–935
Calabresi P, Saiardi A, Pisani A, Baik JH, Centonze D, Mercuri NB, Bernardi G, Borrelli E (1997) Abnormal synaptic plasticity in the striatum of mice lacking dopamine D2 receptors. J Neurosci 17:4536–4544
Dickinson A (1980) Contemporary animal learning theory. Cambridge University Press, Cambridge
Dominey P, Arbib M, Joseph JP (1995) A model of corticostriatal plasticity for learning oculomotor associations and sequences. J Cogn Neurosci 7:311–336
Friston KJ, Tononi G, Reeke GN Jr, Sporns O, Edelman GM (1994) Value-dependent selection in the brain: simulation in a synthetic neural model. Neuroscience 59:229–243
Grossberg S, Levine DS (1987) Neural dynamics of attentionally modulated Pavlovian conditioning: conditioned reinforcement, inhibition and opponent processing. Psychobiology 15:195–240
Hikosaka O, Sakamoto M, Usui S (1989) Functional properties of monkey caudate neurons. III. Activities related to expectation of target and reward. J Neurophysiol 61:814–832
Houk JC, Adams JL, Barto AG (1995) A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, Mass., pp 249–270
Kermadi I, Joseph JP (1995) Activity in the caudate nucleus during spatial sequencing. J Neurophysiol 74:911–933
Mirenowicz J, Schultz W (1994) Importance of unpredictability for reward responses in primate dopamine neurons. J Neurophysiol 72:1024–1027
Miyachi S, Hikosaka O, Miyashita K, Karadi Z, Rand MK (1997) Differential roles of monkey striatum in learning of sequential hand movement. Exp Brain Res 115:1–5
Montague PR, Sejnowski TJ (1994) The predictive brain: temporal coincidence and temporal order in synaptic learning mechanisms. Learn Mem 1:1–33
Montague PR, Dayan P, Nowlan SJ, Pouget A, Sejnowski TJ (1993) Using aperiodic reinforcement for directed self-organization during development. In: Hanson SJ, Cowan JD, Giles CL (eds) Neural information processing systems 5. Morgan Kaufmann, San Mateo, pp 969–976
Montague PR, Dayan P, Sejnowski TJ (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16:1936–1947
Mushiake H, Strick PL (1995) Pallidal neuron activity during sequential arm movements. J Neurophysiol 74:2754–2758
Niki H, Watanabe M (1979) Prefrontal and cingulate unit activity during timing behavior in the monkey. Brain Res 171:213–224
Nishijo H, Ono T, Nishino H (1988) Single neuron responses in amygdala of alert monkey during complex sensory stimulation with affective significance. J Neurosci 8:3570–3583
Phillips JG, Bradshaw JL, Iansek R, Chiu E (1993) Motor functions of the basal ganglia. Psychol Res 55:175–181

Rescorla RA, Wagner AR (1972) A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and non-reinforcement. In: Black AH, Prokasy WF (eds) Classical conditioning II: current research and theory. Appleton-Century-Crofts, New York, pp 64–99
Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13:900–913
Schultz W, Apicella P, Romo R, Scarnati E (1995) Context-dependent activity in primate striatum reflecting past and future behavioral events. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, Mass., pp 11–28
Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599
Sutton RS, Barto AG (1990) Time-derivative models of Pavlovian reinforcement. In: Gabriel M, Moore J (eds) Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge, Mass., pp 539–602
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press/Bradford Books, Cambridge, Mass.
Watanabe M (1989) The appropriateness of behavioral responses coded in post-trial activity of primate prefrontal units. Neurosci Lett 101:113–117
Wickens J, Kötter R (1995) Cellular models of reinforcement. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, Mass., pp 187–214
Wickens JR, Begg AJ, Arbuthnott GW (1996) Dopamine reverses the depression of rat corticostriatal synapses which normally follows high-frequency stimulation of cortex in vitro. Neuroscience 70:1–5
