Reward Hierarchical Temporal Memory

WCCI 2012 IEEE World Congress on Computational Intelligence, June 10-15, 2012, Brisbane, Australia (IJCNN)

Reward Hierarchical Temporal Memory: Model for Memorizing and Computing Reward Prediction Error by Neocortex

Hansol Choi, Jun-Cheol Park, Jae Hyun Lim, Jae Young Jun and Dae-Shik Kim
Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
capine@gmail.com
(This project was supported by the KI brand project, KI, KAIST in 2012 and by the next generation haptic user interface project funded by Kolon Industries in 2012.)

Abstract: In humans and animals, the reward prediction error encoded by dopamine systems is thought to be important in the temporal difference learning class of reinforcement learning (RL). Using RL algorithms, many brain models have described the function of dopamine and related areas, including the basal ganglia and frontal cortex. Despite this importance, how the reward prediction error itself is computed is not well understood, including how the current state is assigned to a memorized state and how the values of the states are memorized. In this paper, we describe a neocortical model for memorizing state space and computing the reward prediction error, known as reward hierarchical temporal memory (rhtm). In this model, the temporal relationships among events are stored hierarchically. Using this memory, the rhtm computes reward prediction errors by associating the memorized sequences with rewards and inhibiting the predicted reward. In a simulation, our model behaved similarly to dopaminergic neurons. We suggest that our model can provide a hypothetical framework for the interaction between cortex and dopamine neurons.

Keywords: reward; reinforcement learning; reward prediction error; temporal difference; HTM; rhtm; reward-HTM

I. BACKGROUND

A. Reward Prediction Error

Reinforcement learning (RL) has been one of the most influential computational theories in neuroscience [1] since Sutton and Barto proposed and developed it [2]. In the RL systems of the brain, dopaminergic neurons are thought to encode the reward prediction error (RPE) [8]-[10], which is the core of RL. In particular, Schultz and colleagues [3]-[5] presented neurophysiological data suggesting that dopaminergic neurons encode the temporal difference error. A type of RL known as temporal difference (TD) learning has been studied thoroughly since, and many computational models have proposed functional mechanisms of dopamine (DA) in the basal ganglia and related areas based on the TD learning model [6], [7]. Despite its wide acceptance, the TD model has several limitations in explaining the behavior of dopaminergic neurons, as explained below.

The TD error is the difference between the expected value of the previous state, V(S_t), and the value actually delivered, namely the currently delivered reward r_{t+1} plus the value of the current state V(S_{t+1}) discounted by γ [11]:

δ_t = r_{t+1} + γ V(S_{t+1}) - V(S_t)    (1)

To compute the TD error, several steps are required. First, the current and previous neural patterns must be assigned to states (S_t and S_{t+1}). If a pattern is similar enough to a previously memorized state, that memorized state must be assigned to the pattern; otherwise, a new state must be created and memorized. Second, the values of the previous and current states, V(S_t) and V(S_{t+1}), must be retrieved from memory. Third, these values must be summed, with appropriate signs, together with the current reward r_{t+1}.
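To make these steps concrete, the following is a minimal sketch of equation (1) with a tabular value store; the dictionary-based value function and the state names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the TD error in equation (1); the tabular value store and
# the state names are illustrative assumptions, not the authors' implementation.
from collections import defaultdict

gamma = 0.9                    # discount factor
V = defaultdict(float)         # value memorized for each state, default 0

def td_error(s_t, s_t1, r_t1):
    """delta_t = r_{t+1} + gamma * V(S_{t+1}) - V(S_t) for the transition s_t -> s_t1."""
    return r_t1 + gamma * V[s_t1] - V[s_t]

# Before any learning, an unpredicted reward on reaching 'us' gives a positive error.
print(td_error("s2", "us", r_t1=1.0))   # -> 1.0
```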
In most TD learning models of the brain related to dopamine function, the values of the states are stored in a numerical value matrix [1], [12]. When a state pattern is delivered, the numerical values are retrieved from the matrix and used to compute the TD error given the current reward. This requires brain systems to store the absolute values of states and to buffer these values for the computation, for which no biological evidence exists. Moreover, when a new reward-predicting event is associated with a reward event in TD learning, the value of the reward-predicting event is updated through the values of the intermediate patterns between the two events (see Fig. 6A for the Pavlovian conditioning case). TD learning predicts that this is done via the TD error of the intermediate states between the two events during learning [13]. Contrary to this prediction, evidence exists that DA signals occur only at the reward signal at the beginning of learning and only at the reward-predicting signal once learning has saturated; no intermediate dopamine signal is observed [14]. The lack of DA activity in the intermediate states raises what is known as the distal reward problem: the brain must determine which state is responsible for the reward when that state's pattern is no longer present, and the values of the patterns must be updated directly, without the help of intermediate patterns.

Another problem with current dopamine models is the lack of a mechanism for controlling the activity of dopaminergic neurons. Computing the reward prediction error is the core of RL, by which the values and the actions of the states are guided and learned. Despite this importance, only a few models describe how DA activity is controlled with respect to memory. More precisely, it is not well understood how the states important for reward information and the structure of the relationships between the states are memorized, nor how this memory is used to map the current state onto previous knowledge. Moreover, how the states are associated with reward values has not been dealt with in previous studies.

In this paper, we suggest that the hierarchical structure of the neocortical memory system is related to the problems above: state assignment, the storing of numerical values, and the distal reward problem. We developed a novel algorithm to compute the RPE, reward Hierarchical Temporal Memory (rhtm). In this hierarchical structure, a higher node forms a more stable and more abstract model of the input patterns, with larger spatio-temporal receptive fields. Temporally, a stable state in a higher node can unfold into fast changes in the lower levels of the hierarchy, and the unfolded information predicts the next state of the lower regions. As a result, an HTM can learn the structure of sequences, infer the causes of the sequences, and recall them through the hierarchy [15]. The rhtm exploits this spatial and temporal pattern recognition feature of the HTM to memorize the structure of event sequences, and uses the memory to assign the current input pattern to memorized states and to compute a reward. In an HTM, the input pattern activates several states in multiple regions of the hierarchy. By assigning a delivered reward to the currently active states, we can assign the reward to the sequences that are the causes or contexts of the current event. We suggest that the rhtm can solve the distal reward problem and produce activity resembling DA activity. In the following sections, we explain the algorithm in detail with Pavlovian conditioning and instrumental learning tasks.

Figure 1. Memorizing Pavlovian conditioning with an HTM. (A) Sensory input: CS is followed by US with a constant temporal gap. This sequence is repeated with a random temporal delay. (B) CS and US introduce the neural correlates cs and us in the brain. The cs state generates the intermediate neural states s1 and s2. These task sequences are connected by diverse uncorrelated patterns (random). (C) The cs-to-us sequence is learned by the HTM. In level 1, the cs-to-us state sequence is represented as a single state (pavlovseq). (D) The structure of the HTM: large boxes denote the HTM regions.

B. Hierarchical Temporal Memory

A Hierarchical Temporal Memory (HTM) is a functional model of neocortex recently developed by Hawkins and colleagues [15], [16]. It builds a spatio-temporal model of the world from sequences of sensory input patterns and uses this model to predict subsequent inputs and to infer their causes. The HTM consists of a tree-shaped hierarchy of memory regions. Every region runs the same algorithm to build a spatio-temporal model of the input pattern sequences it receives, regardless of its position in the hierarchy. The algorithm is a two-step Bayesian process composed of spatial pooling and temporal pooling. When an input pattern is presented to the HTM, each region assigns Bayesian beliefs to the spatial patterns which the region learned previously; each belief reflects the similarity between a stored spatial pattern and the input pattern. This process is known as spatial pooling, and it categorizes the spatial patterns input to the region. Next, from the beliefs over spatial patterns, the beliefs over temporal patterns are updated (temporal pooling). A temporal pattern is a group of spatial patterns that frequently occur together; the main idea is that patterns which frequently occur together in time share a common cause.
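As a rough illustration of the two pooling steps just described, the toy sketch below assigns a belief to each stored spatial pattern by similarity and then pools beliefs within temporal groups; the similarity measure, the fixed pattern store, and the grouping are illustrative assumptions rather than the HTM learning procedure itself.

```python
# Toy sketch of one HTM region's two pooling steps; the similarity measure, the
# fixed pattern store, and the temporal grouping are illustrative assumptions,
# not the HTM learning procedure itself.
import numpy as np

class Region:
    def __init__(self, spatial_patterns, temporal_groups):
        self.spatial = spatial_patterns   # stored spatial patterns (np arrays)
        self.groups = temporal_groups     # each group: indices of co-occurring spatial patterns

    def spatial_pooling(self, x):
        """Belief over stored spatial patterns: normalized similarity to the input."""
        sims = np.array([np.exp(-np.sum((p - x) ** 2)) for p in self.spatial])
        return sims / sims.sum()

    def temporal_pooling(self, spatial_belief):
        """Belief over temporal patterns: pooled belief of each group's members."""
        g = np.array([spatial_belief[list(members)].sum() for members in self.groups])
        return g / g.sum()                # fed forward as the region's current state

# Two spatial patterns grouped into one temporal pattern (e.g. the cs-to-us pair).
region = Region([np.array([1.0, 0.0]), np.array([0.0, 1.0])], [{0, 1}])
belief = region.spatial_pooling(np.array([0.9, 0.1]))
print(region.temporal_pooling(belief))    # -> [1.]
```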
A temporal pattern is a spatio-temporal model of an independent entity projected onto an HTM region. Beliefs about temporal patterns form the current state of a node and are fed forward to the next higher node.

Figure 2. Activity of the rhtm in a Pavlovian conditioning task before learning rewards. Each gray cell shows an active pattern over time (y-axis). CS and US mark the times at which the input patterns are given. US is associated with a reward (cross); as it is unpredictable, an RPE is given when us is activated (white cross under RPE). The level 1 pattern pavlovseq is active when the RPE is given. This RPE associates pavlovseq with a reward.

Figure 3. Activity of the rhtm after learning the rewards. Here, pavlovseq is associated with a reward. In a trial, when the current pattern is CS, cs and pavlovseq are activated; as pavlovseq is associated with the reward and the reward is not predicted, an RPE is given (RPE on the right). When the current state becomes US, the rhtm can predict us from the previous state s2. s2 inhibits the reward prediction error function, and as a result no RPE is produced.

II. ALGORITHM OF REWARD HIERARCHICAL TEMPORAL MEMORY

A. Computing the RPE for a Pavlovian Conditioning Task

The main algorithm of the rhtm consists of three mechanisms: sequence structure learning, reward association, and reward prediction. We explain the algorithm in terms of Pavlovian conditioning [17], one of the most important tasks in reward learning systems. In Pavlovian conditioning, conditioned stimuli (CS) are repeatedly followed by unconditioned stimuli (US) with constant temporal delays (Fig. 1A). CSs are biologically neutral stimuli (e.g., a bell sound), and USs are biologically relevant reward stimuli (e.g., sugar water). Trials are repeated with a random delay. At the beginning of learning, dopamine is activated only when a US is given. As learning proceeds and the USs become predictable, the DA activity for the US diminishes; instead, DA becomes active for the CSs. After learning, DA fires only for the CS, which remains unexpected because trials are given with random delays; DA neurons do not fire for the delivery of USs fully predicted by the CSs. Omitting a US generates a negative TD error, as the delivered value is smaller than the expected value.

1) Inferring the Causes or Context States of the Input Sequence of Events in Pavlovian Conditioning: We hypothesized that CSs and USs produce corresponding neural patterns cs and us in the neocortex, and that the neural correlate of the CS (cs in Fig. 1B) produces stereotypical intermediate neural patterns (s1 and s2 in Fig. 1B). The cs, s1, s2, us sequence is repeated across Pavlovian trials. These sequences are input to the base region of the rhtm (level 0 in Fig. 1B). This region groups the four spatial patterns into a temporal pattern, and the temporal pattern becomes the input to a higher region (Fig. 1C, D; pavlovseq is the temporal pattern and level 1 is the higher region). In level 1, pavlovseq is active while a CS-to-US sequence is being processed. The activity of the HTM during a CS-to-US sequence is summarized in Figure 2. As the input patterns change from noisy random patterns (random1 in Fig. 2) to cs, pavlovseq becomes active in level 1. The activity of level 1 feeds back information to level 0 for predicting the next step. Pavlovseq is deactivated when the following cs is deactivated, representing the end of the learned sequence.

2) Associating Rewards to Active States when a Reward is Given: USs are the primary reward stimuli (Fig. 2, white cross in the table). Without prior knowledge, an RPE is given when the state transits to us (Fig. 2, white cross on the right). When the state moves to us and an RPE is given by the activation of us, pavlovseq is active at level 1. This means that a reward will be given during the activation of pavlovseq. To memorize this, we associate pavlovseq with the reward (Fig. 3, white cross at level 1). In the rhtm, the states that are active when a positive RPE is given update their association with the reward by formula (2):

reward_t(S) = reward_{t-1}(S) + γ · RPE · active(S),    active(S) = 1 if pattern S is active, 0 if pattern S is inactive    (2)

Fig. 3 shows the behavior of the rhtm after learning a reward. The delivery of the CS activates pavlovseq, which is associated with a reward. This activation of pavlovseq produces a reward value which is not predicted; as a result, an RPE is given (Fig. 3, RPE on the right).

3) Inhibition of the Predicted Reward Signal: USs (Fig. 2) always follow s2 in the level 0 sequence. The rhtm can use this information to predict the reward (Fig. 3, black arrow beginning at s2 and ending at us). Deactivation of a state that precedes a reward-associated state inhibits the RPE. The inhibition function of a state is updated with the RPE by formula (3):

prediction(S) = prediction(S) - α (reward(S) - prediction(S))    (3)

In our example, the deactivation of s2 inhibits the RPE, as it predicts the reward arriving with us. With this inhibition, the reward delivered by us is cancelled and no RPE is given (black arrow in Fig. 3).

Figure 4. Go-NoGo instrumental conditioning. (A) The beginning of the task (Cue) is followed by an action target (GoCue). Animals have a policy of selecting one of two options (p = 0.5 for Go and NoGo). After some delay (Wait), a reward or punishment is given. (B) The sensory signals and the activity give neural correlates, which are linked by intermediate sequences. (C) The common intermediate patterns w1 and w2 can be split into separate patterns according to the context of the sequences. The level 1 region learns the split patterns and finds the temporal sequences of the patterns, which are the inputs to level 2 (D). As a higher region recognizes sequences of patterns with lower correlations, (E) the level 2 region learns the entire task as a single pattern.
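The sketch below puts formulas (2) and (3) together with the reward-association and inhibition mechanisms described above. The set-based state representation and the exact combination rule for the RPE (reward associations of newly activated states minus predictions of newly deactivated states) are illustrative assumptions, and the prediction update is written as a standard delta rule, which is one possible reading of formula (3).

```python
# Hedged sketch of the rhtm reward bookkeeping built around formulas (2) and (3).
# The set-of-active-states representation, the RPE combination rule, and the
# delta-rule form of the prediction update are illustrative assumptions.
from collections import defaultdict

gamma, alpha = 0.5, 0.3
reward_assoc = defaultdict(float)   # reward(S): learned reward association, formula (2)
prediction   = defaultdict(float)   # prediction(S): inhibition learned for state S, formula (3)

def step(prev_active, cur_active, primary_reward=0.0):
    """One time step: compute the RPE, then update associations and predictions."""
    activated   = cur_active - prev_active
    deactivated = prev_active - cur_active

    delivered = primary_reward + sum(reward_assoc[s] for s in activated)
    inhibited = sum(prediction[s] for s in deactivated)
    rpe = delivered - inhibited

    if rpe > 0:                                  # formula (2): states active at a positive
        for s in cur_active:                     # RPE strengthen their reward association
            reward_assoc[s] += gamma * rpe
    for s in deactivated:                        # formula (3), read as a delta rule: a state
        prediction[s] += alpha * (delivered - prediction[s])   # learns to predict what follows it
    return rpe

# Pavlovian example: before learning, the transition s2 -> us (with a primary reward)
# produces a positive RPE while pavlovseq stays active at level 1.
print(step({"s2", "pavlovseq"}, {"us", "pavlovseq"}, primary_reward=1.0))   # -> 1.0
```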

B. Computing the RPE for Instrumental Conditioning

Next, we examine the behavior of the RPE in a more complex setting, instrumental conditioning [18]. In contrast to Pavlovian conditioning, instrumental conditioning has probabilistic transitions between states, derived from the probabilistic policies of the subject. In instrumental conditioning, a reward-neutral action of an animal is associated with a reward; as the animal learns this contingency, it comes to repeat the reward-giving action. Here, a subset of instrumental conditioning, the go-nogo discrimination task, is used. After a cue signaling the beginning of a task, one of two task cues is presented to the subject animal (Fig. 4A; GoCue). The subject has to choose one of two options (Fig. 4A; Go or NoGo). After some delay (Fig. 4A; Wait), a reward is given for only one choice (Fig. 4A; Rew); no reward is delivered for the other choice (Fig. 4A; Pur). For simplicity, we use a fixed policy that chooses each action with a probability of 0.5.

As in the Pavlovian task, we hypothesized that each behavioral event (Fig. 4A) produces a neural correlate and that these are followed by intermediate neural patterns (Fig. 4B). One problem is that some intermediate states may be indistinguishable from each other (in Fig. 4B, both go and nogo are followed by w1). Hawkins et al. solved this problem with states that are distinguished depending on the context: a common state is diversified into different states based on the previous states (Fig. 4C shows w1 diversified into g1 and n1 based on the go and nogo contexts). Highly correlated sequences of states are recognized at level 1 (Fig. 4D; cueseq, goseq, and nogoseq), and the level 1 states are grouped in a higher region (Fig. 4E; gonogotask).

As the policy over the choice is constant at 0.5, we can summarize the behavior of this rhtm as shown in Fig. 5. gonogotask is only partially associated with a reward, as it cannot be associated with a reward when NoGo is selected. As no state predicts a reward for gonogotask, a partial RPE is given when Cue is presented. Also, goseq is fully associated with a reward and partially inhibited by the end of cueseq, so that only half of the sequence produces an RPE. In addition, rew gives a reward and is inhibited by g2. As a result, after learning the rhtm produces a partial reward prediction error for Cue and Go and a partial negative reward prediction error for the NoGo signal.

Figure 5 shows the activity of the instrumental conditioning HTM. The rhtm learns the pattern sequences and then associates states with rewards. In temporal difference learning without a hierarchy, the beginning of the task and Go result in partial reward prediction errors and no reward prediction error for the reward signal itself. The rhtm generates the same RPE signals through a partial prediction of a reward-associated pattern and a partial pairing of rewards and patterns (right).

III. SIMULATION

A. Method

1) Pavlovian Conditioning: We trained an rhtm to compute the reward prediction error in Pavlovian conditioning. A training trial was composed of a cs state, followed by 18 mutually distinguishable intermediate states and then a us accompanied by a reward signal. Trials were separated by a random number of noise states (0 to 100 time steps, uniformly distributed). Each state was represented as a unique text pattern.
A two-level rhtm was trained to simulate the development of the reward-prediction error during the Pavlovian task. Training was performed in two steps. In the first step, the rhtm learned the structure of the event sequences: the task sequence was given to the rhtm without reward information for 100 trials. In the second step, the development of the reward-prediction error was simulated: the state sequence was fed to the rhtm and a primary reward was given when the state was us. Trials were repeated 40 times, and the development of the association between the states and the reward, and of the reward prediction error, was observed. On the 30th trial, the reward was omitted to observe the behavior of the system when a predicted reward is withheld.

To compare our method with previous reports, the development of the TD error and of the state values was also simulated with a conventional TD reinforcement learning model [13]. In this simulation, the values of the 20 states (us, the intermediate states and cs) and the reward-prediction error during the same Pavlovian conditioning situation were observed.

2) Go-Nogo Task: To verify that our model can be applied to a more complex system, we used it to simulate the RPE in a go-nogo task. The overall structure of the task is identical to the example explained earlier (Fig. 4B). In a trial, a cue signal is followed by a task cue, which is either a go or a nogo signal. Based on the task cue, the subject selects either the go or the nogo action; a reward is given for only one of the two actions. In our simulation, the task cue was fixed to the go signal and the policy of the agent was to choose the go or nogo action, each with a probability of 0.5. The time between the states was filled with intermediate states, as shown in Fig. 4B. A reward was given for the reward signal, and no other rewards or punishments were given. We trained a three-layer rhtm on the structure of the events for 350 trials (half go and half nogo) embedded in random noise states. After that, another 350 trials with reward information were given and the development of the reward-prediction error was monitored.
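As a point of reference for the comparison model just described, the sketch below runs tabular TD(0) on the 20-state cs-to-us chain; the learning rate, discount factor, and the trials printed are illustrative choices, not parameters reported by the authors.

```python
# Sketch of the conventional TD comparison: tabular TD(0) on a 20-state chain
# (cs, 18 intermediate states, us) with a reward delivered on entering us and
# omitted on trial 30.  Learning rate and discount are illustrative assumptions.
import numpy as np

n_states, alpha, gamma, n_trials = 20, 0.3, 0.95, 40
V = np.zeros(n_states + 1)            # V[0]=cs, ..., V[19]=us, V[20]=terminal

for trial in range(n_trials):
    deltas = np.zeros(n_states)
    for s in range(n_states):                       # transition s -> s+1
        r = 1.0 if (s == n_states - 2 and trial != 30) else 0.0
        deltas[s] = r + gamma * V[s + 1] - V[s]     # TD error, equation (1)
        V[s] += alpha * deltas[s]
    if trial in (0, 15, 30, 39):
        print(f"trial {trial:2d}  delta on entering us: {deltas[n_states - 2]:+.2f}  "
              f"max intermediate delta: {deltas[1:n_states - 2].max():+.2f}")
```

Running this shows the two behaviors discussed in the Result section: positive TD errors migrate backward through the intermediate transitions during learning, and the omission trial produces a negative error at the expected reward time.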

Figure 6. Simulation of Pavlovian conditioning with TD learning and reward HTM. cs and us were connected by 18 intermediate states. The value (A) and reward prediction error (B) for the state transitions in each trial, as computed by TD learning (upper). In TD learning, positive TD values occur for the intermediate states, which is not observed in biological systems, and the values of the intermediate sequences approach one as learning proceeds. Compared to the traditional temporal difference model, the rhtm associates rewards with patterns (C, inset) and delivers a reward value when the pattern is activated (the shape of the left ridge in D is identical to that of the inset of C). The prediction of a reward sequence (C) inhibits the predicted reward-prediction error signal. This produces the reward prediction error (D). No RPE appears in the intermediate sequences. A negative RPE is seen in trial 30, where the reward was omitted.

B. Result

The reward prediction errors for the CS and US states were similar for the TD learning and the reward HTM models. In both models, the RPE upon US delivery diminishes as learning progresses, and an RPE appears upon delivery of the CS (Fig. 6, right). Both models show a negative RPE for an omitted delivery of the US, when the predicted reward is not delivered, as observed in the activity of dopaminergic neurons in the basal ganglia.

The main difference between the TD learning model and the rhtm is the reward prediction error of the intermediate states during learning (Fig. 6B, arrow, and Fig. 6D). In the TD learning model, the values of the intermediate patterns spread back toward the CS (Fig. 6A), driven by the positive temporal difference values in the intermediate patterns (Fig. 6B, arrow). In contrast, the rhtm computes only the reward association of a state and the prediction of a reward in the next state. The representative state of the CS-to-US sequence at a higher level (the equivalent of pavlovseq in Figs. 1-3) is associated with a reward; this is shown in the inset of Fig. 6B. By finding the sequence structure within the state transition patterns, our model solves the distal reward problem. Moreover, a reward is associated with the CS without intermediate TD values. As the reward delivered by the US becomes predictable from the previous state (Fig. 6B), the reward-prediction error at the US decreases, the reward delivered by the US being cancelled by the reward prediction from the HTM.

Figure 7. Development of the reward prediction error during instrumental conditioning. The reward prediction errors after each state is presented are shown. "others" refers to all events except those specified; their RPE was 0 throughout the simulation.

In the go-nogo task, RPEs were observed only when the cue, go, nogo and reward states were activated. These are the first states of temporal patterns in higher regions, or the state carrying the primary reward information (Fig. 7); that is, the states directly related to determining the RPE. The reward prediction error in the other states did not change from zero during the entire simulation (Fig. 7, others), as for the intermediate states during Pavlovian conditioning. The results show that the RPE for the reward state diminishes to zero as its reward becomes predictable (Fig. 7). The cue state, which marks the beginning of a trial, produces an RPE throughout the simulation, as the signal arrives within the noise sequence and is therefore not predictable. The behavior of the RPE during go and nogo is of particular interest.
As the probability of go and nogo is 0.5 in each case, the go state and the nogo state never become fully predictable. As the go state predicts that a reward signal will follow, selecting go produces an RPE. Conversely, selecting nogo produces a negative reward prediction error, as the reward predicted upon entering the trial drops to zero. This behavior is similar to that of the TD error, except that in TD learning the intermediate states also produce a TD error.

IV. DISCUSSION

In this paper, we introduced a hierarchical model for learning sequences of events and used that information to compute reward prediction errors. The HTM framework can memorize patterns and the relationships between patterns to identify the structure of the events given to it. By associating a reward with the representation of memorized sequences and using the memory to inhibit predictable rewards, we developed a model that computes the RPE similarly to DA neurons in Pavlovian conditioning. Our model reproduces DA activity with no signal wave in the intermediate sequences, which the previous TD learning method falsely predicted to exist. Moreover, our model could mimic the behavior of DA in instrumental conditioning, in particular the negative RPE for the selection of non-reward-giving actions. As our sequence memory can find the beginning of a sequence, the distal reward problem can be solved by associating a reward with the representation of the sequence in higher regions.

Previously, Izhikevich suggested a model to solve the distal reward problem which involved an interaction between STDP and dopamine neurons [19].

He showed the movement of the DA activity from the US to the CS. The problem was that the DA activity to the US disappeared entirely; however, when no CS context is given, a US by itself should still elicit a DA signal. Additionally, Izhikevich's model could not show the reduced dopamine activity upon omission of the US, which our model does show.

Our model explains the critical part of reinforcement learning in the actor-critic framework: providing information about whether the current policy is good or bad. For the actor part, we used a fixed policy in instrumental conditioning, which helped describe the activity of our model as a critic. In actual reinforcement learning, the reward prediction error from the critic should update the policy of the actor. How our memory system would be updated by the RPE signal and the action remains an important issue for further study. With such an algorithm, a complete actor-critic model based on HTM might learn the state space in a more active manner, but this is beyond the scope of this paper.

Recently, a study on monkey brains reported direct control of the activity of dopaminergic neurons by mPFC activity [24]. DA neurons were both activated and inhibited by stimulating mPFC areas. Applied to our model, the activation of DA neurons may arise from the state-representing neurons associated with a reward, and the inhibition of DA neurons may correspond to the reward prediction by the previous states. With a well-developed hierarchical memory model of the neocortical system, our model can explain the behavior of DA with a simple potentiation rule between the neocortex and the DA cells. We suggest that our model does not require explicit numerical values to be assigned to every state.

To verify the feasibility of our model, biological experiments are required. As our model depends on a hierarchical event-sequence memory, it predicts that two independent events under the same higher state can affect each other's reward value by changing the reward value of the common higher node. This could be verified in human or animal behavioral experiments.

V. CONCLUSION

We proposed a novel hierarchical memory model for the behavior of dopaminergic neurons. In this model, the dopamine system and the neocortex memorize and compute the reward prediction error with hierarchical event memory. Our model addresses several limitations of the existing TD model of DA behavior. First, it is in good agreement with biological observations of dopaminergic neurons. Second, it provides a model for the interaction between neocortical memory and the reward prediction error. Finally, the state assignment and the event sequence structure can be learned by the rhtm.

REFERENCES

[1] M. Kawato and K. Samejima, "Efficient reinforcement learning: computational theories, neuroscience and robotics," Current Opinion in Neurobiology, vol. 17, no. 2, Apr.
[2] R. S. Sutton and A. G. Barto, Reinforcement Learning. MIT Press.
[3] W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, no. 5306, Mar.
[4] P. N. Tobler, C. D. Fiorillo, and W. Schultz, "Adaptive coding of reward value by dopamine neurons," Science, vol. 307, no. 5715, Mar.
[5] C. D. Fiorillo, P. N. Tobler, and W. Schultz, "Discrete coding of reward probability and uncertainty by dopamine neurons," Science, vol. 299, no. 5614, Mar.
[6] M. X. Cohen and M. J. Frank, "Neurocomputational models of basal ganglia function in learning, memory and choice," Behavioural Brain Research, vol. 199, no. 1, Apr.
[7] T. V. Maia and M. J. Frank, "From reinforcement learning models to psychiatric and neurological disorders," Nature Neuroscience, vol. 14, no. 2, Feb.
[8] H. Nakahara, H. Itoh, R. Kawagoe, Y. Takikawa, and O. Hikosaka, "Dopamine neurons can represent context-dependent prediction error," Neuron, vol. 41, no. 2, Jan.
[9] N. D. Daw, Y. Niv, and P. Dayan, "Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control," Nature Neuroscience, vol. 8, no. 12, Dec.
[10] K. A. Zaghloul et al., "Human substantia nigra neurons encode unexpected financial rewards," Science, vol. 323, no. 5920, Mar.
[11] P. R. Montague, P. Dayan, C. Person, and T. J. Sejnowski, "Bee foraging in uncertain environments using predictive Hebbian learning," Nature, vol. 377, no. 6551, Oct.
[12] M. Silvetti, R. Seurinck, and T. Verguts, "Value and prediction error in medial frontal cortex: integrating the single-unit and systems levels of analysis," Frontiers in Human Neuroscience, vol. 5, p. 75.
[13] W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, no. 5306, Mar.
[14] C. L. Hull, Principles of Behavior: An Introduction to Behavior Theory. Oxford, England: Appleton-Century.
[15] J. Hawkins, D. George, and J. Niemasik, "Sequence memory for prediction, inference and behavior," Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, vol. 364, no. 1521, May.
[16] D. George and J. Hawkins, "Towards a mathematical theory of cortical micro-circuits," PLoS Computational Biology, vol. 5, no. 10, Oct.
[17] R. A. Rescorla, "Behavioral studies of Pavlovian conditioning," Annual Review of Neuroscience, vol. 11, no. 1, Mar.
[18] B. W. Balleine, M. Liljeholm, and S. B. Ostlund, "The integrative function of the basal ganglia in instrumental conditioning," Behavioural Brain Research, vol. 199, no. 1, Apr.
[19] E. M. Izhikevich, "Solving the distal reward problem through linkage of STDP and dopamine signaling," Cerebral Cortex, vol. 17, no. 10, Oct.
[20] J. Friedrich, R. Urbanczik, and W. Senn, "Spatio-temporal credit assignment in neuronal population learning," PLoS Computational Biology, vol. 7, no. 6, Jun.
[21] J. J. F. Ribas-Fernandes et al., "A neural signature of hierarchical reinforcement learning," Neuron, vol. 71, no. 2, Jul.
[22] M. Schembri, M. Mirolli, and G. Baldassarre, "Evolving internal reinforcers for an intrinsically motivated reinforcement-learning robot," in IEEE 6th International Conference on Development and Learning (ICDL 2007), 2007.
[23] J. Gläscher, N. Daw, P. Dayan, and J. P. O'Doherty, "States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning," Neuron, vol. 66, no. 4, May.
[24] D. E. Moorman and G. Aston-Jones, "Orexin/hypocretin modulates response of ventral tegmental dopamine neurons to prefrontal activation: diurnal influences," The Journal of Neuroscience, vol. 30, no. 46, Nov.


Computational Psychiatry and Neurology. Michael J. Frank Laboratory for Neural Computation and Cognition Brown University Computational Psychiatry and Neurology Michael J. Frank Laboratory for Neural Computation and Cognition Brown University Computation as a link between brain, mind, and pathology Computation as a link

More information

This article was originally published in a journal published by Elsevier, and the attached copy is provided by Elsevier for the author s benefit and for the benefit of the author s institution, for non-commercial

More information

Deep Reinforcement Learning as Foundation for Artificial General Intelligence

Deep Reinforcement Learning as Foundation for Artificial General Intelligence Chapter 6 Deep Reinforcement Learning as Foundation for Artificial General Intelligence Itamar Arel Machine Intelligence Lab, Department of Electrical Engineering and Computer Science, University of Tennessee

More information

Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters

Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters Medial prefrontal cortex and the adaptive regulation of reinforcement learning parameters CHAPTER 22 Mehdi Khamassi*,{,{,},1, Pierre Enel*,{, Peter Ford Dominey*,{, Emmanuel Procyk*,{ INSERM U846, Stem

More information

Rolls,E.T. (2016) Cerebral Cortex: Principles of Operation. Oxford University Press.

Rolls,E.T. (2016) Cerebral Cortex: Principles of Operation. Oxford University Press. Digital Signal Processing and the Brain Is the brain a digital signal processor? Digital vs continuous signals Digital signals involve streams of binary encoded numbers The brain uses digital, all or none,

More information

Synfire chains with conductance-based neurons: internal timing and coordination with timed input

Synfire chains with conductance-based neurons: internal timing and coordination with timed input Neurocomputing 5 (5) 9 5 www.elsevier.com/locate/neucom Synfire chains with conductance-based neurons: internal timing and coordination with timed input Friedrich T. Sommer a,, Thomas Wennekers b a Redwood

More information

Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling

Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling Cerebral Cortex October 27;7:2443--2452 doi:.93/cercor/bhl52 Advance Access publication January 3, 27 Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling Eugene M. Izhikevich

More information

CSE511 Brain & Memory Modeling Lect 22,24,25: Memory Systems

CSE511 Brain & Memory Modeling Lect 22,24,25: Memory Systems CSE511 Brain & Memory Modeling Lect 22,24,25: Memory Systems Compare Chap 31 of Purves et al., 5e Chap 24 of Bear et al., 3e Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse511

More information

Consciousness as representation formation from a neural Darwinian perspective *

Consciousness as representation formation from a neural Darwinian perspective * Consciousness as representation formation from a neural Darwinian perspective * Anna Kocsis, mag.phil. Institute of Philosophy Zagreb, Croatia Vjeran Kerić, mag.phil. Department of Psychology and Cognitive

More information

Reinforcement Learning. With help from

Reinforcement Learning. With help from Reinforcement Learning With help from A Taxonomoy of Learning L. of representations, models, behaviors, facts, Unsupervised L. Self-supervised L. Reinforcement L. Imitation L. Instruction-based L. Supervised

More information

Becoming symbol-minded. Judy S. DeLoache

Becoming symbol-minded. Judy S. DeLoache Symbols and Rules Becoming symbol-minded Judy S. DeLoache Symbols def: A symbol is something that someone intends to represent something other than istelf. This is among the looser definitions for symbols.

More information

Introduction to Computational Neuroscience

Introduction to Computational Neuroscience Introduction to Computational Neuroscience Lecture 7: Network models Lesson Title 1 Introduction 2 Structure and Function of the NS 3 Windows to the Brain 4 Data analysis 5 Data analysis II 6 Single neuron

More information

Visual Context Dan O Shea Prof. Fei Fei Li, COS 598B

Visual Context Dan O Shea Prof. Fei Fei Li, COS 598B Visual Context Dan O Shea Prof. Fei Fei Li, COS 598B Cortical Analysis of Visual Context Moshe Bar, Elissa Aminoff. 2003. Neuron, Volume 38, Issue 2, Pages 347 358. Visual objects in context Moshe Bar.

More information

The individual animals, the basic design of the experiments and the electrophysiological

The individual animals, the basic design of the experiments and the electrophysiological SUPPORTING ONLINE MATERIAL Material and Methods The individual animals, the basic design of the experiments and the electrophysiological techniques for extracellularly recording from dopamine neurons were

More information

Computational Approaches in Cognitive Neuroscience

Computational Approaches in Cognitive Neuroscience Computational Approaches in Cognitive Neuroscience Jeff Krichmar Department of Cognitive Sciences Department of Computer Science University of California, Irvine, USA Slides for this course can be found

More information

Multiple Forms of Value Learning and the Function of Dopamine. 1. Department of Psychology and the Brain Research Institute, UCLA

Multiple Forms of Value Learning and the Function of Dopamine. 1. Department of Psychology and the Brain Research Institute, UCLA Balleine, Daw & O Doherty 1 Multiple Forms of Value Learning and the Function of Dopamine Bernard W. Balleine 1, Nathaniel D. Daw 2 and John O Doherty 3 1. Department of Psychology and the Brain Research

More information

The Influence of the Initial Associative Strength on the Rescorla-Wagner Predictions: Relative Validity

The Influence of the Initial Associative Strength on the Rescorla-Wagner Predictions: Relative Validity Methods of Psychological Research Online 4, Vol. 9, No. Internet: http://www.mpr-online.de Fachbereich Psychologie 4 Universität Koblenz-Landau The Influence of the Initial Associative Strength on the

More information

THE BRAIN HABIT BRIDGING THE CONSCIOUS AND UNCONSCIOUS MIND. Mary ET Boyle, Ph. D. Department of Cognitive Science UCSD

THE BRAIN HABIT BRIDGING THE CONSCIOUS AND UNCONSCIOUS MIND. Mary ET Boyle, Ph. D. Department of Cognitive Science UCSD THE BRAIN HABIT BRIDGING THE CONSCIOUS AND UNCONSCIOUS MIND Mary ET Boyle, Ph. D. Department of Cognitive Science UCSD Linking thought and movement simultaneously! Forebrain Basal ganglia Midbrain and

More information

Time Experiencing by Robotic Agents

Time Experiencing by Robotic Agents Time Experiencing by Robotic Agents Michail Maniadakis 1 and Marc Wittmann 2 and Panos Trahanias 1 1- Foundation for Research and Technology - Hellas, ICS, Greece 2- Institute for Frontier Areas of Psychology

More information

Systems Neuroscience November 29, Memory

Systems Neuroscience November 29, Memory Systems Neuroscience November 29, 2016 Memory Gabriela Michel http: www.ini.unizh.ch/~kiper/system_neurosci.html Forms of memory Different types of learning & memory rely on different brain structures

More information

Distinct valuation subsystems in the human brain for effort and delay

Distinct valuation subsystems in the human brain for effort and delay Supplemental material for Distinct valuation subsystems in the human brain for effort and delay Charlotte Prévost, Mathias Pessiglione, Elise Météreau, Marie-Laure Cléry-Melin and Jean-Claude Dreher This

More information

The Integration of Features in Visual Awareness : The Binding Problem. By Andrew Laguna, S.J.

The Integration of Features in Visual Awareness : The Binding Problem. By Andrew Laguna, S.J. The Integration of Features in Visual Awareness : The Binding Problem By Andrew Laguna, S.J. Outline I. Introduction II. The Visual System III. What is the Binding Problem? IV. Possible Theoretical Solutions

More information

Author's personal copy

Author's personal copy Provided for non-commercial research and educational use only. Not for reproduction, distribution or commercial use. This chapter was originally published in the book Computational Psychiatry. The copy

More information

Attentive Stereoscopic Object Recognition

Attentive Stereoscopic Object Recognition Attentive Stereoscopic Object Recognition Frederik Beuth, Jan Wiltschut, and Fred H. Hamker Chemnitz University of Technology, Strasse der Nationen 62, 09107 Chemnitz, Germany frederik.beuth@cs.tu-chemnitz.de,wiltschj@uni-muenster.de,

More information

Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network

Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network The Journal of Neuroscience, June 29, 2005 25(26):6235 6242 6235 Behavioral/Systems/Cognitive Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in

More information

Chapter 5. Summary and Conclusions! 131

Chapter 5. Summary and Conclusions! 131 ! Chapter 5 Summary and Conclusions! 131 Chapter 5!!!! Summary of the main findings The present thesis investigated the sensory representation of natural sounds in the human auditory cortex. Specifically,

More information