Parameter Invariability in the TD Model with Complete Serial Components
Jordan Marks
Middlesex House
24 May 1999


This work was supported in part by NIMH grant MH57893, John W. Moore, PI.
© 1999 Jordan S. Marks

Introduction

The purpose of this study is to investigate the stability of the parameters of the temporal-difference (TD) model with complete serial components (Sutton & Barto, 1990). The TD model with complete serial components is a computational learning model that can represent the eye-blink response seen in the rabbit eye-blink preparation. The TD model has been related to various neurological systems within the cerebellum (Rosenfield & Moore, 1995). Given these relations, an interesting question is whether the parameters remain constant for a given experimental protocol. This study investigates whether the model can make accurate waveform predictions for a given paradigm when its parameters are held invariant over the course of the training period.

Computational models of learning are used to predict the timing and magnitude of the responses organisms will display to a given stimulus during, and after, classical conditioning. The majority of current models are descended from the Rescorla-Wagner (RW) model (Rescorla and Wagner, 1972), which was the first modern computational theory able to accurately predict such complex conditioning paradigms as Kamin blocking and conditioned inhibition (Moore and Choi, 1997). Modern computational models, including the RW model, are expressed as an equation that relates the associative connection (V) between one or more conditioned stimuli (CS) and the conditioned response(s) (CR). The RW model is a trial-level model: it predicts changes in associative strength only from one trial to the next, and not within a trial itself.
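To make the trial-level idea concrete, the following is a minimal sketch of a Rescorla-Wagner style update applied to Kamin blocking. It is an illustration only: the function name, the CS labels "A" and "B", the trial counts, and the rate values are assumptions chosen for the example, not values taken from this study. On each trial, every CS that is present has its associative strength moved by the product of the two rate parameters and the difference between the US strength and the summed prediction of the CSs present.

```python
def rescorla_wagner(trials, alpha=0.1, beta=1.0):
    """Trial-level update. `trials` is a list of (present, lam) pairs,
    where `present` is the set of CS labels shown on that trial and
    `lam` is the US strength (0.0 when the US is absent)."""
    v = {}
    for present, lam in trials:
        for cs in present:
            v.setdefault(cs, 0.0)
        # Summed prediction of all CSs present on this trial.
        prediction = sum(v[cs] for cs in present)
        error = lam - prediction
        # Only CSs present on the trial change (X_i = 1 for them).
        for cs in present:
            v[cs] += alpha * beta * error
    return v


# Hypothetical blocking design: A alone with the US, then A and B together.
phase_one = [({"A"}, 1.0)] * 50
phase_two = [({"A", "B"}, 1.0)] * 50
strengths = rescorla_wagner(phase_one + phase_two)
print(strengths)
```

Because A already predicts the US by the end of the first phase, the error in the second phase is close to zero and B acquires very little strength, which is the blocking effect.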

The formula for the RW model is

$$\Delta V_i(t) = \beta\,[\lambda(t) - Y(t-1)] \times \alpha X_i(t)$$

where

$$Y(t) = \sum_j V_j(t)\,X_j(t)$$

The subscript i refers to the ith CS. That is, in a paradigm with more than one CS, such as one with a tone and a light used as separate CSs, one CS (e.g. the light) would be assigned i = 1 and the other (e.g. the tone) would be assigned i = 2. The subscript j indexes all CSs that are active during trial t. λ(t) is the strength of the US at trial t, and X_i(t) indicates the presence or absence of CS i: X_i(t) = 1 if CS i is present on trial t, and X_i(t) = 0 if CS i is absent on trial t. α and β are rate parameters (0 < α, β ≤ 1).

While the RW model allows for modeling of some classical conditioning paradigms, it, like other trial-level models, has a number of limitations. Among these is the inability to generate CS-US interval functions or higher-order conditioning (Sutton and Barto, 1990). In response to these and other limitations, real-time computational models have been developed. Unlike trial-level models, real-time models take into account the duration and timing of the CS(s) and the unconditioned stimulus (US), and allow the associative strength to be modeled throughout the length of the trial, instead of treating each trial as a discrete unit. While real-time models theoretically treat time as a continuum, in practice, for computational reasons, they must divide the trial into fine, discrete steps.

Sutton and Barto's (SB) model was the first real-time model to generate CS-US interval functions and higher-order conditioning (that is, conditioning in which a novel stimulus is paired with a previously trained CS) (Sutton and Barto, 1981; Moore and Choi, 1997). Sutton and Barto (1990) describe the SB model, and a related class of models, as Ẏ ("Y dot") theories of learning. Ẏ theories adjust the associative values of CSs according to the first time-derivative of the system's response, Y(t). If the response at time t is greater than that of the preceding step (t-1), the connections of active CSs are strengthened. The SB model is described by the formula

$$\Delta V_i = \beta\,\dot{Y} \times \alpha_i \bar{X}_i$$

where

$$\dot{Y} = Y(t) - Y(t-1)$$

$$Y(t) = \sum_j V_j(t)\,X_j(t) + \lambda(t)$$

and

$$\bar{X}_i(t+1) = \bar{X}_i(t) + \delta\,[X_i(t) - \bar{X}_i(t)]$$

Here t indexes the discrete time steps into which the trial is divided for computational purposes (see above). As in the RW model, j indexes the CSs active at time t, and X_j is 1 if CS j is active at time t. X̄_i ("X bar") is the eligibility of the ith CS for modification at time t. It is a slowly decaying function, which allows even CSs that are no longer actively contributing to the output to have their connection weights modified. As in the RW model, λ(t) is the strength of the US at time t, and α and β are rate parameters. δ is a parameter that controls how long a CS remains eligible (0 < α, β, δ ≤ 1).

While the SB model provides a format that is useful in describing a real-time model, it does not generate realistic-looking CRs. One of its problems is that it predicts a response amplitude that is too large for trials in which a US is presented. It also does not accurately predict the delay of CR onset with respect to CS onset. Finally, it does not predict that the offsets of CSs, as well as their onsets, can elicit CRs. In other words, this model does not allow the offset of a CS to act as a timing signal.

The VET model (Values based on expectations about timing) (Moore & Desmond, 1992) attempted to overcome the limitations of the SB model. The VET model assigns a cascade of sequential time-tagged input units to each CS's onset and offset. In effect, this divides each CS into serial components, with each of these input units being a component of the larger CS (Sutton and Barto, 1990). This timing structure allows the VET model to develop much more realistic CRs than the SB model does in certain paradigms. However, the VET model fails in some ways that the SB model does not, most notably in its inability to model higher-order conditioning.

Sutton and Barto (1990) developed the temporal-difference (TD) model in an attempt to address the shortcomings of the SB and VET models. The TD model is governed by the equation

$$\Delta V_i(t) = \beta\,[\lambda(t) + \gamma Y(t) - Y(t-1)] \times \alpha \bar{X}_i(t)$$

As in the SB model,

$$\bar{X}_i(t+1) = \bar{X}_i(t) + \delta\,[X_i(t) - \bar{X}_i(t)]$$

and, differing from the SB model, the component from the US, λ, is represented in the formula separately from the Y component, which is given by the formula

$$Y(t) = \sum_j V_j(t-1)\,X_j(t)$$

Again, j refers to all CSs active at time t. α, β, and δ are parameters identical to those of the SB model (0 < α, β, δ ≤ 1). A new parameter is added in the TD model, however: γ is a discount parameter (0 < γ ≤ 1), used because V(t-1) is used in computing Y (V(t) is not known until the end of the time step, and is therefore unknown at the time of the computation). γ is an attempt to take into account the uncertainty introduced by using the previous time step's V value.

Sutton and Barto, in proposing this model, intended that the time between the onset of a CS and the onset of the US should be split into serial components. In doing so, they essentially used the cascade of sequential time-tagged input units that the VET model developed. If one goes further and divides the entire trial period (instead of just the time during the CS(s) and US(s)) into the time-tagged units of the VET model, or serial components, and uses both CS onsets and offsets to initiate cascades, the TD model is able to describe CR timing and topography in trace conditioning as well as in complex paradigms, including those with multiple CSs (Moore and Choi, 1997). The TD model as used in this investigation has this complete serial component design, with both onset and offset cascades.

This model can be used to predict numerous waveforms similar to those seen in eyeblink conditioning by changing the values of the parameters. Figures 1 and 2 show a family of CR waveforms with different values of δ and γ.¹ In figure 1, γ is varied systematically (values of 0.5, 0.75, 0.9, 0.99, and 0.999) for four different fixed values of δ (0.5, 0.75, 0.9, and 0.99). Figure 2 shows the effects of varying δ (values of 0.25, 0.5, 0.75, 0.9, and 0.99) while holding all other parameters fixed. The figures show that waveform topography is heavily influenced by both δ and γ. Lowering the value of γ lowers the peak CR amplitude and the rate of increase of the response, without affecting the peak time. δ affects the concavity of the CR curve.

¹ This, and all CR waveforms depicted in this thesis, as well as the TD engine of the waveform matching program, were lightly adapted from code originally written by June-Seek Choi. Without his programming assistance, this project would have been undoable.
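The TD update with serial components can be sketched in a few lines. The sketch below is not the simulator used in this study (that engine was adapted from June-Seek Choi's code and is not reproduced here). It assumes a single CS, an onset cascade only, a US that occupies one time step, eligibilities that are reset between trials, and hypothetical step counts; the function and argument names are likewise invented for the illustration. Each time step from CS onset to the end of the trial gets its own time-tagged component, and the three TD equations are applied in order at every step.

```python
def simulate_td_csc(n_trials, n_steps=60, cs_onset=10, us_onset=40,
                    alpha=0.1, beta=1.0, gamma=0.9, delta=0.9, lam=1.0):
    """Return Y(t) over the final trial of a single-CS simulation."""
    n_comp = n_steps - cs_onset      # one component per step from CS onset on
    v = [0.0] * n_comp               # associative values, kept across trials
    y_trace = []
    for _ in range(n_trials):
        xbar = [0.0] * n_comp        # eligibilities reset at trial start (assumed)
        y_prev = 0.0
        y_trace = []
        for t in range(n_steps):
            k = t - cs_onset         # index of the component active at step t
            # Y(t) = sum_j V_j(t-1) X_j(t); only component k has X = 1.
            y = v[k] if k >= 0 else 0.0
            lam_t = lam if t == us_onset else 0.0
            error = lam_t + gamma * y - y_prev
            # Delta V_i(t) = beta * error * alpha * Xbar_i(t)
            for i in range(n_comp):
                v[i] += alpha * beta * error * xbar[i]
            # Xbar_i(t+1) = Xbar_i(t) + delta * (X_i(t) - Xbar_i(t))
            for i in range(n_comp):
                x_i = 1.0 if i == k else 0.0
                xbar[i] += delta * (x_i - xbar[i])
            y_prev = y
            y_trace.append(y)
    return y_trace


waveform = simulate_td_csc(n_trials=300)
print([round(y, 3) for y in waveform[30:42]])
```

Under these assumptions Y(t) stays at zero before CS onset, rises toward the step preceding the US, and is zero from US onset onward. Rerunning with a smaller gamma gives a response that is lower throughout most of the CS-US interval.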

Figure 1. CR waveforms produced by the TD model. α = 0.1, β = 1.0, λ = 1.0; δ and γ as indicated. Four panels, one for each fixed value of δ; each panel plots Y(t) for γ = 0.50, 0.75, 0.90, 0.99, and 0.999, with US onset marked.

Figure 2. CR waveforms produced by the TD model. α = 0.1, β = 1.0, λ = 1.0; δ and γ as indicated. A single panel plots Y(t) for δ = 0.25, 0.50, 0.75, 0.90, and 0.99 at a fixed value of γ, with US onset marked.

α and β are mathematically inseparable in the TD formula, and will hereafter be referred to as a single variable, αβ, the product of the parameter values of α and β. αβ affects the rate at which the waveforms develop: as αβ increases, fewer trials are needed. Increasing λ increases the CR's peak height. By varying these parameters and instructing the simulator to emulate a classical conditioning paradigm, the TD model is able to predict realistic-looking waveforms, similar to what is seen in behavioral studies.

In this investigation, the parameters were varied systematically and the model was subjected to a set of training conditions designed to match an ongoing behavioral study in this lab. For each parameter combination, a mean square difference was computed between the predicted waveform and the observed waveform, and the combination giving the least mean square difference was sought. This difference was computed from the onset of the CS until the onset of the US (or, on probe trials, until the time at which the US would have begun). From the time of US onset, the TD model does not predict an accurate eyeblink waveform, as can be seen in figures 1 and 2. This is because the model explicitly predicts connection weights, not behavior. During the rising portion of the waveform, these are essentially identical. Once the US engages, however, the connection weights drop precipitously to 0, while the eyelid position falls off more slowly. For that reason, differences were not measured after the time of US onset. Figures 1 and 2 demonstrate this predicted rapid drop-off, as well as the more behaviorally similar rising phase. A best-fit parameter set was determined for each of five five-session periods. The exact training paradigm used is described in the Method section.
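The restricted comparison described above, a mean square difference taken only between CS onset and US onset, can be sketched as follows. This is an illustration under stated assumptions rather than the program used in the study: the function name is hypothetical, both waveforms are assumed to be sampled at the same instants, and the onsets are given as sample indices.

```python
def windowed_msd(observed, predicted, cs_onset_idx, us_onset_idx):
    """Mean square difference over samples cs_onset_idx .. us_onset_idx - 1.

    Samples at and after US onset are ignored, since the model output
    there reflects connection weights rather than eyelid position."""
    if len(observed) != len(predicted):
        raise ValueError("waveforms must have the same number of samples")
    if not 0 <= cs_onset_idx < us_onset_idx <= len(observed):
        raise ValueError("window must lie inside the waveform")
    window = range(cs_onset_idx, us_onset_idx)
    total = sum((observed[i] - predicted[i]) ** 2 for i in window)
    return total / len(window)


# Toy waveforms: the large mismatch in the last sample falls at US onset
# and therefore does not contribute.
observed = [0.0, 0.0, 1.0, 2.0, 3.0, 9.0]
predicted = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
print(windowed_msd(observed, predicted, cs_onset_idx=2, us_onset_idx=5))
```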

Method

Subjects, apparatus and stimuli. The subjects used in this investigation were four naïve albino rabbits obtained from a local supplier, maintained on an ad lib food and water diet under 24-hour illumination, and housed individually. The rabbits were restrained in Plexiglas holders placed within a four-drawer filing cabinet specially modified to present tone and light CSs, as well as electrical shocks as USs. Fans within the cabinet provided fresh air to the rabbits as well as white noise to mask extraneous noises from within the lab. Prior to any training, the rabbits were acclimated to the training chamber by being placed within the chamber, restrained within the Plexiglas holder, for two sessions without any stimulus being presented. Eyelid position was monitored using a micro torque potentiometer connected via a stiff wire to a suture previously placed on the right eyelid, and was recorded every 4 ms during the course of a trial. The US was periorbital electrical stimulation of 50 ms duration, delivered by two stainless steel wound clips attached to the skin dorsal and caudal to the right eye. These clips were attached prior to each training session and removed at the end, before the rabbits were returned to their cages. CS1 was two white lights presented simultaneously for 800 ms. CS2 was a 90 dB tone presented from a single speaker for 800 ms. In all cases, inter-trial intervals (ITIs) were [...] seconds. All stimuli, timing, and recording were controlled by an IBM-type computer running MS-DOS 6.x and software written by and for this lab in QBasic.

Training paradigm. The four rabbits were divided into two groups of two rabbits each. These will be referred to as the experimental group (rabbits A & C) and the control group (rabbits B & D). The training was done in two stages.

In stage one, the experimental group was presented with trials consisting of CS1 (800 ms light) paired with a 50 ms eyeshock beginning 300 ms after the onset of CS1. This trial type is shown in figure 3a. Each session consisted of 60 trials of this type. The experimental group was given ten such sessions, for a total of 600 trials. During this time, the control group was placed in the Plexiglas holder in the filing cabinet, but presented with no trials of any kind ("sit" sessions).

In stage two, both groups were given identical sessions. These sessions consisted of five different trial types. Trial type 1 was identical to the trial described in stage one: 800 ms CS1 (light) with a 50 ms US (eyeshock) beginning 300 ms after the onset of the CS, as depicted in figure 3a. Trial type 2 consisted of CS2 (a tone) of 800 ms duration with a 50 ms US beginning 700 ms after the onset of the CS, as shown in figure 3b. In addition, there were three probe trial types. Trial type 3 was a combined probe trial with CS1 and CS2 presented simultaneously for 800 ms, with no US. Trial type 4 was a tone probe, an 800 ms presentation of CS2 with no US. Trial type 5 was a light probe, which presented CS1 for 800 ms without a US. A single session consisted of 21 trials each of trial types 1 and 2, and 6 each of the probe trial types 3, 4, and 5, for a total of 60 trials per session. These trials were presented intermixed, with intertrial intervals (ITIs) of [...] seconds. A single session lasted just over 30 minutes.
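The composition of a stage-two session can be written out as a short sketch. The trial counts are those given above; the labels, the function name, and the use of a uniform random shuffle are assumptions made for the illustration, since the text states only that the trial types were intermixed.

```python
import random

# Trials of each type in one stage-two session (counts from the text).
TRIAL_COUNTS = {
    "type 1: light + US at 300 ms": 21,
    "type 2: tone + US at 700 ms": 21,
    "type 3: light + tone probe": 6,
    "type 4: tone probe": 6,
    "type 5: light probe": 6,
}


def build_session(rng):
    """Return the 60 trial labels of one session in an intermixed order.

    A uniform shuffle is assumed here; the actual ordering rule used in
    the behavioral study is not specified in the text."""
    trials = [label for label, n in TRIAL_COUNTS.items() for _ in range(n)]
    rng.shuffle(trials)
    return trials


session = build_session(random.Random(0))
print(len(session))
```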

While two groups were used for the behavioral study, at this time only one rabbit from each group has been analyzed. This is because we only recently acquired a computer powerful enough to match a waveform produced by many sessions. Recent improvements in the code's speed of execution, together with newfound access to a more powerful computer, will allow more data to be examined in less time during the coming weeks and months.

Figure 3. Training paradigm used in this study. See text for details.

Computational methods. After all the data were collected, a computer program was developed using Microsoft QuickC that produced the predicted waveform given specified model parameters (α, β, δ, γ, λ, number of trials, ITI, number and length of CSs, etc.). The TD model engine of this program was provided by June-Seek Choi. This program was then modified to read in the data file produced during the behavioral phase of the experiment and compute a mean square difference between the observed waveform and the predicted waveform. The program was then further modified to vary the model parameters (α, β, δ, γ, λ) independently, computing a mean square difference for each combination. The least mean square difference over all attempted parameter combinations was then determined, and the corresponding combination was taken as the closest match. The development and execution of this program were done on various computers, but primary execution took place on a Dell Dimension XPS R400 (a Pentium II class machine running at 400 MHz) with 96 MB of RAM.
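The search procedure described above amounts to an exhaustive grid search. The sketch below shows the general shape of such a search in Python; it is not the QuickC program used in the study. The function names, the `simulate` callable, the toy waveform model standing in for the TD engine, and the grid values are all hypothetical.

```python
from itertools import product


def grid_search(simulate, observed, grid, window):
    """Try every parameter combination and keep the one with the least
    mean square difference against `observed` inside `window`.

    simulate: callable taking the parameters as keyword arguments and
              returning a predicted waveform (a list of floats)
    grid:     dict mapping parameter name -> list of candidate values
    window:   (start, stop) sample indices, stop exclusive"""
    start, stop = window
    names = list(grid)
    best_params, best_msd = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        predicted = simulate(**params)
        msd = sum((observed[i] - predicted[i]) ** 2
                  for i in range(start, stop)) / (stop - start)
        if msd < best_msd:
            best_params, best_msd = params, msd
    return best_params, best_msd


def toy_model(gamma, lam):
    """Stand-in for the TD engine: a discounted ramp over 10 samples."""
    return [lam * gamma ** (9 - t) for t in range(10)]


target = toy_model(gamma=0.9, lam=0.5)
grid = {"gamma": [0.5, 0.75, 0.9, 0.99], "lam": [0.25, 0.5, 1.0]}
print(grid_search(toy_model, target, grid, window=(0, 10)))
```

The number of simulations grows as the product of the grid sizes, which is one reason a search of this kind over many sessions of training is computationally demanding.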

Results

Conditioned eye blink responses from probe trials were averaged over every five sessions (there were approximately 6 probe trials of each type (light/no US, tone/no US, and light+tone/no US) in each session). The TD model with complete serial components simulator was programmed with the training condition, and the αβ, δ, γ, and λ parameters were systematically varied to find the parameters that produced the curve best fitting the observed average CR. For each parameter set, the mean square difference between the averaged behaviorally observed waveform on probe trials and the simulated curve was computed. This measurement was restricted to the time between the onset of the CS and the onset of the US, because the model predicts connection weights, and thus behaviorally inaccurate waveforms, after US onset, as discussed in the introduction. For each block of five sessions, the parameter set that produced the least mean square difference between the computed and behaviorally observed CRs was identified. Figures 4 and 5 depict the best-fit parameters for rabbits A and B, respectively, for both the light CS (top) and the tone CS (bottom).
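The averaging step can be illustrated with a small sketch. It assumes that every probe-trial waveform in a five-session block has the same number of samples and that the average is a simple pointwise mean; the function name and the toy data are hypothetical.

```python
def average_waveforms(waveforms):
    """Pointwise mean of equally long waveforms (one list per probe trial)."""
    if not waveforms:
        raise ValueError("need at least one waveform")
    n_samples = len(waveforms[0])
    if any(len(w) != n_samples for w in waveforms):
        raise ValueError("all waveforms must have the same length")
    return [sum(w[i] for w in waveforms) / len(waveforms)
            for i in range(n_samples)]


# Two toy probe-trial recordings of three samples each.
print(average_waveforms([[0.0, 2.0, 4.0], [2.0, 4.0, 6.0]]))
```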

Figure 4. Best fit parameters for rabbit A, light and tone CS. Top panel: light probes (series A L-5, A L-10, A L-15, A L-20, A L-25). Bottom panel: tone probes (series A T-5, A T-10, A T-15, A T-20, A T-25). Horizontal axis: parameters αβ, γ, δ, and λ.

Figure 5. Best fit parameters for rabbit B, light and tone CS. Top panel: light probes (series B L-5, B L-10, B L-15, B L-20, B L-25). Bottom panel: tone probes (series B T-5, B T-10, B T-15, B T-20, B T-25). Horizontal axis: parameters αβ, γ, δ, and λ.

Figures 6, 7, and 8 show CRs produced by averaging the observed responses of a rabbit to probe trials (bold lines, top and bottom). For all three figures, the top graph depicts the average CR during sessions 1-5 and the bottom graph depicts the average CR during sessions 21-25. The dotted lines on both the top and bottom graphs in all three figures represent the predicted waveform given the best-fitting parameters for sessions 1-5. On the top graph, the number of trials used to compute the CR matches session 3 (chosen to represent the median session of sessions 1-5). On the bottom graph, the number of trials used to compute the CR matches session 23 (chosen to represent the median session of sessions 21-25). The dashed lines on both graphs are the predicted CR obtained by entering the parameters found as a best fit to the CR recorded during sessions 21-25. As above, in the top graph the model was simulating the early sessions, and in the bottom graph the model was simulating the late sessions.

In figure 6 (rabbit A, light probe trials), dotted lines were produced using the parameters αβ = 1.0, γ = 0.76, δ = 0.99, λ = 0.25 (the best fit parameters for rabbit A, light probe trials, sessions 1-5). For sessions 1-5 (top), the mean squared difference between the observed CR and the CR computed using the early parameters is [...]. For trials averaged from sessions 21-25 (bottom), this difference is [...]. The dashed lines in the top and bottom portions of figure 6 were produced using the parameters αβ = 0.1, γ = 0.76, δ = 0.24, λ = 0.50 (the best fit parameters for rabbit A, light probe trials, sessions 21-25). For sessions 1-5 (top), the mean squared difference between the observed CR and the CR computed using these parameters is [...]. For trials averaged from sessions 21-25 (bottom), this difference is [...]. The ratio of the differences between simulated and observed CRs using the two sets of best fit parameters (those for sessions 1-5 and those for sessions 21-25) for the observed waveform of sessions 1-5 (top graph) is [...] (F(75,75) = 13.90, p < 0.01). The ratio of the differences between simulated and observed CRs using the two sets of best fit parameters for the observed CR of sessions 21-25 (bottom graph) is 1.65 (F(75,75) = 1.65, p < 0.01). These ratios compare how well the parameters that best fit the curve observed at the beginning of training, and the parameters that best fit the curve at the end of training, match the curve observed during sessions 1-5 and 21-25, respectively. The difference in these ratios suggests that the parameters are not strictly invariant throughout training.

In figure 7 (rabbit B, light probe trials), dotted lines were produced using the parameters αβ = 0.1, γ = 0.44, δ = 0.24, λ = 0.25 (the best fit parameters for rabbit B, light probe trials, sessions 1-5). For sessions 1-5 (top), the mean squared difference between the observed CR and the CR computed using the early parameters is [...]. For trials averaged from sessions 21-25 (bottom), this difference is [...]. The dashed lines in the top and bottom portions of figure 7 were produced using the parameters αβ = 1.0, γ = 0.76, δ = 0.99, λ = 0.50 (the best fit parameters for rabbit B, light probe trials, sessions 21-25). For sessions 1-5 (top), the mean squared difference between the observed CR and the CR computed using these parameters is [...]. For trials averaged from sessions 21-25 (bottom), this difference is [...]. The ratio of the differences between simulated and observed CRs using the two sets of best fit parameters for the observed CR of sessions 1-5 (top graph) is [...] (F(75,75) = 69.16, p < 0.01). The ratio of the differences between simulated and observed CRs using the two sets of best fit parameters for the observed CR of sessions 21-25 (bottom) is [...] (F(75,75) = 11.15, p < 0.01).

For tone probes in rabbit A (no figure), there was no difference between the best fit parameter values for the waveform observed during sessions 1-5 and during sessions 21-25 (αβ = 0.1, γ = 0.99, δ = 0.24, λ = 0.25). The same parameter set describing the same training protocol necessarily produces the same curve, so there is no significant difference (the ratio is 1.00).

In figure 8 (rabbit B, tone probe trials), dotted lines were produced using the parameters αβ = 0.1, γ = 0.09, δ = 0.74, λ = 0.25 (the best fit parameters for rabbit B, tone probe trials, sessions 1-5). For sessions 1-5 (top), the mean squared difference between the observed CR and the CR computed using the early parameters is [...]. For trials averaged from sessions 21-25 (bottom), this difference is [...]. The dashed lines in the top and bottom portions of figure 8 were produced using the parameters αβ = 0.2, γ = 0.99, δ = 0.49, λ = 0.50 (the best fit parameters for rabbit B, tone probe trials, sessions 21-25). For sessions 1-5 (top), the mean squared difference between the observed CR and the CR computed using these parameters is [...]. For trials averaged from sessions 21-25 (bottom), this difference is [...]. The ratio of the differences between simulated and observed CRs using the two sets of best fit parameters for the observed CR of sessions 1-5 (top graph) is [...] (F(75,75) = 19.15, p < 0.01). The ratio of the differences between simulated and observed CRs using the two sets of best fit parameters for the observed CR of sessions 21-25 (bottom) is [...] (F(75,75) = 22.31, p < 0.01). It is worth noting, as can be seen in figure 8, that the average observed waveform for sessions 1-5 is largely a flat line; there was not a robust CR at that point in the training. It is therefore not surprising that the parameters had shifted by the end of the training, when a large CR was observed.

Figure 6. Observed and computed waveforms for Rabbit A, L- probes. Observed waveform from sessions 1-5 (top) and 21-25 (bottom); computed waveforms from best-fit parameters for sessions 1-5 and 21-25 (both). Horizontal axis: ms from CS onset. See text for details.

Figure 7. Observed and computed waveforms for Rabbit B, L- probes. Observed waveform from sessions 1-5 (top) and 21-25 (bottom); computed waveforms from best-fit parameters for sessions 1-5 and 21-25 (both). Horizontal axis: ms from CS onset. See text for details.

Figure 8. Observed and computed waveforms for Rabbit B, T- probes. Observed waveform from sessions 1-5 (top) and 21-25 (bottom); computed waveforms from best-fit parameters for sessions 1-5 and 21-25 (both). Horizontal axis: ms from CS onset. See text for details.

Appendix A contains surface plots representing the values of the mean square difference between observed waveforms and curves produced while varying δ and γ, with αβ and λ fixed at their best fit values. It is from these displays that the strongest conclusions about parameter variation during training can be drawn. Specifically, the surface plots of the light probe trials are all very similar. Remarkably alike are the plots for rabbit A from sessions 1-5 (figure A-1) and 21-25 (figure A-2) and for rabbit B from sessions 21-25 (figure A-3). These three plots all represent light training after at least 473 separate trials (B sessions 21-25: 473 trials; A sessions 1-5: 653 trials; A sessions 21-25: 1073 trials). Comparing this with figure 4, it can be seen that the primary difference between sessions 1-5 and 21-25 lies in the values of αβ (1.0 in sessions 1-5, 0.1 in sessions 21-25) and λ (0.25 in sessions 1-5 and 0.5 in sessions 21-25). These parameters represent the rate at which learning takes place and the effectiveness of the US, respectively. The difference in δ (0.99 in sessions 1-5 and 0.24 in sessions 21-25) can be seen in figures A-1 and A-2 to represent an insignificant difference in predictive power. That is, there is only a very slight difference between the mean square error produced with δ = 0.24 and that produced with δ = 0.99.

The surfaces generated by the tone probe averaged over sessions 21-25 are also nearly identical in shape for rabbits A (figure A-6) and B (figure A-8). However, unlike the plots for the light probe trials, these plots do not have the wide, flat range of nearly best fit parameter values. Instead, these graphs display a ridge, with minimal values for mean square difference on either side. This might help explain the apparently sudden shifts in best fitting parameter values for tones seen in the bottom halves of figures 4 and 5.

Also similar are the plots for rabbit B during sessions 1-5 for both light and tone, i.e., during initial acquisition of CRs. This similarity might suggest that initial training to a novel stimulus yields similar best fit parameters, regardless of the CS modality or the inter-stimulus interval.

Overall, these surface plots help demonstrate that while the best fit parameters might appear to change greatly between sessions, many of these changes have only slight consequences for accurately simulating CRs. The similarity of these surfaces strongly suggests that similar ranges of parameter values are optimal in similar training conditions. However, the large, flat areas of these plots suggest that there is a wide range of parameter values that produce an almost best fit. In other words, the parameters suitable for simulating a CR can fluctuate over a wide range without deviating greatly from CRs observed behaviorally.
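The trial counts quoted for the surface plots (473, 653, and 1073) are not derived in the text. One reading that reproduces all three, offered here as an assumption rather than as the author's stated method, counts the light-plus-US trials a rabbit had received by the midpoint of the median session of each block: any stage-one trials, plus 21 per completed stage-two session, plus half of the 21 in the median session, rounded up. The function name and constants below are hypothetical labels for that reading.

```python
LIGHT_US_TRIALS_PER_SESSION = 21   # trial type 1 in each stage-two session
STAGE_ONE_TRIALS = 10 * 60         # ten sessions of 60 light + US trials


def light_trials_at_median(median_session, stage_one_trials):
    """Light + US trials received by the midpoint of `median_session`
    (numbered within stage two). Assumed counting rule, not the author's."""
    completed = (median_session - 1) * LIGHT_US_TRIALS_PER_SESSION
    half_session = (LIGHT_US_TRIALS_PER_SESSION + 1) // 2   # 11 of the 21
    return stage_one_trials + completed + half_session


# Rabbit A had stage-one training; rabbit B (control) had sit sessions.
print(light_trials_at_median(3, STAGE_ONE_TRIALS))    # A, sessions 1-5
print(light_trials_at_median(23, STAGE_ONE_TRIALS))   # A, sessions 21-25
print(light_trials_at_median(23, 0))                  # B, sessions 21-25
```

Under this reading the three calls give 653, 1073, and 473, matching the counts quoted above.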

Discussion

Theoretical learning models are valuable not only because they accurately simulate CRs. They can also provide a foundation for relating the model to the neuronal processes involved in learning and generating CRs. As summarized in figure 9, Moore and Choi (1997) have suggested cerebellar locations for the various components of the TD model. The onset or offset of a CS is modeled as initiating a spread of activation throughout neurons related to the sense modality involved (e.g. sight, hearing). This activation engages pontine nuclear cells, their related cerebellar mossy fibers, and granule cells, allowing timing information to be introduced to the cerebellum. The specific information regarding the CR appears to be introduced through the actions of parallel fiber and climbing fiber inputs to the Purkinje cells. These inputs produce synaptic long-term depression, down-modulating the firing rate of the Purkinje cells. As the Purkinje cells have an inhibitory effect on the deep cerebellar nucleus interpositus (IP), the presumed site of CR generation, a depression of the Purkinje cells' activity allows the IP cells to fire at a more rapid rate, giving rise through their projections to a CR. The IP projects to the red nucleus, which in turn activates the motoneurons of the peripheral muscles of the eyelid and eyeball, causing the eye blink.

Moore and Choi (1997) have postulated specific locations within the cerebellar systems that the various components of the TD model represent. They suggest that the red nucleus, in addition to activating the motoneurons for the eye muscles, feeds back information regarding Y(t) (eyelid position at time t) to the Purkinje cells. The Purkinje cells apply the γ discount factor to the Y(t) information and pass this information on via inhibitory synapses on Golgi cells. On these same Golgi cells, inputs from the spinal trigeminal nucleus pars oralis, carrying the Y(t-1) information, form excitatory synapses, consistent with the model in that they detract from the input of γY(t). The Golgi cells themselves form inhibitory synapses on cells of the pontine nucleus which, as alluded to earlier, carries CS (or X(t), in the model) information. These cells then synapse on the Purkinje cells, which, as discussed before, output to the IP.

The results presented in this paper provide encouraging support for the validity of the TD model with complete serial components. By demonstrating that a wide but stable range of parameters can be used to produce behavioral waveforms throughout the course of the training period, this research helps support the TD model as an accurate simulation of the learning process, and suggests that there is an underlying physical, neuronal basis onto which this formula can be mapped.

Figure 9. Neural circuits of the cerebellum implementing the variables of the TD model. See text for details. RN = red nucleus; PC = Purkinje cells; SpO = spinal trigeminal nucleus pars oralis; Go = Golgi cell; PN = pontine nucleus; IP = interpositus nucleus; IO = inferior olivary nucleus. (Moore & Choi, 1997)

This is especially compelling when viewed in light of the surface plots in the appendix. The ranges displaying nearly best fit curves are very similar for different rabbits undergoing the same training paradigm. This is particularly evident when examining the light probe trials and the late tone trials.

The immediate next phase of research will be to analyze the data collected on rabbits C and D in this study. These data should allow a better characterization of the similarities and differences in the behavior of the parameters within and between rabbits during their training period. However, this study has already provided evidence suggesting that, at different stages of training, similar values or ranges of values will produce the optimal curve to match behaviorally observed waveforms. Future work will provide better indications of just how these parameters behave.

Bibliography

Blazis, D. E. J. & Moore, J. W. [...]. Conditioned Stimulus Duration in Classical Trace Conditioning: Test of a Real-Time Neural Network Model. Behavioural Brain Research. 43: [...].

Desmond, J. E. [...]. Temporally Adaptive Responses in Neural Models: The Stimulus Trace. In M. Gabriel & J. Moore (Eds.), Learning and Computational Neuroscience: Foundations of Adaptive Networks (pp. [...]). Cambridge, MA: MIT Press.

Desmond, J. E. & Moore, J. W. [...]. Adaptive Timing in Neural Networks: The Conditioned Response. Biol Cybern. 58: [...].

Desmond, J. E. & Moore, J. W. [...]. Altering the Synchrony of Stimulus Trace Processes: Tests of a Neural-Network Model. Biol Cybern. 55: [...].

Moore, J. W. & Choi, J-S. [...]. Conditioned Stimuli Are Occasion Setters. In N. Schmajuk & P. Holland (Eds.), Occasion Setting: Associative Learning and Cognition in Animals (pp. [...]). Washington, D.C.: APA.

Moore, J. W. & Choi, J-S. [...]. Conditioned Response Timing and Integration in the Cerebellum. Learning and Memory. 4: [...].

Moore, J. W. & Desmond, J. E. 1992. A Cerebellar Neural Network Implementation of a Temporally Adaptive Conditioned Response. In I. Gormezano & E. Wasserman (Eds.), Learning and Memory: The Behavioral and Biological Substrates (pp. [...]). Hillsdale, NJ: Lawrence Erlbaum Associates.

Rescorla, R. A. & Wagner, A. R. 1972. A Theory of Pavlovian Conditioning: Variations in the Effectiveness of Reinforcement and Nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical Conditioning II: Current Theory and Research (pp. [...]). New York: Appleton-Century-Crofts.

Sutton, R. S. & Barto, A. G. 1990. Time-Derivative Models of Pavlovian Reinforcement. In M. Gabriel & J. Moore (Eds.), Learning and Computational Neuroscience: Foundations of Adaptive Networks (pp. [...]). Cambridge, MA: MIT Press.
