Multiple-object Working Memory A Model for Behavioral Performance

D.J. Amit 1,2, A. Bernacchia 1 and V. Yakovlev 3

1 Dipartimento di Fisica, Istituto di Fisica (INFM), Università di Roma La Sapienza, Piazzale A. Moro 1, 00185, Roma, Italy, 2 Racah Institute of Physics and 3 Institute of Life Sciences, Hebrew University, Jerusalem, Israel

In a psychophysics experiment, monkeys were shown a sequence of two to eight images, randomly chosen out of a set of 16, each image followed by a delay interval, the last image in the sequence being a repetition of any (one) of the images shown in the sequence. The monkeys learned to recognize the repetition of an image. The performance level was studied as a function of the number of images separating cue (the image that will be repeated) from match for different sequence lengths, as well as at fixed cue-match separation versus length of sequence. These experimental results are interpreted as features of multi-item working memory in the framework of a recurrent neural network. It is shown that a model network can sustain multi-item working memory. Fluctuations due to the finite size of the network, together with a single extra ingredient, related to expectation of reward, account for the dependence of the performance on the cue position, as well as for the dependence of performance on sequence length for fixed cue-match separation.

Introduction

The internal (cortical) structure of working memory (WM), as a tool for cognitive behavior, as well as its neurophysiological correlates, are of central interest in the effort to decipher the computational mysteries of the brain. Enhanced selective delay activity in associative parts of the cortex has been identified as a convincing candidate for a neural correlate of working memory (Fuster and Alexander, 1971; Kubota and Niki, 1971; Miyashita and Chang, 1988; Funahashi et al., 1989).
On the theoretical side, such delay activity distributions, representing single items, have been accounted for by the formation, during training or innately, of a Hebbian synaptic structure in recurrent networks, i.e. the appearance of stimulus-selective sub-populations of excitatory cells within a cortical module, which have potentiated synapses between them and weakened synapses with other cells. Such structures can maintain selective delay activity (Amit, 1995; Amit and Brunel, 1997a,b; Brunel and Wang, 2001).

In a seminal set of experiments (Miller et al., 1996), monkeys were trained to recognize the repetition of the first image in a sequence, disregarding intervening images. With some more training, the monkeys also learned to ignore intervening repeated pairs. It was observed, in single-cell recordings, that neural delay activity correlates of the working memory of the first stimulus in a sequence were destroyed in inferotemporal (IT) cortex by the intervening stimuli, but were preserved, despite intervening stimuli, in pre-frontal (PF) cortex. This finding could be interpreted as implying either that PF cortex screens out intervening stimuli and maintains WM only of the first stimulus, as modeled by Brunel and Wang (Brunel and Wang, 2001), or that WM could be structured to preserve the representations of several images in a sequence of images, as the union of the sub-populations representing the WM of each of the presented images.

The need for multi-item WM was recognized more than 50 years ago: "There are indications that, prior to the internal or overt enunciation of the sentence, an aggregate of word units is partially activated or readied. Evidence for this comes from contaminations of speech and writing" (Lashley, 1951). In thinking of language, the formation of a phrase, spoken or written, would require the activation of its underlying elements (multi-item WM) before syntax can be applied to expose its surface structure. Yakovlev et al.
(Yakovlev et al., 2000), stimulated by the initial difficulty that monkeys encountered in eliminating false positives in ABBA sequences (Miller et al., 1996), tested this alternative using sequences of two to eight images (selected at random out of 16), in each of which the last image is a repetition of (any) one of the preceding images (see, for example, Fig. 1a). Each image is presented for 1 s, followed by a delay interval of either 1 or 3 s. Monkeys, rewarded for noticing the repetition, learn to perform well. In each trial, the number of images and their identities before repetition, the position in the sequence of the image to be repeated and the length of the delays are all chosen at random. (Additional details of the experimental study are to be presented in a forthcoming paper and the details of the presentation of the cited abstract can be viewed at: titanus.roma1.infn.it/docs/storage/eilatposter.pdf.)

We define the following variables (Fig. 1a): $n$ is the length of the sequence, the number of images presented before the repetition; $q$ is the position of the image which is eventually repeated; $d$ is the distance between the cue and the repeated image, i.e. $d = n - q + 1$. In Figure 1a, $n = 5$, $q = 2$ and $d = 4$. There are therefore two independent variables, which we take as $d$ and $n$.

It is found that: the total performance level (percentage of correct responses for all sequences of a given length $n$) decreases with increasing length of sequence, tending to a flat asymptote (Fig. 1b); for sequences of fixed length $n$, performance decreases with increasing cue-match distance $d$ (Fig. 1c), equivalent to decreasing $q$; at fixed cue-match separation $d$, performance improves with sequence length $n$ (Fig. 1d), equivalent to increasing $q$.
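The bookkeeping of the trial variables can be made concrete with a short sketch. It only encodes the relation $d = n - q + 1$ between sequence length, cue position and cue-match distance; the function names and the default maximum length of seven pre-repetition images (sequences of two to eight images, the last being the repetition) are illustrative assumptions, not part of the experimental software.

```python
def cue_match_distance(n: int, q: int) -> int:
    """Cue-match distance d = n - q + 1 for sequence length n and cue position q."""
    if not 1 <= q <= n:
        raise ValueError("cue position q must satisfy 1 <= q <= n")
    return n - q + 1


def trial_conditions(max_n: int = 7):
    """Enumerate all (n, q, d) triples for sequence lengths 1..max_n.

    max_n = 7 is an assumption: up to eight images, the last one being the repetition.
    """
    return [(n, q, cue_match_distance(n, q))
            for n in range(1, max_n + 1)
            for q in range(1, n + 1)]


conditions = trial_conditions()
print(cue_match_distance(5, 2))  # the Figure 1a example: prints 4
print(len(conditions))           # number of distinct (n, q) pairs: prints 28
```

Because $d$ is fixed by $n$ and $q$, any two of the three variables label a condition; grouping the triples by $n$ or by $d$ corresponds to the two ways of connecting the same points in Figure 1c and Figure 1d.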
We model these experimental results with a double aim: to test whether modeling at the neural level can be as successful in the multi-item case as for single items, and to use a successful model to provide cues for subsequent (single-unit) neurophysiological and cognitive studies.

Comments

1. Bars in Figure 1b are averages over experimental data included in either Figure 1c or d, over all trials of the same sequence length $n$, irrespective of the position $q$ of the cue in the sequence (hence for all allowed values of $d$).

2. The points in Figure 1c,d are identical. The only difference is

© Oxford University Press 2003. All rights reserved. Cerebral Cortex May 2003;13:435-443; 1047-3211/03/$4.00

Figure 1. Multiple-item working memory (experiment). (a) The structure of the trial, defining $n$, the number of images in the sequence prior to repetition; $q$, the cue position (defined by the repeating stimulus); $d$, the distance of cue to match ($d = n - q + 1$). In the figure, $n = 5$, $q = 2$ and $d = 4$. All vary at random from trial to trial. (b) Overall performance versus length of trial, for all allowable values of $d$. (c) Performance increase at fixed $n$ with increasing cue position $q$, equivalent to a decrease with cue-match separation $d$ ($d = n - q + 1$). Each point represents the fraction of trials of a given $q$ (abscissa) and given sequence length $n$, in which repetition was recognized (points connected by lines). (d) Performance improvement with sequence length at fixed cue-match separation: same points as in (c); points connected are of equal cue-match separation $d$. Increasing $q$ implies increasing $n$.

that in Figure 1c the lines connect points of equal sequence length $n$ (and, hence, varying $d$), while in Figure 1d they connect points of equal separation $d$ (and varying sequence length $n$).

3. From the cognitive point of view, decay of performance with cue-match separation is natural: it is more difficult to hold in WM a representation of an item the greater the number of successive stimuli and/or the longer the time elapsed. To model the observed phenomenon one needs a network which can sustain multiple-item WM. In the absence of noise, items present in WM would tend to become equivalent; hence, performance would not decrease as cue-match separation increases, and all memories would be lost together (blackout); this is exemplified in Figure 2a,b. In contrast, what is observed is that older elements get lost (to WM) sooner and stochastically. This gap is bridged in our model by the reintroduction of noise, which is a natural feature of a microscopic model (see below).

4.
That performance at equal separation improves with sequence length (in other words, when more, irrelevant, stimuli precede the cue of the matching pair) is surprising. To model the improvement of performance for longer sequences, one needs multiple WM with increasing stability, which we model by an external element: spiking local networks of excitatory and inhibitory neurons require a significant number of non-selective afferent currents to the neurons of the module (Van Vreeswijk and Sompolinsky, 1996; Amit and Brunel, 1997a). We assume that as the sequence becomes longer and no reward is obtained, reward expectation (RE) increases, expressed by a reduction of the non-selective external afferent (see below). This hypothesis is directly controllable in physiological experiments.

Methods

The Microscopic System

The network consists of excitatory and inhibitory integrate-and-fire (IF) neurons. The dynamics of the depolarization $V_i$ of neuron $i$ is:

\[ \tau_i \frac{dV_i}{dt} = -\left(V_i(t) - V_0\right) - \frac{I^{\mathrm{syn}}_i(t)}{g_i} \tag{1} \]

where $\tau_i$ is the leak time, $V_0$ is the membrane resting potential (equal for all neurons) and $g_i$ is the membrane conductance. When $V_i$ reaches

Figure 2. Network rate dynamics: average rates in all populations during presentation of a sequence of seven images. Repetition is not simulated nor presented (see text). (a, b) No noise nor reward-expectation (RE): (a) 18 rates (1,...,16, background, inhibition) versus time. Square wave at bottom: presentation protocol. Scenario: 1 s stimulus-1, 1 s delay activity, 1 s stimulus-2, second delay period, etc. During presentation of stimulus-1, one population (color-coded) emits at 8 Hz; inhibition is at 1 Hz; non-stimulated populations are around 3 Hz. Rate in first delay period: 19 Hz. Between 2 and 4 s, two populations are in WM; inhibition rises and the emission rate in the double attractor (2-item WM) decreases, due to the rise in inhibition; following presentation of stimulus-3, three populations coexist in WM, with higher inhibition and lower delay rates; and so on, until a 6-item WM is set active at 11 s. Following the presentation of stimulus-7, the 6-attractor collapses, all memory is erased and only population-7 remains in 1-item delay activity. (b) Complementary representation of the dynamics in (a): color-coded rate in each population (abscissa) versus time (ordinate). Rate color code at right. Same data as in (a). 1-, 2-, 3-, 4-, 5- and 6-item WM are observed; then, following the presentation of stimulus-7, blackout, and only population-7 remains in WM. Parameters: $\nu_{\mathrm{ext}}$ = 3.8 Hz, $J_+$ = 14, $f$ = 0.01, $\nu_C$ = 0.07 Hz. (c, d) Network dynamics with noise and RE: (c) representation as in (a); rates fluctuate on a short time scale. Inhibition rate decreases with time and delay rates increase due to RE. The first population falls out in the delay period following stimulus-5. Population-6 falls out during presentation of 7. (d) Same data as in (c), presented as color-coded population rate dynamics, as in (b). Additional parameters: $Q$ = 0.008 mV ms$^{0.5}$, $\Delta\nu_{\mathrm{ext}}$ = 0.07 Hz.
threshold, $\theta$, a spike is emitted and the neuron is repolarized at reset potential $V_r$ (both uniform among the neurons) and is blocked for an absolute refractory period $\tau^{\mathrm{arp}}_i$. The total synaptic current $I^{\mathrm{syn}}_i(t)$ [see Supplementary Material, equation (9)] arrives via three types of receptors: AMPA for collateral and external excitatory connections; NMDA for collateral excitation; and GABA for collateral inhibition. When a spike reaches a synapse, the AMPA current is characterized by an instantaneous rise and fast decay (2 ms), the NMDA current has a fast rise (2 ms) and very slow decay (100 ms) and the GABA current exhibits an instantaneous rise and slow decay (10 ms) (Brunel and Wang, 2001). Currents depend on the post-synaptic depolarization, linearly for AMPA and GABA. Because of the magnesium blockade of the NMDA receptor at low post-synaptic potential, the NMDA current is non-linear. This current saturates for high afferent rates (see Supplementary Material).

In the network, each neuron receives $C_E$ collateral excitatory synapses and $C_I$ inhibitory synapses, as well as $C_{\mathrm{ext}}$ external, non-selective excitatory connections. Initially, biologically plausible synaptic efficacies are distributed at random and it is checked that spontaneous activity is stable at plausible rates. This requires that collateral inhibition dominate collateral excitation. Hence, it is left to external, non-selective afferents to keep neurons spiking (Van Vreeswijk and Sompolinsky, 1996; Amit and Brunel, 1997a,b).

Extended Mean-field Simulation

When stimuli are non-overlapping, the network divides into separate functional populations, each of which is described by the mean variables of its neurons. In our case, there are 16 populations, each corresponding to neurons visually responsive to one of the images, one population of all excitatory neurons non-responsive to any stimulus and one of all inhibitory neurons. Neurons in each of these populations are functionally identified in that 1.
they receive (on average) currents with the same distribution, both from outside the network and from collateral connections, whether the system is stimulated or not; the current is characterized by its mean and variance (Amit and Brunel, 1997a);

2. structuring produces (on average) a distinct synaptic structure connecting them to each other and to other neurons in the network.

The assumption of distinct populations for each image renders neurons with perfectly sharp tuning curves. This is, of course, not very realistic. It

is, however, essential to be able to proceed with the very effective mean-field theory. On the other hand, it is not a limitation for a full microscopic simulation and we do not expect qualitative differences. Such simulations are being carried out.

The simulation process starts by selecting a set of parameters for the neurons, the synapses and the network. Twenty-nine (!) out of the first 30 parameters in Table 1, defining the neurons on the receiving and the emitting ends, and the number of connections, are taken verbatim from Brunel and Wang (Brunel and Wang, 2001), where they were selected for biological plausibility. Only $g^G_E$ has been increased by 3%, to render spontaneous activity more stable.

The parameters at our disposal are related to the structuring of the network, the structure of the stimuli and of the task, i.e. the coding level $f$ (the fraction of excitatory neurons in a sub-population coding for a given image); the non-selective afferent rate $\nu_{\mathrm{ext}}$, maintaining spontaneous activity (see above); the potentiation amplitude $J_+$ (due to learning of each individual stimulus); the contrast rate $\nu_C$ (the selective increase in external rate during the presentation of an image); the noise amplitude $Q$ (mimicking finite-size fluctuations); and the RE parameter $\Delta\nu_{\mathrm{ext}}$ (the decrease of the non-selective external input with increasing trial length).

The structuring of the network follows a selection of 16 visually responsive sub-populations, each composed of a fraction $f$ of the excitatory neurons. The recurrent synapses are potentiated (on average) by a factor $J_+$ within each population and depressed from each population to all other excitatory neurons. Depression is taken equal to $J_- = (1 - fJ_+)/(1 - f)$, so that the spontaneous rate does not vary when structuring takes place. When training renders $J_+$ large enough, the neurons in each population, once stimulated, can maintain selective delay activity when the stimulus is removed (Amit and Brunel, 1997a,b).
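As a worked check of the depression rule, the sketch below evaluates $J_- = (1 - fJ_+)/(1 - f)$ and verifies that the average efficacy seen from a potentiated population, $fJ_+ + (1 - f)J_-$, stays equal to 1, which is why the spontaneous rate is unchanged by structuring. The numerical values $f = 0.01$ and $J_+ = 14$ are the ones printed in this text; the function name is an illustrative assumption.

```python
def depression_factor(f: float, j_plus: float) -> float:
    """Depression J_- = (1 - f*J_+) / (1 - f) that compensates potentiation J_+."""
    if not 0.0 < f < 1.0:
        raise ValueError("coding level f must lie strictly between 0 and 1")
    return (1.0 - f * j_plus) / (1.0 - f)


f, j_plus = 0.01, 14.0                 # coding level and potentiation as printed in the text
j_minus = depression_factor(f, j_plus)

# A fraction f of the targets is potentiated by J_+, the remaining (1 - f) is depressed by J_-.
mean_efficacy = f * j_plus + (1.0 - f) * j_minus

print(round(j_minus, 4))               # prints 0.8687
print(abs(mean_efficacy - 1.0) < 1e-12)
```

Note that the rule only gives a positive $J_-$ while $fJ_+ < 1$, so for a fixed coding level the potentiation cannot be increased without bound under this compensation.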
An initial set of 18 rates, 18 mean depolarizations, their corresponding 18 variances (this is the sense in which MF is extended) and 18 effective time constants (effective due to receptor dynamics) is selected. Eighteen new effective integration time constants, means and variances of the afferent currents to neurons in the different functional populations are computed from

\[ \tau^{\mathrm{eff}}_{\eta}(t) = \tau_{\eta}\left[1 + \sum_{\zeta} C_{\zeta} J_{\eta\zeta} \sum_{r} \frac{g^{r}_{\eta}}{g_{\eta}}\, \rho^{r}(t)\, S^{r}_{\zeta}(t)\right]^{-1} \tag{2} \]

\[ \mu_{\eta}(t) = \frac{\tau^{\mathrm{eff}}_{\eta}(t)}{\tau_{\eta}} \sum_{\zeta} C_{\zeta} J_{\eta\zeta} \sum_{r} \frac{g^{r}_{\eta}}{g_{\eta}} \left(V^{\mathrm{inv}}_{\zeta} - V_{0}\right) \chi^{r}(t)\, S^{r}_{\zeta}(t) \tag{3} \]

\[ \sigma^{2}_{\eta}(t) = \left[\frac{g^{A\mathrm{ext}}_{\eta}}{g_{\eta}} \left(V^{\mathrm{inv}}_{E} - \langle V_{\eta}\rangle(t)\right) \tau_{A}\right]^{2} \frac{C_{\mathrm{ext}}\, \nu_{\mathrm{ext}}(t)}{\tau^{\mathrm{eff}}_{\eta}(t)} \tag{4} \]

Here $\tau_{\eta}$ is the membrane integration time constant of post-synaptic neurons in population $\eta$; $\tau^{\mathrm{eff}}_{\eta}$, $\mu_{\eta}$ and $\sigma^{2}_{\eta}$ are the corresponding effective time constant, mean and variance of the current [the variance is approximated by its external (excitatory) part]. These quantities are related to the renormalized dynamics, in which all terms linearized as functions of the depolarization are grouped together in the drift term. The sums are over the presynaptic populations, $\zeta$, each of which contributes with $C_{\zeta}$ connections (note that we assume that the number of connections arriving to any type of neuron depends only on the pre-synaptic population), and for each of those over the different receptors $r$: those would be AMPA ($r = A$) receptors for neurons external to the module, AMPA and NMDA ($r = A, N$) receptors for excitatory collateral neurons and GABA ($r = G$) for inter-neurons. $J_{\eta\zeta}$ is the synaptic efficacy from neurons in population $\zeta$ to those of population $\eta$. In the absence of structuring, its value is 1 (see above). $g_{\eta}$ is the conductance of the post-synaptic membrane, while $g^{r}_{\eta\zeta}$ is the conductance of the receptor $r$ on the synapse from $\zeta$ to $\eta$. The index $\zeta$ will be omitted on $g^{r}_{\eta\zeta}$ because in our network (structured or unstructured), $r$ determines $\zeta$ univocally. $V^{\mathrm{inv}}_{\zeta}$ is the inversion potential of neurons in the presynaptic population $\zeta$ and $V_0$ is the post-synaptic resting potential (equal for all neurons).

$S^{r}_{\zeta}(t)$ is the synaptic gating variable corresponding to receptor $r$, which depends on the presynaptic rates in population $\zeta$. Its mean $\langle S^{r}_{\zeta}\rangle$ depends linearly on rates for AMPA and GABA, and it saturates for NMDA.

Table 1. Parameters of neurons, synapses, receptors, stimuli and structuring

Excitatory connections: $C_E$ = 800
Inhibitory connections: $C_I$ = 200
External (excitatory) connections: $C_{\mathrm{ext}}$ = 800
Resting potential: $V_0$ = -70 mV
Threshold: $\theta$ = -50 mV
Reset potential: $V_r$ = -60 mV
Excitatory membrane conductance: $g_E$ = 25 nS
Inhibitory membrane conductance: $g_I$ = 20 nS
Excitatory membrane leak time: $\tau_E$ = 20 ms
Inhibitory membrane leak time: $\tau_I$ = 10 ms
Excitatory absolute refractory period: $\tau^{\mathrm{arp}}_E$ = 2 ms
Inhibitory absolute refractory period: $\tau^{\mathrm{arp}}_I$ = 1 ms
AMPA current decay time: $\tau_{\mathrm{AMPA}}$ = 2 ms
GABA current decay time: $\tau_{\mathrm{GABA}}$ = 10 ms
NMDA current decay time: $\tau^{\mathrm{decay}}_{\mathrm{NMDA}}$ = 100 ms
NMDA current rise time: $\tau^{\mathrm{rise}}_{\mathrm{NMDA}}$ = 2 ms
AMPA (external) conductance on excitatory post-synaptic: $g^{A\mathrm{ext}}_E$ = 2.08 nS
AMPA (external) conductance on inhibitory post-synaptic: $g^{A\mathrm{ext}}_I$ = 1.62 nS
AMPA (recurrent) conductance on excitatory post-synaptic: $g^{A}_E$ = 0.104 nS
AMPA (recurrent) conductance on inhibitory post-synaptic: $g^{A}_I$ = 0.081 nS
GABA conductance on excitatory post-synaptic: $g^{G}_E$ = 1.29 nS
GABA conductance on inhibitory post-synaptic: $g^{G}_I$ = 0.973 nS
NMDA conductance on excitatory post-synaptic: $g^{N}_E$ = 0.39 nS
NMDA conductance on inhibitory post-synaptic: $g^{N}_I$ = 0.258 nS
Inversion potential on excitatory pre-synaptic: $V^{\mathrm{inv}}_E$ = 0 mV
Inversion potential on inhibitory pre-synaptic: $V^{\mathrm{inv}}_I$ = -70 mV
Mg$^{2+}$ ion concentration: [Mg$^{2+}$] = 1 mM
NMDA saturation parameter: $\alpha$ = 500 Hz
Inverse Mg$^{2+}$ blockade potential: $\beta$ = 0.062 mV$^{-1}$
Strength of Mg$^{2+}$ blockade: $\gamma$ = [Mg$^{2+}$]/3.57 mM = 0.28
Number of coded stimuli: $p$ = 16
Coding level: $f$ = 0.01
Potentiation: $J_+$ = 14
Contrast rate: $\nu_C$ = 0.07 Hz
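$\rho^{N}$ and $\chi^{N}$ take into account the effect of the NMDA current on post-synaptic neurons of population $\eta$. The mean-field expressions for $\langle S^{r}_{\zeta}\rangle$, $\rho^{r}$, $\chi^{r}$ and the mean membrane potential $\langle V_{\eta}\rangle$ are given in the Supplementary Material, equations (14-19). The mean-field dynamics is defined by

\[ \tau^{\mathrm{eff}}_{\eta}(t)\, \frac{dM_{\eta}}{dt} = -\left(M_{\eta}(t) - V_{0}\right) + \mu_{\eta}(t) + Q\,\Gamma(t) \tag{5} \]

\[ \tau^{\mathrm{eff}}_{\eta}(t)\, \frac{d\Sigma^{2}_{\eta}(t)}{dt} = -\Sigma^{2}_{\eta}(t) + \sigma^{2}_{\eta}(t) \tag{6} \]

which represent the evolution of the depolarization mean $M_{\eta}$ and variance $\Sigma^{2}_{\eta}$ of population $\eta$ in the absence of a threshold, i.e. for small $\tau^{\mathrm{eff}}\nu$ (<0.05 in spontaneous activity, <0.2 in delay activity). On the right-hand side of equation 1, we have already introduced the surrogate (finite-size) Gaussian noise term, $Q\Gamma(t)$, added to the evolution of the mean depolarization (see below). Given the quantities $M_{\eta}(t)$, $\Sigma_{\eta}(t)$ and $\tau^{\mathrm{eff}}_{\eta}(t)$, one obtains the average emission rate in population $\eta$:

\[ \nu_{\eta}(t) = \Phi\left(M_{\eta}(t), \Sigma_{\eta}(t), \tau^{\mathrm{eff}}_{\eta}(t), \tau^{\mathrm{arp}}_{\eta}\right) \tag{7} \]

The form of the IF neuron's response function $\Phi$, given the statistics of its afferent current, is (Ricciardi, 1977; Tuckwell, 1988)

\[ \Phi\left(\mu, \sigma, \tau, \tau^{\mathrm{arp}}\right) = \left[\tau^{\mathrm{arp}} + \tau\, \Psi(\mu, \sigma, \tau)\right]^{-1}, \qquad \Psi(\mu, \sigma, \tau) = \sqrt{\pi} \int_{\frac{V_r - \mu}{\sigma}}^{\frac{\theta - \mu}{\sigma}\left(1 + 0.5\frac{\tau_A}{\tau}\right) + 1.03\sqrt{\frac{\tau_A}{\tau}} - 0.5\frac{\tau_A}{\tau}} \left[1 + \mathrm{erf}(x)\right] e^{x^{2}}\, dx \tag{8} \]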

in which the correction for the finite integration time, $\tau_A$, of the external AMPA noise has been taken into account (Brunel and Sergi, 1998; Brunel and Wang, 2001), as in equations (20-23) of Brunel and Wang. The mean rates, in turn, enter the computation of the new values of $\rho$, $\chi$, $\langle S\rangle$ and $\langle V\rangle$, which lead, via equations (14-19) (Supplementary Material), to the new values of $\tau^{\mathrm{eff}}$, $\mu$ and $\sigma$.

Finite-size Noise and Expectation-attention

The first element simulates the noise due to the finite size of the network, which the mean-field approach neglects. It is represented by a stochastic term, $Q\Gamma(t)$, added to the average current feeding the mean depolarization, equation (5). It is added only to the dynamics of the mean depolarization, where it is expected to have the most effect. $Q$ is the amplitude of this noise and $\Gamma$ a normal Gaussian process. Even in the presence of NMDA receptors, there are significant size-dependent fluctuations (Compte et al., 2000). Simulations show that the fluctuating currents can be reasonably approximated by a Gaussian process, whose amplitude depends on the size of the network, i.e. the number of neurons and/or the number of contacts (Brunel and Wang, 2001).

The second element, the effect of reward-expectation, is modeled by a decrease of the external afferents (sustaining spontaneous activity): $\nu_{\mathrm{ext}}$ (equation 1) decreases, upon every presentation, by $\Delta\nu_{\mathrm{ext}}$.

Protocol

A set of between one and seven items is selected and one of those stimuli already presented is designated as match. The network is launched in spontaneous activity and then the stimuli in the sequence are presented with the real time-course of the experimental protocol. The presentation of a stimulus consists of raising the external rate to the corresponding sub-population of neurons by the contrast, i.e. from $\nu_{\mathrm{ext}}$ to $\nu_{\mathrm{ext}} + \nu_C$. For each pair $(n, q)$, we present 200 sequences with 1 s delays and 200 with 3 s delays.
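The timing of the protocol and the reward-expectation decrement can be sketched as a simple schedule of epochs. This is an illustrative reconstruction, not the simulation code: the names `Epoch`, `build_protocol` and `external_rate` are hypothetical, the initial period of spontaneous activity is omitted, and it is an assumption that the decrement $\Delta\nu_{\mathrm{ext}}$ is applied at the onset of each presentation, including the first. The numerical arguments in the usage lines are the values printed in this text.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Epoch:
    t_start: float             # s
    t_end: float               # s
    stimulated: Optional[int]  # population receiving the contrast, None during a delay
    nu_ext: float              # non-selective external rate during the epoch (Hz)


def build_protocol(sequence: Sequence[int], delay: float, nu_ext0: float,
                   delta_nu_ext: float, stim_duration: float = 1.0) -> List[Epoch]:
    """Alternate stimulus and delay epochs; lower nu_ext at every presentation (RE)."""
    epochs: List[Epoch] = []
    t = 0.0
    nu_ext = nu_ext0
    for population in sequence:
        nu_ext -= delta_nu_ext  # assumption: decrement applied at stimulus onset
        epochs.append(Epoch(t, t + stim_duration, population, nu_ext))
        t += stim_duration
        epochs.append(Epoch(t, t + delay, None, nu_ext))
        t += delay
    return epochs


def external_rate(epoch: Epoch, population: int, nu_c: float) -> float:
    """External rate seen by one population: nu_ext + nu_C if it is being stimulated."""
    if epoch.stimulated == population:
        return epoch.nu_ext + nu_c
    return epoch.nu_ext


protocol = build_protocol(sequence=range(1, 8), delay=3.0,
                          nu_ext0=3.8, delta_nu_ext=0.07)
print(protocol[12].t_start, protocol[-1].t_end)  # prints 24.0 28.0
```

With seven stimuli, the trial lasts 14 s for 1 s delays and 28 s for 3 s delays, and the seventh stimulus of a 3 s delay trial starts at 24 s.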
At the end of the delay interval following the last stimulus, we test whether the population of the cue is still alive in WM, by averaging the rate in the population of the cue over the last 100 ms. The fraction of cases, out of 200 trials, in which it is still in WM is the performance level at this separation for that sequence length (Fig. 4a,b). We lump together trials with different internal sets of delays, as was done in analyzing the experimental data.

Results

WM Dynamics in Absence of Noise and Reward Expectation

MF theory accurately predicts the different stationary activity states, before, during and following stimulation, observed in simulations of networks of spiking neurons (Amit and Brunel, 1997a,b). When the elements of the network are endowed with synaptic receptor dynamics and a finite synaptic time constant, the phenomenology is richer and more stable. Despite the added complexity, a MF description proved possible (Brunel and Wang, 2001) and we follow this approach here, to be able (computation-wise) to perform the large number of repetitions of trials of long physical duration (real 14-28 s) required. Here, MF theory is extended to non-stationary situations (Amit and Brunel, 1997a), to allow for dynamical simulation of the full experimental protocol (see Methods).

In Figure 2a, we present the rate dynamics of the network without finite-size noise and expectation-attention. Each stimulus of the sequence is presented for 1 s, followed by a 1 s delay. Plotted are the average rates in all 18 (color-coded) populations (16 selective to familiar images, all non-selective excitatory cells and all inhibitory cells). Following 100 ms of spontaneous activity, stimulus-1 elicits a rate of 8 Hz in population-1, 1 Hz in the inhibitory one and 3 Hz in the 16 non-stimulated populations. It is followed by 1 s of delay activity in the stimulated population (19 Hz) and a decrease of the inhibition.
Stimulus-2 is presented, population-2 emits at 7 Hz for 1 s, while the rate in population-1 is a bit lower due to increased inhibition. The delay activity following stimulus-2 has equal rate in both populations. Note that no rate is seen to go down and the rate of the double WM is lower than that of the single-item WM (due to additional inhibition provoked by the double number of active neurons). Following the presentation of stimulus-6, six populations coexist in delay activity, at 15 Hz. The presentation of stimulus-7 causes a blackout: all six active populations return to spontaneous activity and the network state rapidly becomes identical (in terms of rates) to that following the presentation of the first stimulus, but in population-7.

The same story is told in a complementary fashion in Figure 2b: rates (color-coded) in each population separately (abscissa) versus time (ordinate). The representation as in Figure 2a facilitates the reading of the rates, while that in Figure 2b discriminates better the relevant populations. The presentation of a stimulus is seen as a 1 s high rate every 2 s, followed by delay activity at an elevated (but lower) rate. In the interval of 2-4 s, two populations coexist at elevated rates, then three, four, five and six populations, until, at 12 s, upon presentation of the seventh stimulus, only the seventh population remains in elevated activity, all others having returned to spontaneous activity.

These dynamics of the cortical module imply the following behavioral pattern. In sequences of up to six images, repetition is recognized, up to a separation of five, with no decrease of performance with separation and no improvement with sequence length at fixed separation.

Effect of Finite-size Noise and RE

Real networks fluctuate around stationary states, so finite-size noise must supplement MF theory. These fluctuations are absent in MF theory and we reintroduce them ad hoc (see Methods). The amplitude of the noise is a parameter.
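How an ad hoc Gaussian term perturbs the mean depolarization can be illustrated with a minimal Euler-Maruyama sketch. It assumes the mean depolarization obeys $\tau^{\mathrm{eff}}\, dM/dt = -(M - V_0) + \mu + Q\Gamma(t)$ with $\Gamma$ a unit white noise, and it freezes $\mu$ and $\tau^{\mathrm{eff}}$ at constant values, which the full simulation does not do. The function name, the step size and all numerical values in the usage lines are illustrative assumptions, not parameters of the model.

```python
import math
import random


def simulate_mean_depolarization(mu: float, tau_eff: float, q_amp: float,
                                 v0: float = -70.0, dt: float = 0.1,
                                 t_max: float = 500.0, seed: int = 0):
    """Euler-Maruyama integration of tau_eff * dM/dt = -(M - v0) + mu + q_amp * Gamma(t).

    Times are in ms and potentials in mV; Gamma is treated as unit white noise,
    so each step adds (q_amp / tau_eff) * sqrt(dt) * N(0, 1).
    """
    rng = random.Random(seed)
    m = v0
    trace = [m]
    for _ in range(int(round(t_max / dt))):
        drift = (-(m - v0) + mu) / tau_eff
        kick = (q_amp / tau_eff) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        m += drift * dt + kick
        trace.append(m)
    return trace


quiet = simulate_mean_depolarization(mu=15.0, tau_eff=10.0, q_amp=0.0)
noisy = simulate_mean_depolarization(mu=15.0, tau_eff=10.0, q_amp=5.0)
print(round(quiet[-1], 3))                # relaxes to v0 + mu: prints -55.0
print(max(noisy[2000:]) - min(noisy[2000:]) > 0.0)
```

Without noise the trace relaxes to the fixed point $V_0 + \mu$; with noise it fluctuates around it, and in the full network such fluctuations are what allow a population to drop out of, or hop into, delay activity.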
It was found (Brunel and Wang, 2001) that delay activity is stable in a finite range of (non-selective) external afferents: low external rates (currents) cannot sustain delay activity, while too high afferents destroy it. In both cases it is the collateral inhibition that creates the instability. In a wide range between these two situations, the stability of delay activity decreases with increasing external rates [for both single- and multiple-item delay activity (Bernacchia, 2001)]. We use this fact to exploit the decrease of external rate due to RE to improve performance for long sequences.

Figure 2c,d presents a sample rate dynamics in the presence of finite-size internal noise and external noise reduction (due to expectation-attention), the analog of Figure 2a,b. The rates fluctuate on a short time scale, due to finite-size fluctuations. With the passage of time into the trial, inhibition decreases due to the decrease in external afferents and WM rates increase (Brunel and Wang, 2001). Narrative: a 5-item (1, 2, 3, 4, 5) WM state sets in following stimulus-5, and item-1 is lost at 9.7 s. At 12.5 s, item-6 is lost. Until 9.7 s, the repetition of any of images 1-5 gives a correct (positive) response. Later, repetition of 1 produces an error.

The richness of the dynamic behavior of the network is exhibited in Figure 3a-d. Four sample scenarios are presented, chosen at random from the hundreds of trials run, with 1 s delays (Fig. 3a,b) and with 3 s delays (Fig. 3c,d). Note the evolution of multi-item delay activity, which propagates for several seconds; the random loss of populations in working memory; that earlier populations tend to be lost earlier (leading to performance decrease with cue-match separation); and that some populations tend to hop into delay activity spontaneously (as in Fig. 3b-d). The corresponding behavior: in Fig. 3b, for example, if one of images 1-5 is repeated prior to 9.5 s following the start of the trial, the response is positive and correct.
From 9.5 s on, image 4 has

Figure 3. Sample network rate dynamics, with noise and expectation-attention (RE). Conventions as in Figure 2b,d. Protocols: stimulus 1 s; delays in (a, b) 1 s, in (c, d) 3 s. Scenarios: (a) presentation of stimulus-1, 1 s delay, stimulus-2, second delay, stimuli 3, 4, 5 followed by 5-item WM. At 10.5 s, population-2 is lost to WM. Following stimulus-7, 6-item WM (1-3-4-5-6-7). Until 10.5 s, repetition of images 1-5 produces a correct positive. Later, repetition of 2 leads to error. (b) Items 1-4 enter WM. Item 4 enters spontaneously, at 1 s, before it is presented. It does not lead to a false positive, because it is lost at 4.5 s, while it is presented at 6 s. At 9.5 s, item 4 is lost; at 8 s, item 5 enters and at 11 s, item 3 is lost. (c) At 15 s, item 2 is lost and, following the presentation of item 7, at 24 s, there are six items in WM (1, 3, 4, 5, 6, 7). They all give a correct positive. Item 10 enters spontaneously at 1 s and persists until 16 s, without false positive. (d) Example of the loss of three populations and of long-lived spontaneous elements in WM. Parameters as in Figure 2.

exited WM and its repetition is not recognized, producing an error; after 11 s, repetition of image 3 is not recognized and produces an error. The repetitions of each of images 1, 2, 5 and 6 are recognized (they are still in WM) and, hence, are recalled correctly. Note also that the spontaneous (non-stimulated) activation of WM of item-4 does not lead to a false positive, because by the time item-4 is actually presented (at 6 s), which might have been conceived as a repetition, it is no longer in WM. In Figure 3c, item-2 is lost at 14 s, while item-11 enters WM spontaneously. In Figure 3d, items 1, 2 and 4 are lost and 13 enters spontaneously.

Model Performance Statistics

To mimic the experiment, we present sequences of one to seven stimuli. Each trial is run 200 times with 1 s delay intervals and 200 times with 3 s delays.
Following a sequence of $n$ images, we imagine the presentation of the repetition [this would be the $(n + 1)$th image] and we check which images are still present in WM, i.e. which populations of the sequence presented still have delay activity and which have lost it (see Methods). The surviving populations are classified by their sequence length ($n + 1$) and by their cue-match distance ($d = n - q + 1$), where $q$ is the ordinal position of the cue. The distribution of these average rates [over the repetitions of trials of fixed $(n, q)$] has the form shown in Figure 4a,b, where we present histograms of the rate in population $q$ in 100 trials of a given sequence length: $q = 2$, $n = 4$ ($d = 3$) (Fig. 4a) and $q = 1$, $n = 5$ ($d = 5$) (Fig. 4b). The rates are well separated into trials with rate >10 Hz (WM) and those with rate <5 Hz (spontaneous activity). The fraction of surviving images with given $(n, q)$, out of the total number of

Figure 4. Model network performance. (a, b) Sample rate distributions in WM across trials: histograms of average rates in a given cue population (number $q$) following 100 repetitions of trials of length $n$ (hence fixed $d$). The rate is averaged over the last 100 ms of the delay period following the removal of the $n$th stimulus. (a) $q = 2$, $n = 4$ ($d = 3$); (b) $q = 1$, $n = 5$ ($d = 5$). In both cases there is a clear separation of WM rates (>10 Hz) from spontaneous rates. Note the shift of the statistics from high to low rates on going from (a) to (b), i.e. with increased $d$. The fraction of the distribution in the high-rate part of the histogram is defined as the performance level at the given $d$ and $n$. (c, d) Performance levels in the model versus cue position $q$. (c) Points connected are of equal trial length $n$ (as in Fig. 1c); (d) same data, points connected are of equal cue-match separation $d$ (as in Fig. 1d).

trials corresponding to this pair is our estimate $P$ of the performance level. This point of view is justified by the high performance level in the experiment, together with the assumption that information on images in WM is represented by multiple delay states (see Discussion).

Figure 4c,d (same data) presents the performance of the model for the set of 28 $(n, q)$ pairs studied in the experiment: Figure 4c corresponds to Figure 1c and Figure 4d to Figure 1d. The data, as well as the model's output, consist of the combined performance levels with 1 and 3 s delays. Performance of the model is better for short delays and this seems also to be the case for monkeys, except for low-$n$ trials (data not shown). No systematic fitting of parameters was attempted. Beyond the similarity of the figures, the model predictions were found compatible with the data (Hotelling $T^2$ test) at a confidence level of 95% ($T^2$ = 0.46 with 546 degrees of freedom).
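The performance estimate described in this section (average the cue population's rate over the last 100 ms of the final delay, then take the fraction of trials in which it is still at WM rates) can be sketched as follows. The function names and the toy traces are illustrative assumptions; the 10 Hz threshold is also an assumption, chosen to sit at the reported boundary of WM rates (>10 Hz), well above spontaneous rates (<5 Hz).

```python
from typing import Sequence


def delay_rate(trace: Sequence[float], dt_ms: float, window_ms: float = 100.0) -> float:
    """Average rate (Hz) over the last window_ms of a rate trace sampled every dt_ms."""
    k = max(1, int(round(window_ms / dt_ms)))
    tail = trace[-k:]
    return sum(tail) / len(tail)


def performance_level(traces: Sequence[Sequence[float]], dt_ms: float,
                      threshold_hz: float = 10.0) -> float:
    """Fraction of trials in which the cue population ends the delay above threshold."""
    if not traces:
        raise ValueError("at least one trial is needed")
    hits = sum(1 for trace in traces if delay_rate(trace, dt_ms) > threshold_hz)
    return hits / len(traces)


# Toy data (not model output): three trials ending in delay activity, one back at spontaneous rate.
in_wm = [15.0] * 50
lost = [15.0] * 30 + [3.0] * 20
trials = [in_wm, in_wm, in_wm, lost]
print(performance_level(trials, dt_ms=10.0))  # prints 0.75
```

Because the two modes of the rate histogram are well separated, the estimate should be insensitive to the exact threshold as long as it lies between spontaneous and WM rates.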
Discussion

The model presented here provides a rather complete account of the experimental results in the framework of MF theory, which is the dynamics of average population rates (hence, suppressed noise), with the above two elements added, one internal (finite-size noise) and one external (reward-expectation/attention reduction of non-selective external afferents; see Methods). The network, embedding single items in a Hebbian way into its synaptic matrix, develops delay activity for each of the 16 images used in training. For sufficiently strong imprinting (synaptic potentiation) of the single-item traces, multi-item WM emerges. In the case studied here, there was WM of up to six items, i.e. the network could sustain, simultaneously, enhanced delay activity for the union of the cell groups in delay activity of as many items. This is true for every subset (of no more than six, for the parameters used) of the 16 images, in analogy with the spurious states of the Hopfield model (Hopfield, 1982; Amit et al., 1985; Amit, 1989). What limits the span of WM is the concomitant rise in inhibition. Moreover, due to the introduction of NMDA receptors, as stimuli are presented in a sequence, the network passes into WM states of increasing multiplicity, rather than into the state of the last stimulus presented, in contradistinction to the spurious states in the system of binary neurons.

Figure 5. Single-cell recordings for distinguishing a multiple-working-memory (MWM) cell from an IT cell and from a Miller-Desimone PF cell. Schematic PST (rate) histogram of a cell exhibiting elevated delay activity to image-3, but not to image-2 or image-1. All panels represent the same cell. Left, a critical sequence (1-delay-3-delay-2); right, a non-critical sequence (3-delay-1-delay-2), leaving PF and MWM undifferentiated. Left (a-c, from top down): the IT cell does not respond to image-1 and delay activity is at spontaneous rate; it strongly responds to image-3, followed by elevated delay activity; this delay activity is destroyed by image-2 (last). The PF cell does not respond to image-1, which is followed by delay activity at spontaneous rate; it responds to image-3 but maintains the delay activity (spontaneous) corresponding to image-1 (first); image-2 does not change the delay activity. The MWM cell does not respond to image-1; it has elevated delay activity for image-3. This delay activity persists after the presentation of image-2. The delay activity is neither last (IT) nor first (PF). Right (d-f): the rate histograms do not distinguish between a potential PF and MWM cell.

Similar observations, in a somewhat different context, have recently been made by Tanaka (Tanaka, 2002a,b), who has underlined the sensitivity of various network properties to the AMPA/NMDA ratio, as well as to inhibition. We defer the detailed examination of these effects in our context to when we confront the dynamics of the spiking network, rather than within MF theory. The fact that the network is made up of a finite number of neurons and, hence, that each neuron receives a limited number of pre-synaptic contacts (finite-size), together with the fact that spike emission is stochastic, introduces large fluctuations, especially in selective activity.
When network activity expresses delay activity, of any multiplicity, these fluctuations cause, at random times, transitions of some populations from delay activity to spontaneous activity. This sporadically eliminates items from WM. The probability that an item's representation exits from WM is essentially independent of the item, but older items are exposed to more attempts at escape. Upon repeated presentation of a sequence, the older the item, the more likely it is to be lost from WM. An analogy is that of radioactive decay: though the disintegration probability of each atom is constant in time, older populations will be more depleted. When reward-expectation is introduced, simulated by a decrease in the non-selective ambient afferent on the module, delay activity becomes more stable with increasing sequence length. In this way, WM becomes increasingly stable with increasing multiplicity and decays become slower. Our position on the behavioral aspects of the model is that we identify recognition of a repetition with the cue still being in WM upon presentation of the match. The repetition itself, though, is never actually presented in the simulations, and all images in the sequence are potential cues. The fact that monkeys perform well (up to 97% correct responses) implies that if the information on past images is present, coded in (multiple) delay activities, it can be used quite effectively. Since older items are more likely to have exited from WM earlier, performance decays with cue-match separation and, since the stability of WM increases with the length of trial, performance is better at equal cue-match separation as the pair moves down the sequence, i.e. for longer sequences. Note that effects of selective attention, novelty, primacy and similarity are not modeled. In fact, they do not seem to be observed in this type of task. They could be included in a more complete account, but here it is instructive to see how far the simpler account can go.
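The decay argument above can be made concrete with a toy calculation. The sketch below is not the network model: it assumes, purely for illustration, that an item is lost during the k-th delay of a trial with a probability that shrinks geometrically with k, standing in for the stabilizing effect of reward expectation. The parameter names `p0` and `stabilization` and their values are hypothetical.

```python
def escape_probability(k, p0=0.15, stabilization=0.8):
    """Hypothetical probability of losing an item during the k-th delay (k = 1, 2, ...).

    stabilization < 1 mimics the reward-expectation effect (later delays are
    more stable); stabilization = 1 is the constant-rate, decay-like case.
    """
    return p0 * stabilization ** (k - 1)


def survival_probability(n, q, p0=0.15, stabilization=0.8):
    """Probability that the cue at position q is still in WM after the n-th delay."""
    prob = 1.0
    for k in range(q, n + 1):  # the cue is exposed to delays q, q + 1, ..., n
        prob *= 1.0 - escape_probability(k, p0, stabilization)
    return prob


# Fixed sequence length n = 5: survival falls as d = n - q + 1 grows.
for q in range(5, 0, -1):
    print(f"n=5 q={q} d={5 - q + 1} P={survival_probability(5, q):.3f}")

# Fixed separation d = 2: survival rises with sequence length n.
for n in range(2, 7):
    print(f"d=2 n={n} q={n - 1} P={survival_probability(n, n - 1):.3f}")
```

With `stabilization = 1` the second loop would print the same value for every n, which is the pure radioactive-decay picture; the dependence on sequence length at fixed d appears only once later delays are made more stable.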
[A preliminary analysis of false positives, comparing trials in which an image is in the sequence with those in which the same image provokes the false-positive response, shows very low correlation, implying little effect of image-pair similarity. Details will be reported in the full account of the experiment.] The success of the model in reproducing the behavioral data, in rather realistic conditions, opens a bridge from behavior to physiology. In fact, single-unit recording can distinguish neurons confirming the multi-item scenario, both at the level of neural spike dynamics and at the level of the correlation with behavioral response.

Recognizing Multi-item WM in Single Cell Recordings

Delay activity of IT neurons represents the last image shown (Yakovlev et al., 1998). In some situations, PF neurons maintain delay activity of the first stimulus only (Miller et al., 1996; Brunel and Wang, 2001). Consider recording from a neuron that has delay activity for image-3 and no delay activity for either image-1 or image-2 (as one, for example, of column 3 in Fig. 2d). Consider a trial where images 1, 3 and 2 are presented in that order, with a delay between each two successive images (Fig. 5a-c). Following image-1, there will be no visual response and the delay activity will be at the spontaneous rate. If it is an IT neuron, it will shift to elevated delay activity after the presentation of image-3 and will go back to spontaneous activity following image-2, because this neuron expresses the delay activity for the last stimulus presented (Fig. 5a). A PF neuron, of the Miller-Desimone type, will respond to image-3 following the delay activity of image-1, but will have no elevated delay activity following image-3, because it preserves the delay activity corresponding only to the first stimulus presented in the task (Fig. 5b).
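The read-out rules used in this argument, as summarized in the Figure 5 caption (an IT cell follows the last image, a Miller-Desimone PF cell follows the first image, an MWM cell stays elevated once its preferred image has been shown), can be written as a small lookup. This is an idealized sketch with hypothetical function names; it ignores the sporadic loss of items from WM and only checks which sequences separate the three cell types.

```python
SPONTANEOUS, ELEVATED = "spontaneous", "elevated"


def delay_states(cell_type, preferred, sequence):
    """Idealized delay state of one cell after each image of the sequence.

    cell_type: "IT" (follows the last image), "PF" (follows the first image),
               or "MWM" (elevated once the preferred image has been shown).
    """
    states = []
    for i, image in enumerate(sequence):
        if cell_type == "IT":
            active = image == preferred
        elif cell_type == "PF":
            active = sequence[0] == preferred
        elif cell_type == "MWM":
            active = preferred in sequence[: i + 1]
        else:
            raise ValueError(f"unknown cell type: {cell_type}")
        states.append(ELEVATED if active else SPONTANEOUS)
    return states


def distinguishable(sequence, preferred=3):
    """True if IT, PF and MWM cells give pairwise different delay patterns."""
    patterns = [tuple(delay_states(t, preferred, sequence)) for t in ("IT", "PF", "MWM")]
    return len(set(patterns)) == 3


for seq in ([1, 3, 2], [3, 1, 2], [1, 2, 3]):
    print(seq, "critical" if distinguishable(seq) else "ambiguous")
```

Under these idealized rules, only a sequence in which the preferred image is neither first nor last separates all three cell types.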
If that neuron is a multi-item WM (MWM) neuron, it will have no elevated delay activity following the presentation of image-1; following image-3, it will go into elevated delay activity, as if it were of IT type. This would exclude the possibility of it being a Miller-Desimone type PF neuron, which would remain in spontaneous activity (the delay activity of this cell for the first stimulus). But upon presentation of image-2, it will remain in the elevated delay activity of image-3 (Fig. 5c), rather than move to spontaneous activity (the delay activity for image-2 for this neuron), as would the IT neuron. This is a guide for physiology, in search of neural correlates, as well as a prediction to corroborate the model and its conceptual framework. As a contrast, we present the schematic evolution of the rates of the same neuron for a presentation of the sequence 3, 1, 2, which leaves the PF-MWM ambiguity unresolved. (The same neuron, recorded during the presentation of the sequence 1, 2, 3, would leave the IT-MWM ambiguity unresolved.)

The Neural-Behavioral Correlate

Given a neuron with delay activity selective for a particular image, if that image is presented in different trials, in various positions of the sequence, the response upon the repetition of this image should be correlated with the persistence of the neuron's delay activity until the presentation of the repetition. Conversely, the absence of delay activity in this neuron should be correlated with negative responses (no response) when this image is repeated. This prediction may be affected by another correlation, that of a correct response with the level (spike rate) of the specific delay activity [as in Goldman-Rakic et al. (Goldman-Rakic et al., 1990)], but factor analysis can disentangle the two effects.

Supplementary Material

Supplementary material can be found at: http://www.cercor.oupjournals.org

Notes

This study was supported by the Center of Excellence Grant 'Changing Your Mind' of the Israel Science Foundation and the Center of Excellence Grant 'Statistical Mechanics and Complexity' of the INFM, Roma-1. We thank M. Mascaro, G. Mongillo, S. Hochstein and Y. Amit for helpful discussions and comments and Harvinder Singh for assistance with the graphics.

Address correspondence to Daniel J. Amit, Università degli studi di Roma La Sapienza, Dipartimento di Fisica, P.le A. Moro 1, 00185, Roma, Italy. Email: daniel.amit@roma1.infn.it.

References

Amit DJ (1989) Modeling brain function. New York: Cambridge University Press.
Amit DJ (1995) The Hebbian paradigm reintegrated: local reverberations as internal representations. Behav Brain Sci 18:617-626.
Amit DJ, Brunel N (1997a) Global spontaneous activity and local structured (learned) delay period activity in cortex. Cereb Cortex 7:237-252.
Amit DJ, Brunel N (1997b) Dynamics of a recurrent network of spiking neurons before and following learning. Network 8:373-404.
Amit DJ, Gutfreund H, Sompolinsky H (1985) Spin-glass models of neural networks. Phys Rev A 32:1007-1018.
Bernacchia A (2001) Dinamica di una rete neuronale strutturata con recettori e attività selettiva multipla. Thesis of laurea, dip. di Fisica, Università di Roma la Sapienza [in Italian].
Brunel N, Sergi S (1998) Firing frequency of integrate-and-fire neurons with finite synaptic time constants. J Theor Biol 195:87-95.
Brunel N, Wang XJ (2001) Effects of neuromodulation in a cortical network model of object working memory dominated by recurrent inhibition. J Comput Neurosci 11:63-85.
Compte A, Brunel N, Goldman-Rakic PS, Wang X-J (2000) Synaptic mechanisms and network dynamics underlying spatial working memory in a cortical network model. Cereb Cortex 10:910-923.
Funahashi S, Bruce CJ, Goldman-Rakic PS (1989) Mnemonic coding of visual space in the monkey's dorsolateral prefrontal cortex. J Neurophysiol 61:331-349.
Fuster JM, Alexander G (1971) Neuron activity related to short-term memory. Science 173:652-654.
Goldman-Rakic PS, Funahashi S, Bruce CJ (1990) Neocortical memory circuits. Cold Spring Harb Symp Quant Biol LV:1025-1038.
Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA 79:2554-2558.
Kubota K, Niki H (1971) Prefrontal cortical unit activity and delayed alternation performance in monkeys. J Neurophysiol 34:337-347.
Lashley KS (1951) The problem of serial order in behavior. In: Cerebral mechanisms in behavior (Jeffress LA, ed.). New York: Wiley.
Miller EK, Erickson CA, Desimone R (1996) Neural mechanisms of visual working memory in prefrontal cortex of the macaque. J Neurosci 16:5154-5167.
Miyashita Y, Chang HS (1988) Neural correlate of pictorial short-term memory in the primate temporal cortex. Nature 331:68-70.
Ricciardi LM (1977) Diffusion processes and related topics in biology. Berlin: Springer.
Tanaka S (2002a) Multi-directional representation of spatial working memory in a model prefrontal cortical circuit. Neurocomputing 44-46:1001-1008.
Tanaka S (2002b) Dopamine controls fundamental cognitive operations of multi-target spatial working memory. Neural Networks 15:573-582.
Tuckwell HC (1988) Introduction to theoretical neurobiology, vol. 2. Cambridge: Cambridge University Press.
Van Vreeswijk C, Sompolinsky H (1996) Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science 274:1724-1726.
Wang XJ (1999) Synaptic basis of cortical persistent activity: the importance of NMDA receptors to working memory. J Neurosci 19:9587-9603.
Yakovlev V, Fusi S, Berman E, Zohary E (1998) Inter-trial neuronal activity in inferior temporal cortex: a putative vehicle to generate long-term visual associations. Nat Neurosci 1:310-317.
Yakovlev V, Orlov T, Zohary E, Hochstein S (2000) Working memory for multiple stimuli in monkeys. Neurosci Lett Suppl 55:P7.

Cerebral Cortex May 2003, V 13 N 5