Supplementary Figure 1 Recording sites. (a, b) Schematic of recording locations for mice used in the variable-reward task (a, n = 5) and the variable-expectation task (b, n = 5). RN, red nucleus. SNc, substantia nigra pars compacta. SNr, substantia nigra pars reticulata.

Supplementary Figure 2 VTA neurons cluster into three response types. (a) Responses of all neurons recorded in the variable-reward task (n = 170). Each row shows the auROC values for a single neuron in the second before and after delivery of expected reward. Baseline is taken as the one second before odor onset. Yellow, increase from baseline; cyan, decrease from baseline. Light-identified neurons are denoted by an asterisk (*) to the left of each row. (b) The first three principal components of the auROC curves. These values were used for unsupervised hierarchical clustering, as shown in the dendrogram on the right. (c) Average firing rates for the three clusters of neurons. Orange, unexpected reward trials. Black, expected reward trials. (d-f) Same conventions as a-c, except for neurons recorded in the variable-expectation task. All 31 light-identified dopamine neurons were classified as Type 1.
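As a rough illustration of the auROC measure used in this figure, the sketch below compares per-trial spike counts in one time bin against baseline counts. It assumes the common definition of auROC as the probability that a response count exceeds a baseline count (ties counted as one half); the function name and the example counts are hypothetical and are not taken from the study.

```python
def auroc(baseline, response):
    """Area under the ROC curve for two samples of spike counts.

    Equals P(response > baseline) + 0.5 * P(response == baseline).
    0.5 means no change; > 0.5 an increase; < 0.5 a decrease from baseline.
    """
    if not baseline or not response:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for r in response:
        for b in baseline:
            if r > b:
                wins += 1.0
            elif r == b:
                wins += 0.5  # ties contribute half a win
    return wins / (len(response) * len(baseline))


# Hypothetical per-trial spike counts (illustration only, not recorded data).
baseline_counts = [2, 3, 1, 2, 4]
reward_counts = [6, 5, 7, 4, 6]

print(auroc(baseline_counts, reward_counts))
```

Repeating this for each time bin of a neuron would give one row of the kind shown in panels a and d, under the assumptions above.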

Supplementary Figure 3 Light identification of dopamine neurons. (a) Raw signal from one example light-identified dopamine neuron in the variable-reward task. Blue bars, light pulses. (b) For the same neuron, mean waveforms for spontaneous (black) and light-evoked (blue) action potentials. (c) For the same neuron, raster plots for 20 Hz (left) and 50 Hz (right) laser stimulation. Each row is one trial of laser stimulation (10 laser pulses). (d) Histogram of log P values for each neuron recorded in the variable-reward task (n = 170). The P values were derived from SALT (see Methods). Neurons with P < 0.001 and waveform correlations > 0.9 were considered identified (filled bars). (e, f) For light-identified neurons, probability of spiking (e) and latency to first spike (f) after laser pulses at different frequencies. Orange circles, mean across neurons. (g) Histogram of mean latencies (left) and latency standard deviations (right) in response to laser stimulation for all light-identified dopamine neurons in the variable-reward task. (h-n) Same conventions as a-g, but for neurons recorded in the variable-expectation task (n = 106).

Supplementary Figure 4 Putative and identified dopamine neurons respond similarly on the variable-reward task. (a) Average putative dopamine neuron responses (mean ± s.e.m.) for different sizes of unexpected (orange circle) and expected (black circle) reward. Orange line, best-fit Hill function for unexpected reward. Black line, subtractive shift of the orange line. n = 84 neurons. (b) Response to unexpected 2.5 μL reward versus effect of expectation for this reward size. Line, best-fit linear regression. Grey dots, putative dopamine neurons. Blue dots, light-identified dopamine neurons. Pearson's correlation across all neurons, P = 1 × 10⁻¹⁰. R, correlation coefficient. (c) Baseline firing rates versus effect of expectation (averaged across reward sizes). P = 0.01. (d) Difference between reward-predicting odor and nothing-predicting odor versus difference between unexpected reward and expected reward. P = 3 × 10⁻⁶.
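The relationship between the two curves in panel a can be sketched as follows. This is a minimal illustration that assumes a generic three-parameter Hill function; the parameter values and the size of the shift are hypothetical placeholders, not the fitted values from the study. Only the reward sizes are taken from these supplementary figures.

```python
def hill(r, f_max, k, n):
    """Generic Hill function: a saturating response to reward size r."""
    return f_max * r**n / (r**n + k**n)


# Hypothetical parameters chosen only for illustration (not fitted values).
F_MAX, K, N = 10.0, 2.5, 1.0
SHIFT = 2.0  # hypothetical constant subtracted when reward is expected

reward_sizes = [0.1, 0.3, 1.2, 2.5, 5.0, 10.0, 20.0]  # in microliters

# Orange line: response to unexpected reward.
unexpected = [hill(r, F_MAX, K, N) for r in reward_sizes]
# Black line: the same curve shifted down by a constant (subtraction).
expected = [u - SHIFT for u in unexpected]

for r, u, e in zip(reward_sizes, unexpected, expected):
    print(f"{r:5.1f} uL  unexpected={u:6.3f}  expected={e:6.3f}  diff={u - e:.3f}")
```

Under a subtractive shift the difference between the curves is the same at every reward size, whereas a divisive change would make the difference grow with reward size.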

Supplementary Figure 5 Subtraction is scaled for each reward size. (a-f) For identified dopamine neurons in the variable-reward experiment (n = 40), response to unexpected reward versus effect of expectation for 0.1 μL (a, P = 1.1 × 10⁻¹⁰), 0.3 μL (b, P = 4.4 × 10⁻⁹), 1.2 μL (c, P = 3.4 × 10⁻⁹), 5 μL (d, P = 1.9 × 10⁻⁸), 10 μL (e, P = 5 × 10⁻⁵), and 20 μL (f, P = 8.1 × 10⁻⁵) reward. R, correlation coefficient.

Supplementary Figure 6 Noise correlations for pairs of putative dopamine and GABA neurons. (a, b) Noise correlations (mean ± s.e.m.) between pairs of simultaneously recorded putative dopamine (Type 1) and GABA (Type 2) neurons in the variable-reward experiment (a, n = 59 pairs) and the variable-expectation experiment (b, n = 44 pairs). Correlations were calculated by examining trial-by-trial variations in spiking during different task epochs (see Methods). Grey bars, correlations on simultaneous trials. Black bars, correlations in which one neuron's data was shifted by one trial. (c, d) Histograms of noise correlations between pairs of simultaneously recorded putative dopamine and GABA neurons. Data are combined from both the variable-reward and variable-expectation experiments, and reflect correlations during the reward-predicting cue (c) and during delivery of expected reward (d). Filled bars, significant noise correlation (P < 0.05, Pearson's correlation). Empty bars, n.s. Dotted lines, mean noise correlation.
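The comparison between simultaneous and trial-shifted correlations can be sketched as below. This assumes that a noise correlation is the Pearson correlation of per-trial spike counts from two neurons within one task epoch and trial type, and that the control offsets one neuron's trials by one; the exact procedure is in the Methods, and the helper names and counts here are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def noise_correlation(counts_a, counts_b, shift=0):
    """Correlate per-trial spike counts of two neurons.

    shift=0 pairs simultaneous trials; shift=1 pairs trial t of neuron A
    with trial t+1 of neuron B (the trial-shifted control).
    """
    if shift:
        counts_a = counts_a[:-shift]
        counts_b = counts_b[shift:]
    return pearson(counts_a, counts_b)


# Hypothetical spike counts for one epoch (illustration only).
neuron_a = [5, 7, 4, 8, 6, 9, 3, 7]
neuron_b = [2, 4, 1, 5, 3, 6, 0, 4]

simultaneous = noise_correlation(neuron_a, neuron_b)
shifted = noise_correlation(neuron_a, neuron_b, shift=1)
print(simultaneous, shifted)
```

If a correlation reflects shared trial-by-trial fluctuations rather than slow drift, it should be larger for simultaneous trials than for shifted ones, as in this toy example.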

SUPPLEMENTARY NOTES

Mathematical description of dopamine responses

Our results allow us to compactly describe dopamine responses to expected and unexpected reward. In our variable-reward experiment (Fig. 1), we found that every neuron responded to unexpected rewards using the same function, scaled up or down. This effect can be formalized in the following equation, where $U_i(r)$ is the firing rate of the $i$th dopamine neuron to unexpected reward $r$, $U(r)$ is the average firing rate to unexpected reward of size $r$, and $\alpha_i$ is the factor that scales each neuron to the average.

$$U_i(r) = \alpha_i U(r) \tag{1}$$

Next, we found that dopamine neurons performed subtraction (Fig. 2a), such that a given level of expectation reduced dopamine firing by the same absolute amount, regardless of reward size. We write this in the following way:

$$E_i(r) = U_i(r) - e_i \tag{2}$$

Here, $E_i(r)$ is the firing rate of the $i$th dopamine neuron to expected reward $r$ and $e_i$ is the constant subtracted by that neuron when reward is expected. In Fig. 2b, we showed that the effect of expectation is proportional to a neuron's reward response:

$$U_i(r) - E_i(r) = \beta U_i(r) \tag{3}$$

Here, $\beta$ is the slope in Fig. 2b. The slopes in Fig. 3 and Fig. 4d are $1 - \beta$.

By solving for $\beta$ and combining equations 1-3, we obtain:

$$\beta = \frac{e_i}{U_i(r)} = \frac{e_i}{\alpha_i U(r)} \tag{4}$$

This shows that $\beta$ depends on both $r$ and $i$. But we know that $\beta$ does not depend on $i$; it is the same for every neuron (Fig. 2b). Therefore, $e_i$ must be proportional to $\alpha_i$:

$$e_i = s \alpha_i \tag{5}$$

Combining equations 1, 2, and 5 reveals that:

$$E_i(r) = \alpha_i U(r) - s \alpha_i = \alpha_i \left( U(r) - s \right) \tag{6}$$

Here, $s$ can be seen as the average effect of expectation, i.e., the number of spikes that the average dopamine neuron subtracts when reward is expected. Thus, equation 5 implies that for any given neuron, the effect of expectation is simply $\alpha_i s$. In other words, the same proportion ($\alpha_i$) determines both an individual neuron's reward response and its suppression by expectation. Returning to $\beta$, we can combine equations 4 and 5 to show that:

$$\beta = \frac{s}{U(r)} \tag{7}$$

$\beta$ is thus orthogonal to $\alpha_i$. Rather than varying from neuron to neuron, $\beta$ is the same for each neuron. It is determined by two factors: the amount of expectation ($s$, see Fig. 4) and the amount of reward ($r$, see Fig. 3).
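The conclusion of equations 1-7 can be checked numerically. The sketch below builds three model neurons that differ only in their scale factor and confirms that the proportional effect of expectation is identical across them. All numerical values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def unexpected_response(alpha_i, u_avg):
    """Equation 1: response of neuron i to unexpected reward."""
    return alpha_i * u_avg


def expected_response(alpha_i, u_avg, s):
    """Equation 6: response of neuron i to expected reward."""
    return alpha_i * (u_avg - s)


def beta(alpha_i, u_avg, s):
    """Equation 3 rearranged: fractional reduction caused by expectation."""
    u_i = unexpected_response(alpha_i, u_avg)
    e_i = expected_response(alpha_i, u_avg, s)
    return (u_i - e_i) / u_i


U_AVG = 8.0  # hypothetical average response to unexpected reward of size r
S = 2.0      # hypothetical average effect of expectation (spikes subtracted)
alphas = [0.5, 1.0, 2.0]  # hypothetical per-neuron scale factors

betas = [beta(a, U_AVG, S) for a in alphas]
print(betas)      # the same value for every neuron
print(S / U_AVG)  # equation 7
```

Each neuron subtracts a different absolute amount (its scale factor times s), yet the ratio in equation 3 is the same for all of them and matches equation 7.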

Cue responses also scale with responsiveness

To develop our model for prediction error coding, we have focused mostly on the reward response, comparing trials in which reward is expected to trials in which reward is unexpected. However, we also analyzed dopamine neuron responses to conditioned stimuli. In our variable-reward experiment, dopamine neurons responded more to an odor associated with reward than to an odor associated with nothing (2.31 ± 0.35 spikes s⁻¹ vs. -0.38 ± 0.16 spikes s⁻¹, mean ± s.e.m.; P = 3.1 × 10⁻⁷, t-test). Similarly, in the variable-expectation experiment, dopamine neurons responded most to the cue associated with 90 percent reward (5.38 ± 0.61 spikes s⁻¹), less to the 50 percent cue (2.18 ± 0.43 spikes s⁻¹), and least to the 10 percent cue (0.10 ± 0.25 spikes s⁻¹; P = 6.5 × 10⁻⁵, 1.7 × 10⁻⁸, and 4.0 × 10⁻⁹, t-test for all pairs). These results are consistent with previous findings that dopamine neurons are finely tuned to stimuli associated with reward, in addition to reward itself (refs. 27,29).

Interestingly, dopamine cue responses appeared to follow the scaled system that we described above, although the correlations were not significant for cues predicting small reward. In general, those neurons with large cue responses also showed large reward responses (Supplementary Fig. 7: for the variable-reward experiment, P = 0.096; for the variable-expectation experiment, P = 0.001, 7.4 × 10⁻⁴, and 0.24 for the 90%, 50%, and 10% cues, respectively). Thus, prediction errors were scaled both for strong conditioned stimuli and for unconditioned stimuli, consistent with the notion that each neuron broadcasts the same signal.

Responses to aversive events

Some of the dopamine neurons we recorded showed weak responses to our stimuli: although they shared the same response function, their absolute firing rates were low. We wondered if these neurons might be specialized for other stimuli, for example, aversive stimuli. Therefore, we examined the correlation between reward responses and responses to three presumably aversive events: omission of expected reward (Supplementary Fig. 8a,b), prediction of an airpuff to the face (Supplementary Fig. 8c,d), and airpuff itself (Supplementary Fig. 8c,e). Note that for the airpuff cue and airpuff itself, there was a biphasic response consisting of an initial excitatory response followed by a dip. The excitation may be due to response generalization, since the sound of the airpuff valve resembled the sound of the reward valve (ref. 12). Here we focus on the inhibitory phase, which encoded a more genuine value response.

We found that neurons that were highly responsive to unexpected rewards tended to be highly responsive to aversive events as well: they showed greater levels of suppression below baseline. This tendency is consistent with a recent study of putative dopamine neurons in monkeys that also used reward omission and airpuff, as well as bitter liquid (ref. 20). Thus, it is not the case that some neurons were specialized for aversive events; rather, neurons that were highly responsive in one setting appeared to be highly responsive in other settings as well. Conversely, weakly reward-responsive units were likely to have weak suppressions or even net-positive reactions to airpuff cues and airpuff itself. The latter neurons, which responded positively to both rewarding and punishing events, resemble the salience-coding neurons previously described (ref. 21), although their lack of excitation to reward omission (Supplementary Fig. 8b) may argue against this interpretation (ref. 20).