Representation of sound in the auditory nerve

Eric D. Young, Department of Biomedical Engineering, Johns Hopkins University

Young, ED. Neural representation of spectral and temporal information in speech. Phil. Trans. R. Soc. B 363:923-945 (2008).

The wiring diagram of the cochlea. There are both afferent (ear to brain) and efferent (brain to ear) fibers. Type 1 afferents innervate inner hair cells (IHCs). Type 2 afferents innervate outer hair cells (OHCs). Efferents innervate OHCs (the medial olivocochlear bundle, MOC) and the dendrites of type 1 afferents (the lateral olivocochlear bundle, LOC). (Discussed in a later lecture.)

Two features of cochlear physiology are important for auditory-nerve responses to sound:
1. Points on the basilar membrane (and therefore the IHCs at those points) are tuned to frequency.
2. Hair cells respond to the waveform of the stimulus at low frequencies, but to the envelope of the signal at higher frequencies.
(Ruggero et al. 1995; Palmer and Russell, 1986)

Auditory nerve (AN) fibers respond to acoustic stimuli with spike trains. The stimulus is represented by the strength of firing (i.e. the discharge rate) and by spike timing (phase locking).

[Figure: stimulus and spike train versus time. On an expanded time scale, note the phase locking of the spikes to the stimulus waveform.]

Consistent with the basilar membrane characteristics, each AN fiber is tuned. The frequency of maximum sensitivity is the best frequency (BF), also called the characteristic frequency (CF). There are fibers centered at all BFs across the animal's frequency range. The tuning curve is a plot of frequency (abscissa) versus the lowest sound pressure (ordinate) at which the fiber responds. (Miller et al. 1997)

The tuning curve is a direct measure of tuning based on threshold responses to tones. Another measure of tuning is based on reverse correlation (RevCor):
1. A broadband stimulus (ideally noise) is presented to the neuron and its spike trains are recorded.
2. The noise waveforms preceding spikes are averaged, giving the RevCor function. (Note that RevCors can only be obtained if the neuron is phase locked to the stimulus waveform.)

[Figure: noise stimulus, neural spike times, and the resulting reverse correlation function.]
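The two RevCor steps can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the cited studies: the "fiber" is a hypothetical toy model (a damped-sinusoid filter followed by a fixed threshold), and the names `revcor`, `true_filter`, and the threshold value are assumptions made for the example.

```python
import math
import random

def revcor(stimulus, spike_indices, window):
    """Average the `window` stimulus samples that precede each spike.

    Element `lag` of the result is the mean stimulus value `lag` samples
    before a spike, so the list reads like an impulse response.
    """
    acc = [0.0] * window
    n_used = 0
    for idx in spike_indices:
        if idx < window - 1:      # not enough stimulus history yet
            continue
        for lag in range(window):
            acc[lag] += stimulus[idx - lag]
        n_used += 1
    return [a / n_used for a in acc] if n_used else acc

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    return num / den

# --- Toy "fiber" (hypothetical, for illustration only) ---
random.seed(0)
WINDOW = 32
# Damped sinusoid standing in for the fiber's tuning (period: 8 samples).
true_filter = [math.exp(-k / 8.0) * math.sin(2 * math.pi * k / 8.0)
               for k in range(WINDOW)]

# Step 1: present broadband (Gaussian) noise and record spike times.
noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
spikes = []
for n in range(WINDOW - 1, len(noise)):
    drive = sum(true_filter[k] * noise[n - k] for k in range(WINDOW))
    if drive > 1.5:               # assumed threshold: spike follows the waveform
        spikes.append(n)

# Step 2: average the noise preceding the spikes.
estimate = revcor(noise, spikes, WINDOW)
similarity = correlation(estimate, true_filter)
```

In this toy setting `similarity` measures how well the spike-triggered average recovers the shape of the filter that generated the spikes. Real recordings would need a spiking model with refractoriness and trial-to-trial variability, which the sketch leaves out.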

The RevCor function is the impulse response of the linear filter equivalent to the tuning of the auditory-nerve fiber. The Fourier transform of the RevCor is thus a filter function like the tuning curve. An advantage of RevCors is that they can be obtained at several different sound levels, not just at threshold.

Tuning curves from RevCors at several noise sound levels from one fiber. Note that the tuning becomes broader at higher sound levels, in the same way as for the basilar membrane. (Recio-Spinoso et al. 2005)

[Figure: stimulus and spike train versus time. The spike rate changes when the stimulus is within the fiber's tuning curve (the rate code). On an expanded time scale, the spike train is phase locked to the stimulus waveform.]
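As a rough illustration of how a filter function follows from a RevCor, the sketch below takes the discrete Fourier transform of an impulse response and reads off the frequency of the largest magnitude. The impulse response here is a hypothetical 1 kHz damped sinusoid sampled at 10 kHz, not measured data, and `magnitude_spectrum` is a name chosen for the example.

```python
import cmath
import math

def magnitude_spectrum(impulse_response, fs):
    """Return (frequencies in Hz, DFT magnitudes) up to the Nyquist frequency."""
    n = len(impulse_response)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(impulse_response))
        freqs.append(k * fs / n)
        mags.append(abs(coeff))
    return freqs, mags

# Hypothetical RevCor: a 1 kHz damped sinusoid sampled at 10 kHz.
FS = 10000.0
revcor_fn = [math.exp(-i / 20.0) * math.sin(2 * math.pi * 1000.0 * i / FS)
             for i in range(200)]

freqs, mags = magnitude_spectrum(revcor_fn, FS)
# Frequency of the spectral peak, an estimate of the filter's best frequency.
best_frequency = freqs[mags.index(max(mags))]
```

With 200 samples at 10 kHz the frequency resolution is 50 Hz, so `best_frequency` can only be located to that precision. Plotting `mags` against `freqs` on inverted axes gives a curve comparable in shape to a tuning curve.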

Basic properties of AN rate and phase-locked codes:
1. Rate increases monotonically with sound level. Fibers' dynamic ranges differ between the spontaneous rate (SR) groups.
2. Phase locking is observed only at frequencies below a few kHz. Note this includes the frequencies important in speech and music.

(In the rate plots, dashed lines are spontaneous rate.)

rate = (number of spikes) / (duration of sound)

$$ S = \frac{1}{N}\sqrt{\left(\sum_{j=1}^{N}\sin\varphi_j\right)^{2} + \left(\sum_{j=1}^{N}\cos\varphi_j\right)^{2}} $$

where $\varphi_j$ is the phase, relative to the stimulus, of the j-th spike and N is the number of spikes. S varies from 0 to 1. (Johnson & Kiang, 1976)

High SR fibers are sensitive and respond to the stimulus at low sound levels, over the range where the basilar membrane is linear, giving narrow dynamic ranges (model unit 1). Low and medium SR fibers respond at higher levels to the compressive portion of the basilar membrane response, giving sloping saturation (model unit 2). (Sachs et al. 1989)
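The discharge rate and the synchronization index S can be computed directly from a list of spike times. The sketch below assumes spike times in seconds and a single stimulus frequency in Hz; the function names and the example spike trains are made up for illustration.

```python
import math

def discharge_rate(spike_times, duration):
    """Number of spikes divided by the duration of the sound (spikes/s)."""
    return len(spike_times) / duration

def sync_index(spike_times, freq):
    """Synchronization index S for spikes relative to a tone at `freq` Hz.

    Each spike time is converted to a stimulus phase; S is the length of
    the mean unit vector over those phases and lies between 0 and 1.
    """
    n = len(spike_times)
    if n == 0:
        return 0.0
    phases = [2 * math.pi * freq * t for t in spike_times]
    s = sum(math.sin(p) for p in phases)
    c = sum(math.cos(p) for p in phases)
    return math.sqrt(s * s + c * c) / n

# Hypothetical spike trains for a 500 Hz tone.
locked = [k / 500.0 for k in range(50)]         # one spike per cycle, same phase
spread = [k / (8 * 500.0) for k in range(8)]    # spikes spread evenly over a cycle

rate_locked = discharge_rate(locked, 0.1)
s_locked = sync_index(locked, 500.0)
s_spread = sync_index(spread, 500.0)
```

Spikes that always fall at the same stimulus phase give S near 1, and spikes spread evenly around the cycle give S near 0, matching the stated range of S.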

Recall that natural sounds are complexes of many frequencies. The identity (quality) of a sound is determined by the spectrotemporal pattern of energy across frequency.

[Figure: example sounds: basketball, music (mostly strings), and a honk.]

The actual neural representation is a population code, in which fibers represent the components of the sound at frequencies near the fiber's BF.

Response of a large population of AN fibers to the syllable /da/. The formants of the /da/ are shown by the red lines. Note the broad BF region over which responses are dominated by F1 and the narrower region dominated by F2. There is also an F3 response, which is hard to see. (Shamma 1985, from Miller and Sachs 1983)

Rate representation of the steady vowel /eh/ (as in "met"). There are two populations of auditory nerve fibers, distinguished by spontaneous discharge rate, threshold, and dynamic range. The low/medium spontaneous rate population is 20% of the total. Note the saturation of the representation in the high spontaneous rate population. This problem is worse for more difficult listening conditions (e.g. background noise), which suggests that we may use additional cues. (Sachs and Young 1978)

Analysis of spike train responses. In sounds with many frequency components, phase locking can be used to separate out the neural responses to different frequency components.

[Figure: time-domain panels (stimulus, spike train, and discharge rate versus time in milliseconds) and frequency-domain panels (magnitude versus frequency in kHz). The frequency-domain panel of the discharge rate shows the response together with distortion products, due to neural rectification.]
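A small numerical sketch can show why Fourier analysis separates the components of a phase-locked response and why rectification adds distortion products. It assumes, purely for illustration, that the instantaneous discharge rate is a half-wave rectified copy of a two-tone stimulus; the tone frequencies (1000 and 1500 Hz), sampling rate, and function name are hypothetical choices.

```python
import math

def fourier_amplitude(signal, freq, fs):
    """Amplitude of the component of `signal` at `freq` Hz (single DFT bin)."""
    n = len(signal)
    re = sum(v * math.cos(2 * math.pi * freq * i / fs)
             for i, v in enumerate(signal))
    im = sum(v * math.sin(2 * math.pi * freq * i / fs)
             for i, v in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

FS = 20000.0            # sampling rate in Hz (assumed)
N_SAMPLES = 2000        # 0.1 s, a whole number of cycles of every component
F1, F2 = 1000.0, 1500.0

stimulus = [math.sin(2 * math.pi * F1 * i / FS) +
            math.sin(2 * math.pi * F2 * i / FS) for i in range(N_SAMPLES)]

# Assumed model of the discharge rate: half-wave rectified stimulus.
rate = [max(v, 0.0) for v in stimulus]

stim_at_f1 = fourier_amplitude(stimulus, F1, FS)          # a stimulus component
stim_at_diff = fourier_amplitude(stimulus, F2 - F1, FS)   # absent from the stimulus
rate_at_f1 = fourier_amplitude(rate, F1, FS)              # response to the tone
rate_at_diff = fourier_amplitude(rate, F2 - F1, FS)       # distortion product
```

The stimulus has no energy at the difference frequency, while the rectified rate does, which is the kind of distortion product the frequency-domain analysis of the discharge rate reveals. In practice the rate waveform would come from a histogram of recorded spike times rather than from a model.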

Population responses to two tones (2.17 and 2.79 kHz at 65 dB SPL; Kim et al. 1975). Note:
1. Wide distribution of responses along the BM (expected from the spread of tuning curves at high sound levels).
2. Two-tone suppression of responses to one tone by the other (marked * in the figure).
3. Generation of combination tones at frequencies f2 - f1 and 2f1 - f2. Even though there is no energy in the stimulus at these frequencies, they distribute along the BM like real tones. Presumably energy at these frequencies is generated by BM nonlinearity.

When the rate representation saturates, there is still information in the temporal phase-locked response of the neurons. The frequency to which fibers are phase locked (ordinate) varies strongly with place on the basilar membrane (abscissa). Maximum response is mostly to the stimulus frequency nearest the BF of the fiber.

[Figure axes: place on the basilar membrane (abscissa), stimulus frequency to which the fiber responds (ordinate), and strength of the response. Miller et al. 1997]
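For the two tones used here (2.17 and 2.79 kHz), the combination-tone frequencies work out to roughly 0.62 kHz for f2 - f1 and 1.55 kHz for 2f1 - f2. A short sketch of the arithmetic, with a helper name chosen for the example:

```python
def combination_tones(f1, f2):
    """Return the two combination-tone frequencies named in the text.

    f1 is the lower and f2 the higher primary frequency, in the same units.
    """
    return {
        "f2 - f1": f2 - f1,          # difference tone
        "2f1 - f2": 2 * f1 - f2,     # cubic difference tone
    }

# Primary tones from the two-tone experiment, in kHz.
tones = combination_tones(2.17, 2.79)
```

Both results lie below the primaries, so the corresponding responses would be expected at more apical places on the BM than the responses to the primary tones.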

A rate-sensitive neuron, perhaps with some lateral inhibition to sharpen the selectivity.

A phase-locking-sensitive neuron (implausible because of the very short times required).

A relative-phase-sensitive neuron. Each neuron receives multiple inputs and is sensitive to the coincidence of its inputs. Such sensitivity is possible and is observed for binaural sounds in the superior olive. It has not yet been demonstrated for monaural sounds, but the rate representation is better at the output of the cochlear nucleus than in the auditory nerve.

In the CNS, phase locking is less important, but timing codes still exist. The data below are responses of two neurons in the inferior colliculus to sounds simulating different spatial source locations. Possible neural codes:
1. discharge rate
2. latency of the first spike
3. temporal spiking patterns
Note that these are quite different. (Chase and Young 2006)
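The first two candidate codes are simple to compute from spike times, and a toy example shows how they can diverge. The sketch below is only illustrative: the spike trains are invented, not taken from Chase and Young, and the function names are assumptions.

```python
def mean_rate(trials, duration):
    """Mean discharge rate (spikes/s) across trials of equal duration."""
    return sum(len(t) for t in trials) / (len(trials) * duration)

def mean_first_spike_latency(trials):
    """Mean time of the first spike, over trials that contain a spike."""
    firsts = [min(t) for t in trials if t]
    return sum(firsts) / len(firsts) if firsts else None

# Hypothetical responses (spike times in seconds) to two stimulus conditions.
DURATION = 0.1
condition_a = [[0.010, 0.030, 0.050], [0.012, 0.040, 0.055]]
condition_b = [[0.020, 0.030, 0.050], [0.022, 0.040, 0.055]]

rate_a = mean_rate(condition_a, DURATION)
rate_b = mean_rate(condition_b, DURATION)
latency_a = mean_first_spike_latency(condition_a)
latency_b = mean_first_spike_latency(condition_b)
```

In this made-up case the two conditions produce the same discharge rate but first-spike latencies that differ by 10 ms, so a latency code would distinguish them where a rate code would not.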

Considering spike timing patterns increases the information in spike trains over that due to rate alone.

[Figure: the extra information encoded in spike timing. Chase and Young 2008]