Robust Neural Encoding of Speech in Human Auditory Cortex

Nai Ding, Jonathan Z. Simon. Electrical Engineering / Biology, University of Maryland, College Park

Auditory Processing in Natural Scenes How is the stable perception of sound generated from degraded acoustics?

Magnetoencephalography (MEG): MEG measures spatially synchronized dendritic current.

Outline
1. Cortical Encoding of Speech in MEG: Representation of Spectro-temporal Features
2. Cortical Code despite Energetic Masking: Speech in Stationary Noise
3. Cortical Code despite Informational Masking: Segregation of Simultaneous Speakers

MEG Response to Speech [Figure: speech stimulus (frequency vs. time) and MEG response (vs. time).]

MEG Response to Speech Large-scale synchronized cortical activity is phase locked to slow temporal modulations of speech. [Figure: speech stimulus (frequency vs. time), STRF, and MEG response (vs. time); predictive power shown as correlation (0 to 0.2) against frequency (1 to 50 Hz).] Ding & Simon (in press) J. Neurophysiol.
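
A minimal sketch of the forward (STRF) model named on this slide, assuming the usual linear formulation: the predicted response is the stimulus spectrogram convolved with the STRF along time and summed over frequency bands. The function names and the toy spectrogram, STRF, and "measured" response are illustrative assumptions, not the authors' code or data. Predictive power is taken here to be the Pearson correlation between predicted and measured responses.

```python
import math

def strf_predict(spectrogram, strf):
    """Predict a response from a spectrogram and an STRF.

    spectrogram[f][t] is stimulus energy in band f at time sample t.
    strf[f][k] is the weight applied to band f at a latency of k samples.
    """
    n_times = len(spectrogram[0])
    predicted = [0.0] * n_times
    for band, kernel in zip(spectrogram, strf):
        for t in range(n_times):
            for k, weight in enumerate(kernel):
                if t - k >= 0:  # causal: only past stimulus drives the response
                    predicted[t] += weight * band[t - k]
    return predicted

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy data (hypothetical): two frequency bands, five time samples.
spectrogram = [[1.0, 0.0, 0.0, 0.0, 0.0],
               [0.0, 2.0, 0.0, 0.0, 0.0]]
strf = [[0.0, 1.0, 0.0],   # band 0 contributes at a latency of 1 sample
        [0.0, 0.0, 0.5]]   # band 1 contributes at a latency of 2 samples
predicted = strf_predict(spectrogram, strf)

# A made-up "measured" response, used only to show the correlation step.
measured = [0.1, 0.9, 0.0, 1.2, -0.1]
predictive_power = pearson(predicted, measured)
```

In practice the STRF is estimated from data and the correlation is computed on responses that were not used for the estimate; neither step is shown here.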

Neural Reconstruction The temporal envelope of speech can be reconstructed from the MEG response. [Figure: stimulus speech envelope and speech envelope reconstructed from the MEG response; scale bar 2 seconds; subject R1747.]
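
One common way to perform this kind of reconstruction is a backward (decoding) model: a linear filter maps time-lagged sensor signals back onto the stimulus envelope. The sketch below assumes a ridge-regularized least-squares decoder and simulated data; the slide does not state the estimation method, so the function names, the lag count, the ridge value, and the single simulated sensor are all assumptions for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def lagged_features(response, n_lags):
    """One row per time t: each channel's samples at t, t+1, ..., t+n_lags-1."""
    n_times = len(response[0])
    rows = []
    for t in range(n_times):
        row = []
        for channel in response:
            for lag in range(n_lags):
                # The response follows the stimulus, so look ahead; zero-pad the end.
                row.append(channel[t + lag] if t + lag < n_times else 0.0)
        rows.append(row)
    return rows

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_decoder(response, envelope, n_lags, ridge=1e-3):
    """Ridge regression: weights = (X'X + ridge*I)^-1 X'y."""
    X = lagged_features(response, n_lags)
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(X, envelope)) for i in range(d)]
    return solve(xtx, xty)

def reconstruct(response, weights, n_lags):
    """Apply decoder weights to the lagged response to estimate the envelope."""
    return [sum(w * x for w, x in zip(weights, row))
            for row in lagged_features(response, n_lags)]

# Simulated data (hypothetical): one sensor that carries the envelope
# delayed by 2 samples, plus a little noise.
random.seed(0)
envelope = [random.gauss(0, 1) for _ in range(300)]
sensor = [(envelope[t - 2] if t >= 2 else 0.0) + 0.1 * random.gauss(0, 1)
          for t in range(300)]

# Fit on the first 200 samples, evaluate on the remaining 100.
weights = fit_decoder([sensor[:200]], envelope[:200], n_lags=4)
accuracy = pearson(reconstruct([sensor[200:]], weights, 4), envelope[200:])
```

Evaluating on held-out samples, as in the last two lines, keeps the reported correlation from being inflated by overfitting.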

Outline
1. Cortical Encoding of Speech in MEG: Representation of Spectro-temporal Features
2. Neural Coding under Energetic Masking: Speech in Stationary Noise
3. Neural Coding under Informational Masking: Segregation of Simultaneous Speakers

Speech Embedded in Noise Conditions: clean speech; SNR 6 dB; SNR -2 dB; SNR -9 dB. Intelligibility: 100%, 70%, 5%. 10 participants; 2 minutes of stimulus in each condition. [Figure: spectrogram for each condition; envelope shown at 6 dB; scale bar 1 second.]
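
A sketch of how a speech-in-noise stimulus at a target SNR can be built, assuming SNR is defined as the ratio of RMS levels in dB and that the noise (rather than the speech) is rescaled. The slide does not say how the stimuli were generated, so this definition, the function names, and the toy signals are assumptions.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 20*log10(rms(speech)/rms(noise)) equals snr_db, then add.

    Returns (mixture, scaled_noise).
    """
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled

# Toy signals (hypothetical): a 5 Hz sinusoid stands in for speech.
random.seed(1)
speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [random.gauss(0, 1) for _ in range(100)]

mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=-2.0)
achieved_snr_db = 20 * math.log10(rms(speech) / rms(scaled_noise))
```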

Neural Reconstruction of Speech The temporal envelope of the underlying speech is reconstructed neurally from the cortical response. [Figure: neural reconstruction and speech envelope at +6 dB and -6 dB (scale bar 1 s); reconstruction accuracy as correlation (0 to 0.2) vs. SNR (ticks: C, +6, +2).]

Contrast Gain Control Neural compensation for noise-induced loss of stimulus contrast. [Figure: amplitude-intensity function (axes: response, stimulus; scale labels 30 dB and 12 dB) and amplitude growth rate vs. SNR (ticks: C, +6, +2).]

Adaptive Encoding of Modulations Noise contains more energy at higher modulation rates and therefore interferes with speech more at high modulation rates. [Figure: modulation spectrum of the stimulus (power vs. frequency, 0 to 15 Hz; curves labeled Noise and Speech; 18 dB scale label) and response spectrum (coherence, 0.1 to 0.2, vs. frequency, 0 to 15 Hz; "noisier" label).]
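
To make "modulation spectrum" concrete, here is a small sketch that extracts a temporal envelope and measures its power at each modulation frequency up to 15 Hz. It assumes a crude envelope (rectification followed by a moving average) and a plain DFT; the slide does not specify how the envelope or its spectrum was computed, and the signal, sampling rate, and window length below are made up for the example.

```python
import cmath
import math

def envelope(signal, window):
    """Crude envelope: full-wave rectify, then smooth with a moving average."""
    rectified = [abs(v) for v in signal]
    half = window // 2
    smoothed = []
    for t in range(len(rectified)):
        segment = rectified[max(0, t - half): t + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed

def modulation_spectrum(env, fs, max_hz):
    """Power of the mean-removed envelope at each DFT bin up to max_hz."""
    n = len(env)
    mean = sum(env) / n
    centered = [v - mean for v in env]
    spectrum = {}
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if freq > max_hz:
            break
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        spectrum[freq] = abs(coeff) ** 2 / n
    return spectrum

# Toy signal (hypothetical): a 100 Hz carrier amplitude-modulated at 4 Hz,
# sampled at 1000 Hz for one second.
fs = 1000
signal = [(1 + 0.8 * math.sin(2 * math.pi * 4 * t / fs))
          * math.sin(2 * math.pi * 100 * t / fs) for t in range(fs)]

spectrum = modulation_spectrum(envelope(signal, window=20), fs, max_hz=15)
peak_hz = max(spectrum, key=spectrum.get)
```

Running the same analysis on a noise-only signal and on clean speech would give the two stimulus curves that the slide compares.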

Adaptive Encoding of Modulations Neural sensitivity profile shifts away from the modulation rates heavily corrupted by noise. [Figure: modulation spectrum of the stimulus (power vs. frequency, 0 to 15 Hz; 18 dB scale label); response spectrum (coherence, 0.1 to 0.2, vs. frequency, 0 to 15 Hz; "noisier" label); cutoff frequency (6 to 9 Hz) vs. SNR (ticks: C, +6, +2).]

Diotic Speech Segregation Two speakers, one male and one female, were mixed and presented diotically. The subjects were instructed to focus on one or the other speaker. The MEG response is modeled using two STRFs, one for each speaker. [Diagram: speech mixture, Stream 1, Stream 2, MEG signal.]

Neural Unmixing of Concurrent Speakers Neurally decoded envelope is more correlated with the attended speaker in >90% of single trials. [Figure: panels labeled Attended and Unattended, frequency (kHz) vs. time (0 to 200 ms); correlation (0 to 0.2) for Attended vs. Unattended, P << 0.001.]
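
The single-trial comparison reported on this slide can be sketched as follows: for each trial, correlate the neurally decoded envelope with each speaker's envelope and count the trial as a success when the attended speaker gives the larger correlation. The trial simulation below is an assumption made purely for illustration (the decoded envelope is a noisy mixture weighted toward the attended speaker); it does not reproduce the study's data or its >90% figure.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def attended_fraction(trials):
    """Fraction of trials where the decoded envelope correlates more strongly
    with the attended speaker than with the unattended one.

    trials is a list of (decoded, attended_envelope, unattended_envelope).
    """
    wins = sum(1 for decoded, attended, unattended in trials
               if pearson(decoded, attended) > pearson(decoded, unattended))
    return wins / len(trials)

def simulate_trial(n=200):
    """Hypothetical trial: decoded envelope leans toward the attended speaker."""
    attended = [random.gauss(0, 1) for _ in range(n)]
    unattended = [random.gauss(0, 1) for _ in range(n)]
    decoded = [0.6 * a + 0.1 * u + random.gauss(0, 1)
               for a, u in zip(attended, unattended)]
    return decoded, attended, unattended

random.seed(0)
trials = [simulate_trial() for _ in range(20)]
fraction = attended_fraction(trials)
```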

Summary 1. Neural processing adapts to noise. 2. Simultaneous speakers can be neurally segregated and processed differently. 3. Cortical encoding is precise yet dynamic: modulated by both stimulus acoustics (bottom-up) and attention (top-down), and leading to a robust encoding of speech in natural scenes.

Acknowledgement We thank Stephen David, David Poeppel, Mary Howard, Shihab Shamma, and Monita Chatterjee for discussions! SfN poster: 172.11/KK6 (Sunday, 10-11) Contact: Nai Ding (gahding@umd.edu), Jonathan Z. Simon (jzsimon@umd.edu)

Thank you!


STRF from MEG and LFP [Figure: MEG STRF, frequency (kHz) vs. time (0 to 0.3 s), alongside an STRF from LFP recorded in ferret A1, frequency (kHz) vs. time (0 to 0.3 s).] (in collaboration with Stephen David and Shihab Shamma)