AVSP 2001 International Conference on Auditory-Visual Speech Processing

ELECTROPHYSIOLOGY OF UNIMODAL AND AUDIOVISUAL SPEECH PERCEPTION

Lynne E. Bernstein 1, Curtis W. Ponton 2, Edward T. Auer, Jr. 1

1 House Ear Institute, 2100 W. Third St., Los Angeles, CA 90057. 2 Neuroscan Labs, 57 Cromo Dr., El Paso, TX 7992.

ABSTRACT

Based on behavioral evidence, audiovisual speech perception is generally thought to proceed linearly from initial unimodal perceptual processing to integration of the unimodally processed information. We investigated unimodal versus audiovisual speech processing using electrical event-related potentials (ERPs) obtained from twelve adults. Nonsense syllable stimuli were presented in an oddball paradigm to evoke the mismatch negativity (MMN). Conditions were (1) incongruent audiovisual stimuli (visual /ga/ + auditory /ba/) versus congruent audiovisual stimuli (visual /ba/ + auditory /ba/), (2) visual-only stimuli from the audiovisual condition (/ga/ vs. /ba/), and (3) auditory-only stimuli (/ba/ vs. /da/). A visual-alone MMN was obtained at occipital and temporo-parietal electrodes, and the classical auditory MMN was obtained at the vertex electrode, Cz. Under audiovisual conditions, the negativity recorded at the occipital electrode locations was reduced in amplitude and latency compared to that recorded in the visual-only condition. Also, under the audiovisual condition, the vertex electrode showed a smaller negativity with increased latency relative to the auditory MMN. The neurophysiological evidence did not support a simple bottom-up linear flow from unimodal processing to audiovisual integration.

1. INTRODUCTION

Speech perception is the process that transforms speech signals into neural representations that are then projected onto word-form representations in the mental lexicon. Phonetic perception is more narrowly defined as the perceptual processing of the linguistically relevant attributes of the physical (measurable) speech signals. How acoustic and optical phonetic speech signals are processed under unimodal conditions and integrated under audiovisual conditions are fundamental questions for the behavioral and brain sciences. The McGurk effect [1] has been used as a tool for investigating this question behaviorally. An example of the McGurk effect is that a visible spoken token [ga] presented synchronously with an audible [ba] frequently results in the reported percept /da/. Various experimental results have led to the McGurk effect being attributed to, and viewed as evidence for, early bottom-up perceptual integration of phonetic information. For example, selectively attending to one modality does not abolish the effect [2], suggesting that top-down cognitive control is not possible. Neither gender incongruence nor knowledge of the phonemic incongruence between the auditory and visual stimuli abolishes it [3, 4], suggesting a high degree of bottom-up automaticity. Auditory phonetic distinctions, for example voicing, are affected by visual syllables [5], and phonetic goodness judgments are affected by visual syllables [6], suggesting that integration is subphonemic. McGurk percepts do not selectively adapt auditory stimuli [7], suggesting that audiovisual integration strictly follows auditory phonetic processing. These observations are consistent with the theory that a bottom-up flow of unimodal processing precedes audiovisual integration. There are also results inconsistent with a strictly bottom-up flow of unimodal information followed by integration.
For example, an audiovisual asynchrony of approximately 180 ms does not abolish McGurk effects [cf. 8, 9], suggesting that perceptual information can be maintained in memory and then integrated. Reductions in effect strength have also been shown to occur with training [2] and with talker familiarity [10], both of which may be related to post-perceptual processes.

2. THE CURRENT STUDY

We investigated the processing of unimodal auditory, unimodal visual, and audiovisual speech stimuli using recordings of electrical event-related potentials (ERPs) obtained in an oddball, mismatch negativity (MMN) paradigm. ERPs afford neurophysiological measures of brain activity with high temporal resolution (< 1 ms) and moderately good spatial resolution (< 10 mm). These measures of thalamic and cortical brain activity are presumed to reflect mostly excitatory post-synaptic potentials arising from large populations of pyramidal cells oriented in a common direction [12-14].

ERPs are often classified as exogenous (reflecting physical stimulus characteristics, such as intensity) or endogenous (reflecting the state of the subject or the cognitive demands of the task) [15]. The cortical auditory ERP N1 is an exogenous, obligatory potential that occurs approximately 100 ms after stimulus onset. It is preceded by the P1, at approximately 40-50 ms, and followed by the P2, at approximately 200 ms after stimulus onset. The auditory N1 varies with physical stimulus characteristics and rate [15]. In an oddball presentation paradigm (e.g., the standard /ba/ repeated on 83% of trials and the deviant /da/ pseudorandomly interspersed on the remaining 17%), the N1 is elicited by both the standard and the deviant [e.g., 16], but the deviant can generate an additional negativity described as the mismatch negativity (MMN). The MMN is considered to be an automatic, preattentive difference-detection response correlated with behavioral discrimination [16]. The MMN is isolated from the obligatory potentials by subtracting the ERP to a stimulus playing the role of standard from the ERP to the same stimulus playing the role of deviant. This controls for stimulus effects and leaves only the contrast, or change-detection, effect. In this study, the MMN was defined as an added negative component evoked by an infrequently occurring deviant stimulus in a sequence of repeating standard stimuli, regardless of perceptual modality. In addition, if the magnitude of the difference between the ERPs to the stimulus in its roles as standard versus deviant is sufficiently large, then a positive-going potential known as the P3 may be present and partially superimposed on the MMN. This can be seen in the response to the deviant in the top panel of Fig. 1 for the auditory stimulus /ba/.
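As a concrete illustration of this subtraction, the following is a minimal sketch (ours, not the authors' analysis code; the function name, variable names, and epoch geometry are assumptions based on the Methods below):

import numpy as np

def mmn_difference_wave(deviant_erp: np.ndarray,
                        standard_erp: np.ndarray) -> np.ndarray:
    """Isolate the MMN by the subtraction described above.

    Both arguments are averaged ERPs (1-D arrays, one value per time
    sample) for the SAME physical stimulus, once in its role as
    deviant and once in its role as standard.  The subtraction
    cancels the obligatory potentials (P1, N1, P2) common to both
    roles, leaving the change-detection response (plus any
    superimposed P3).
    """
    return deviant_erp - standard_erp

# Hypothetical usage with the epoch geometry assumed in the Methods:
# 1 kHz sampling, 100 ms pre-stimulus, 600 ms post-stimulus.
n_samples = 700
t_ms = np.arange(n_samples) - 100      # time in ms; 0 = acoustic onset
ba_as_deviant = np.zeros(n_samples)    # stand-in for a real average (uV)
ba_as_standard = np.zeros(n_samples)
mmn_ba = mmn_difference_wave(ba_as_deviant, ba_as_standard)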
Previously, Sams et al. obtained event-related magnetoencephalographic recordings in response to congruent and incongruent audiovisual speech stimuli, which they interpreted as evidence that visual phonetic information modulates the responses of the auditory cortex to auditory stimuli [11]. This interpretation is consistent with non-linearity in bottom-up processing. Sams et al. did not obtain a visual-only response. We sought to replicate and extend their research.

3. METHODS

Subjects. Twelve adults participated in the experiment for pay. All had normal hearing, normal or corrected-to-normal vision, English as a first language, and susceptibility to the McGurk effect, as verified with behavioral testing.

Stimuli. The stimuli were spoken tokens of /ba/, /da/, and /ga/. They were digitally recorded so that the visual start of each token continued seamlessly from the end of the other tokens. The /ba/ and /da/ tokens were used in the auditory conditions. The /da/ stimulus was used because it is the typical percept obtained with the McGurk stimulus comprising visual /ga/ and auditory /ba/. The visual stimuli were /ba/ and /ga/. The audiovisual stimuli were the congruent audiovisual /ba-ba/ and the incongruent auditory /ba/ combined with visual /ga/ (/ba-ga/); thus the auditory /ba/ was constant during audiovisual trials. The acoustic stimuli were 150 ms in duration. The video tokens were 20 frames (666 ms) long. The visual stimuli diverged from each other approximately 2-3 frames after their onset. The acoustic tokens were aligned at their onset with either the original acoustic /ba/ burst (the congruent stimuli) or the original acoustic /ga/ burst (the incongruent stimuli), approximately half-way through Frame 7. The visual tokens therefore showed speech movement prior to the acoustic onset, as is normally the case.

ERP recording. A Neuroscan evoked potential recording system was used with electrode positions C1, C2, C3, C4, C5, C6, Cz, F3, F4, F5, F6, F7, F8, Fz, T3, T4, T5, T6, P3, P4, P5, P6, Pz, O1, O2, Oz, M1, M2, Fp1, and Fp2. Ocular movements were monitored on two differential recording channels. A signal controlling stimulus presentation triggered sampling of the EEG. The EEG was anti-aliased, sampled at 1.0 kHz, and bandpass-filtered from either DC or 1 Hz to 30 Hz. Individual EEG epochs were recorded and stored to computer hard disk for off-line processing. Off-line, single sweeps were baseline-corrected on the pre-stimulus interval. Using an automatic rejection algorithm, sweeps containing activity in any channel that exceeded ±200 μV were excluded from subsequent averaging. A regression-based correction for eye movements was applied to the accepted data. The 0-ms point of the ERP recordings was set at the onset of the acoustic stimuli. Sampling preceded the 0-ms point by 100 ms and continued for 600 ms after it.

MMN procedure. Each stimulus was tested as both a standard and a deviant. Standards were presented on 87% of trials, pseudorandomly ordered with the deviant on the remaining 13% of trials. For example, auditory /ba/ occurred as the standard versus auditory /da/ as the deviant, and vice versa in another run. Thus there were six runs, two for each condition, with 2200 trials presented per subject per condition. During the auditory-only stimulus presentations, subjects watched an entertainment movie (not the visual syllable stimuli) with the sound off. Visual stimuli were viewed at a distance of 0.9 m from the screen. Testing took approximately 4.5 hours per subject.
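The off-line single-sweep processing just described (baseline correction on the pre-stimulus interval, automatic rejection, then averaging) can be sketched as follows. This is our minimal illustration under the stated parameters, not the Neuroscan pipeline; the function name and array shapes are assumptions, and the regression-based ocular correction is omitted.

import numpy as np

def average_accepted_sweeps(epochs: np.ndarray,
                            n_baseline: int = 100,
                            reject_uv: float = 200.0) -> np.ndarray:
    """Baseline-correct, artifact-reject, and average single sweeps.

    epochs: shape (n_sweeps, n_channels, n_samples), in microvolts,
    with the first n_baseline samples forming the pre-stimulus
    interval (100 samples = 100 ms at the 1 kHz rate assumed here).
    """
    # Baseline correction on the pre-stimulus interval of each sweep.
    baseline = epochs[:, :, :n_baseline].mean(axis=2, keepdims=True)
    corrected = epochs - baseline
    # Exclude any sweep in which ANY channel exceeds the threshold,
    # mirroring the automatic rejection rule described above.
    peaks = np.abs(corrected).max(axis=(1, 2))
    accepted = corrected[peaks <= reject_uv]
    # The paper's regression-based ocular correction would operate on
    # `accepted` here; it is omitted in this sketch.
    return accepted.mean(axis=0)        # averaged ERP, (n_channels, n_samples)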

Analyses. Grand mean waveforms were computed separately for the standard stimulus and the deviant stimulus in each condition, following the initial response processing. The MMN was obtained by subtracting the standard response from the deviant response within stimulus and condition. An integrated MMN (MMNi) [17] was derived from the MMN difference waveforms and is shown in all of the figures below.
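The MMNi of [17] compresses the difference waveform into a form suitable for single-point statistical testing. As a rough sketch of the idea only, assuming the MMNi to be a running integral (cumulative sum scaled by the sample interval) of the MMN difference waveform; consult [17] for the exact definition:

import numpy as np

def integrated_mmn(mmn: np.ndarray, fs: float = 1000.0) -> np.ndarray:
    """Running integral of an MMN difference waveform.

    A consistent negativity accumulates steadily in the integrated
    trace, while zero-mean noise tends to cancel, which is what makes
    a distribution-free single-point test on the trace attractive
    [17].  Input in microvolts; output in microvolt-seconds.
    """
    return np.cumsum(mmn) / fs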
4. RESULTS

Fig. 1 shows the auditory MMNi at the central electrode Cz, which was within the range of typical auditory MMN latencies (190 ms for /da/ and 220 ms for /ba/). Fig. 1 also shows the corresponding Cz responses for the other conditions. An MMNi was present at Cz for the visual /ga/ condition. Only a small MMNi was observed at Cz for visual /ba/ and for the audiovisual stimuli. The visual and audiovisual conditions resulted in longer latencies than did the auditory condition.

Fig. 2 shows the ERPs evoked by the standard and deviant visual stimuli. The electrode sites in Fig. 2 were selected for display because they best demonstrated the visual MMN. Beneath each waveform plot are shown the corresponding MMNi waveforms. The visual /ba/ condition generated a robust MMN at the occipital electrodes. This negativity was strongly right-lateralized at the temporal and parietal electrodes. The visual /ga/ condition generated an added negativity that was largest at the occipital electrodes, without an easily definable peak latency, except perhaps at electrode O1.

Fig. 3 shows the results for the congruent audiovisual /ba-ba/ and the incongruent /ba-ga/ stimuli. The activity for the congruent /ba-ba/ was smaller in magnitude (-1.2 vs. -1.5 μV) and earlier in latency (410 vs. 570 ms) relative to the responses measured at the occipital electrodes for visual /ba/ in Fig. 2. The /ba-ga/ stimulus resulted in an MMN-like response at the temporo-parietal electrodes, with peak negativity at 220 ms. The responses at the electrodes shown in Fig. 3 demonstrate that when a constant auditory /ba/ was presented, the responses at the occipital and temporo-parietal electrodes were altered relative to the responses in the visual-alone conditions.

5. SUMMARY AND CONCLUSIONS

These neurophysiological results revealed a more complex pattern of cortical activation than could be described as merely linearly ordered unimodal processing followed by audiovisual integration. The audiovisual /ba-ba/ MMNi at the occipital electrodes was earlier and diminished in amplitude under the condition of the constant auditory /ba/, in comparison with the visual-only /ba/ response. The audiovisual response at the vertex electrode Cz was later and smaller than the auditory-only MMNi. Additional analyses, which space does not permit discussing here, showed further distinctly different response patterns across the congruent versus incongruent audiovisual stimuli.

Figure 1: Classical MMN waveforms to auditory /ba/, standard versus deviant, at electrode Cz (top panel), and the MMNi at Cz for each of the stimuli (bottom six panels: AV /ba-ga/, AV /ba-ba/, visual /ba/, visual /ga/, audio /ba/, audio /da/). [Waveform plots not reproduced.]

6. ACKNOWLEDGMENTS

The authors gratefully acknowledge the contributions made by Betty Kwong in collecting the electrophysiology data and by Sven Mattys, Ph.D., in assisting in the development of the stimuli and the screening of the subjects. Research supported by the NSF (999688).

7. REFERENCES

1. McGurk, H. and MacDonald, J. Hearing lips and seeing voices, Nature 264: 746-748, 1976.

2. Massaro, D. W. Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry, Erlbaum, Hillsdale, NJ, 1987.
3. Green, K. P. et al. Integrating speech information across talkers, gender, and sensory modality: Female faces and male voices in the McGurk effect, Percept. Psychophys. 50: 524-536, 1991.
4. Summerfield, Q. and McGrath, M. Detection and resolution of audio-visual incompatibility in the perception of vowels, Quart. J. Exp. Psych. 36A: 51-74, 1984.
5. Green, K. P. and Kuhl, P. K. The role of visual information in the processing of place and manner features in speech perception, Percept. Psychophys. 45: 34-42, 1989.
6. Brancazio, L., Miller, J. L., and Paré, M. A. Visual influences on internal structure of phonetic categories, J. Acoust. Soc. Am. 108: 2481, 2000.
7. Saldaña, H. M. and Rosenblum, L. D. Selective adaptation in speech perception using a compelling audiovisual adaptor, J. Acoust. Soc. Am. 95: 3658-3661, 1994.
8. Massaro, D. W., Cohen, M. M., and Smeele, P. M. T. Perception of asynchronous and conflicting visual and auditory speech, J. Acoust. Soc. Am. 100: 1777-1786, 1996.
9. Munhall, K. G., Gribble, P., Sacco, L., and Ward, M. Temporal constraints on the McGurk effect, Percept. Psychophys. 58: 351-362, 1996.
10. Walker, S., Bruce, V., and O'Malley, C. Facial identity and facial speech processing: Familiar faces and voices in the McGurk effect, Percept. Psychophys. 57: 1124-1133, 1995.
11. Sams, M., Aulanko, R., Hämäläinen, M., et al. Seeing speech: Visual information from lip movements modifies activity in the human auditory cortex, Neurosci. Lett. 127: 141-145, 1991.
12. Creutzfeldt, O. D., Watanabe, S., and Lux, H. D. Relations between EEG phenomena and potentials of single cortical cells. I. Evoked responses after thalamic and epicortical stimulation, EEG Clin. Neurophys. 20: 1-18, 1966.
13. Mitzdorf, U. The physiological causes of VEP: Current source density analysis of electrically and visually evoked potentials, in Evoked Potentials (Eds. R. Q. Cracco and I. Bodis-Wollner), Alan R. Liss, New York, 1986.
14. Vaughan, H. and Arezzo, J. The neural basis of event-related potentials, in Human Event-Related Potentials (Ed. T. W. Picton), Elsevier, Amsterdam, 1988.
15. Näätänen, R. and Picton, T. W. The N1 wave of the human electric and magnetic response to sound: A review and analysis of the component structure, Psychophys. 24: 375-425, 1987.
16. Näätänen, R. The mismatch negativity: A powerful tool for cognitive neuroscience, Ear Hear. 16: 6-18, 1995.
17. Ponton, C. W., Don, M., Eggermont, J. J., and Kwong, B. Integrated mismatch negativity (MMNi): A noise-free representation that allows distribution-free single-point statistical tests, EEG Clin. Neurophys. 104: 143-150, 1997.

Figure 2: Visual responses to /ba/ (left column) and /ga/ (right column). The top two rows show occipital electrode sites; the bottom two rows show temporo-parietal sites. Standard and deviant ERPs are overlaid, with the corresponding MMNi waveforms beneath. [Waveform plots not reproduced.]

Figure 3: Audiovisual responses to /ba-ba/ (left column) and /ba-ga/ (right column). The top two rows show occipital electrode sites; the bottom two rows show temporo-parietal sites. Standard and deviant ERPs are overlaid, with the corresponding MMNi waveforms beneath. [Waveform plots not reproduced.]