THE ROLE OF VISUAL SPEECH CUES IN THE AUDITORY PERCEPTION OF SYNTHETIC STIMULI BY CHILDREN USING A COCHLEAR IMPLANT AND CHILDREN WITH NORMAL HEARING

Vanessa Surowiecki 1, David Grayden 1, Richard Dowell 2, Graeme Clark 1,2 & Paul Maruff 3
1 The Bionic Ear Institute, East Melbourne
2 Dept. of Otolaryngology, The University of Melbourne
3 Dept. of Psychology, La Trobe University

ABSTRACT: Debate continues as to the benefits of visual speech cues for the auditory perceptions of children using a cochlear implant. Ten children using a cochlear implant were matched by age, gender and Performance IQ to children with normal hearing. Participants reported whether they perceived /ba/, /da/ or /ga/ when presented with nine synthetic voiced plosives, either in isolation or in synchrony with visual articulations of /ba/ and /ga/. Comparison between the two test conditions revealed that the children's auditory perceptions were influenced by visual speech cues. Visual /ga/ paired with the auditory stimuli resulted in fewer auditory /ba/ percepts and more auditory /da/ percepts for both groups. Congruent visual /ga/ cues did not improve auditory perception of /ga/ stimuli. Visual /ba/ presented in synchrony with the nine stimuli resulted in more /ba/ percepts for stimuli 3-9 and fewer /da/ and /ga/ percepts. Auditory perception of /ba/ was improved under the congruent visual /ba/ condition. The implanted group's apparent response bias towards visual /ba/ articulations for /da/ and /ga/ auditory stimuli could be attributed to the evident uncertainty with which they labelled these stimuli in the auditory-alone condition. These results suggest that all children, in particular those using a cochlear implant, are likely to benefit from seeing a talker's mouth when listening in situations where the auditory signal is ambiguous.
INTRODUCTION

Following cochlear implantation, hearing-impaired children are provided with a range of habilitation services aimed at developing their auditory perceptual skills. Habilitation specialists disagree as to whether visual speech cues aid or hinder the perception of auditory speech sounds. This paper examines the influence of visual speech cues on implanted children's auditory perceptions of synthetic voiced plosives, comparing their performance to that of children with normal hearing.

Prior to implantation, children with a profound hearing loss rely on visual speech cues to supplement the limited auditory stimuli they receive via hearing aids. This has led some researchers to suggest that individuals with hearing loss may develop a response bias towards visual speech cues (Walden, Montgomery, et al., 1990). Advocates of Auditory-Verbal therapy propose that a bias towards visual speech cues will detract from the child's ability to utilise the new speech sounds presented via the implant (Vaughan, 1981). Auditory-Verbal therapists recommend that parents do not face their child when speaking, minimising the child's use of visual speech cues (Beebe, Pearson, & Koch, 1984). However, regardless of hearing status, all individuals use visual speech cues to complement auditory processing (Green & Kuhl, 1989; McGurk & MacDonald, 1976). In particular, visual articulations are known to improve auditory perception of speech when the auditory signal is degraded, when there is a poor signal-to-noise ratio, or when the individual is listening monaurally (Arnsten, 1998; Erber, 1969; Grant, Ardell, et al., 1985; Pichora-Fuller, 1996; Sumby & Pollack, 1954). Despite the limited visual cues for many articulations, individuals perceive sounds more consistently when presented with congruent visual speech cues. Parents must choose which communication style they believe will assist their child to acquire auditory perceptual skills.
This major decision affects the skills the child will possess throughout their education and will influence their prospects as an adult. To determine the influence of visual speech cues on the auditory perceptions of children using a cochlear implant, visual articulations were paired with a continuum of synthetic voiced plosives. Using a combination of a categorical perception task and the McGurk test (McGurk & MacDonald, 1976), the study paired nine auditory stimuli with visual articulations of /ba/ and /ga/ (Walden, Montgomery, et al., 1990). Transition feature cues were provided in the /ba-da-ga/ continuum as the temporal aspects of these cues pair well with temporal changes in visual articulations (Green & Norrix, 1997). Previous reports indicate that the influence of visual articulations on auditory perceptions may relate to listening experience (Hockley & Polka, 1994; Massaro, 1984; McGurk & MacDonald, 1976).

Melbourne, December 2 to 5, 2002. Australian Speech Science & Technology Association Inc. Accepted after abstract review. Page 433.

Children with
normal hearing were included as a comparison group to determine whether the implanted children's responses related to chronological age. Specifically, the study anticipated the following: (1) the auditory percepts of both groups of children would change when the stimuli were presented in synchrony with visual articulations; (2) auditory perception of speech sounds would improve when the stimuli were presented in synchrony with congruent visual articulations.

METHOD

Subjects

Ten children using a Nucleus multiple-channel cochlear implant were matched by age, gender and Performance IQ (Wechsler, 1991) to children with normal hearing. Implanted children were identified through the Royal Victorian Eye and Ear Hospital's Cochlear Implant Clinic patient list, whereas control participants were identified through state and Catholic primary schools. Tested with a screening audiogram, the control group possessed pure tone averages at dB HL or better at 500 Hz, 1 kHz and 2 kHz. The sample mean age was 8 years, 6 months (SD = 1 year, 4 months) and the mean Performance IQ for the group was 11.8 (SD = 8.95). The mean age at cochlear implantation was 4 years, 2 months (SD = 2 years, 4 months) and the children had on average 4 years, 4 months (SD = 2 years, 5 months) of experience listening with the device.

Stimulus Materials

The synthetic stimuli were created using values obtained from natural recordings of a female Australian speaker. Nine stimuli were created varying in spectral transition cues along a /ba-da-ga/ continuum. The fundamental frequency began at 4Hz and decreased linearly to 156Hz across the 65ms. The first formant onset was at 3Hz for the consonant portion of the stimuli and voice onset was set at 0 ms. The second formant began at 1.2kHz for stimulus 1, increased in 25Hz steps to stimulus 5, then increased in Hz steps to stimulus 9. The third formant value for stimulus 1 was 2.7kHz, increasing in 75Hz steps to 3kHz for stimulus 5, then decreasing in 75Hz steps to 2.7kHz for stimulus 9.
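Of the formant schedules above, only the third formant is specified completely (2.7 kHz rising in 75 Hz steps to 3 kHz at stimulus 5, then falling back to 2.7 kHz at stimulus 9), and the stimulus set is described below as RMS normalised. A minimal Python sketch of those two pieces, for illustration only; this is not the authors' synthesis code, and the function names are hypothetical:

```python
import math

def f3_onset(stimulus: int) -> float:
    """F3 onset (Hz) for stimuli 1..9: 2700 Hz rising in 75 Hz steps to
    3000 Hz at stimulus 5, then falling in 75 Hz steps back to 2700 Hz."""
    if not 1 <= stimulus <= 9:
        raise ValueError("stimulus must be in 1..9")
    if stimulus <= 5:
        return 2700.0 + 75.0 * (stimulus - 1)
    return 3000.0 - 75.0 * (stimulus - 5)

def rms_normalise(samples, target_rms=0.1):
    """Scale a waveform so its root-mean-square amplitude equals target_rms."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # silent signal: nothing to scale
    gain = target_rms / rms
    return [x * gain for x in samples]
```

Equalising RMS level across the nine tokens in this way ensures that listeners cannot use loudness differences, rather than the spectral transition cues, to label the stimuli.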
Stimuli 2, 5 and 8 were the exemplar tokens for /ba/, /da/ and /ga/ respectively. Values for the vowel /a/ were 1kHz for F1, 1.5kHz for F2 and 2.7kHz for F3. The fourth formant was kept at 4kHz for the entire syllable duration. The stimulus set was RMS normalised. Digital video recordings were made of a female articulating the sounds /ba/, /da/ and /ga/. The female was shown from the neck to just below her eyes. Audio-visual files were created by pairing each of the auditory stimuli with the visual articulations of /ba/ and /ga/.

Procedure

The children were assessed individually in a quiet room either within their home or at school. A Toshiba 47CDT notebook was used to run the activities, with the response options and video files presented on a 15-inch ELO IntelliTouch touch-sensitive monitor. The child sat facing the monitor throughout testing. A self-amplified Wharfedale loudspeaker was placed on a tripod one metre in front of the hearing-impaired child's implanted ear, and on either side for children with normal hearing. The loudspeaker had a broad frequency response from Hz to 15kHz. The speaker height was adjusted to each child's ear level. The sound level was calibrated to 68 dB SPL at the child's ear level using a sound level meter (Quest Electronics). During the initial one-hour test session of a larger study, the child completed four activities. For each activity, the response options "ba", "da" and "ga" were presented as buttons on the monitor. For all four tasks the child responded by selecting one of the buttons to indicate the sound they perceived. An auditory-alone practice trial was included to allow the child to become familiar with the exemplar stimuli. This was followed by an auditory-alone test consisting of each of the nine stimuli presented ten times in random order. The children were then given a visual-alone trial to give them practice in recognising the articulations.
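The 68 dB SPL calibration level can be related to physical sound pressure via the standard 20 µPa reference. A quick sketch of that conversion, purely illustrative and not part of the authors' calibration procedure:

```python
P_REF = 20e-6  # standard reference pressure for dB SPL: 20 micropascals

def spl_to_pascals(db_spl: float) -> float:
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals,
    using level = 20 * log10(p / P_REF) inverted."""
    return P_REF * 10 ** (db_spl / 20)
```

At 68 dB SPL the RMS pressure at the child's ear is roughly 0.05 Pa, a comfortable conversational-speech level.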
The final activity was the audio-visual test, in which the children observed the video recordings of /ba/ and /ga/ presented in synchrony with each of the nine stimuli, with each combination presented ten times. This activity was divided into two blocks of 90 audio-visual stimuli with a rest period in between. Results from the auditory-alone and the audio-visual tests were recorded.
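The audio-visual design above (nine auditory stimuli, two visual articulations, ten repetitions, split into two blocks) implies 180 randomised trials. A hypothetical sketch of how such a trial list could be generated; this is not the authors' test software:

```python
import random

def build_av_trials(n_stimuli=9, visuals=("ba", "ga"), reps=10, seed=None):
    """Build a randomised audio-visual trial list: every (stimulus, visual)
    pairing repeated `reps` times, shuffled, then split into two blocks."""
    rng = random.Random(seed)
    trials = [(s, v)
              for s in range(1, n_stimuli + 1)
              for v in visuals
              for _ in range(reps)]
    rng.shuffle(trials)
    half = len(trials) // 2
    return trials[:half], trials[half:]
```

Note that simply halving a shuffled list does not balance pairings across the two blocks; it only guarantees the counts over the whole session, which is all the design described here requires.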
RESULTS

Independent samples t-tests revealed no significant differences between the two groups for age, t(18) = .15, p = .989, or Performance IQ, t(18) = .243, p = .81. All children completed the auditory-alone (AA) and audio-visual (AV) testing. Results of the AA testing are displayed in Figures 1 and 2. The children's responses to the AV visual /ga/ articulations are shown in Figures 3 and 4, and responses to visual /ba/ are shown in Figures 5 and 6. Results from the AA task indicated that the control group perceived stimuli 1 and 2 as /ba/, stimuli 4-6 as /da/ and stimuli 8 and 9 as /ga/. Children using a cochlear implant were able to consistently perceive /ba/ for stimuli 1 and 2 under the AA condition. The implanted group's perception of /da/ and /ga/ was less consistent, with all responses lower than 50% across the nine stimuli.

Figure 1. AA condition: average responses made by the control group.
Figure 2. AA condition: average responses made by the implant group.
Figure 3. AV condition with visual /ga/: average responses made by the control group. Significant paired samples results in red.
Figure 4. AV condition with visual /ga/: average responses made by the implant group. Significant paired samples results in red.
Figure 5. AV condition with visual /ba/: average responses made by the control group. Significant paired samples results in red; significant Wilcoxon signed ranks results in green.
Figure 6. AV condition with visual /ba/: average responses made by the implant group. Significant paired samples results in red; significant Wilcoxon signed ranks results in green.
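The group comparisons in this paper rest on t statistics: independent samples for the matching checks, and paired samples for the AA versus AV contrasts. A bare-bones pure-Python sketch of the paired-samples t statistic, shown for illustration only (in practice a statistics package would also supply the p value):

```python
import math

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for two matched
    score lists, e.g. per-child response counts in the AA and AV conditions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    se = math.sqrt(var / n)                              # standard error of mean diff
    return mean / se, n - 1
```

The Wilcoxon signed ranks test plays the same role for the variables that did not meet parametric assumptions, ranking the absolute AA-AV differences rather than averaging them.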
To compare mean group performance across the AA and AV conditions, paired samples t-tests were used for parametric variables and Wilcoxon signed ranks tests were conducted on non-parametric variables. Audio-visual responses that differed significantly from the AA condition at a level of .05 or lower are marked in Figures 3-6. As seen in Figure 3, when provided with visual /ga/ articulations, the mean number of /ba/ responses made by the control group for stimuli 1-3 decreased, whilst their average number of /da/ responses increased for stimuli 2 and 3 and decreased for stimulus 4. Under this visual /ga/ condition, the control group's /ga/ responses to stimulus 9 decreased. When presented with visual /ga/ articulations, children using a cochlear implant perceived fewer /ba/ syllables when listening to stimuli 1-4 and 8, and more /da/ syllables when presented with stimulus 1. The implanted group's results are displayed in Figure 4. As presented in Figure 5, the control group's auditory perception of /ba/, /da/ and /ga/ was altered by the simultaneous presentation of visual /ba/ articulations. When presented with visual /ba/, children with normal hearing perceived /ba/ more often for stimuli 3-9, resulting in significantly fewer /da/ responses for these same stimuli. The mean number of /ga/ responses also decreased for stimuli 8 and 9. Under the same visual /ba/ condition, the implanted children's auditory perceptions of stimuli 3-9 were also altered. Children using a cochlear implant perceived /ba/ for all nine stimuli in this AV condition, with a minimum /ba/ response rate of 47%. The mean number of /da/ responses decreased for stimuli 3-9, as did /ga/ responses for stimuli 3, 4, 6 and 7.

DISCUSSION

As anticipated, all participants were influenced by visual speech cues, as evidenced by their different responses to the nine stimuli in the audio-visual (AV) conditions.
Compared to the auditory-alone (AA) condition, children in the control group reported fewer /ba/ percepts when stimuli 1-3 were paired with visual /ga/ articulations. Despite the fact that these children consistently labelled these stimuli as /ba/ in the AA condition, and despite the limited visual cues available in a /ga/ articulation, the visual /ga/ presentation altered the children's perceptions of the stimuli. The number of /da/ percepts increased for the control group under the visual /ga/ condition, suggesting the articulations assisted the children to perceive /da/ qualities in the transition-only stimuli. Congruent visual /ga/ cues did not increase the control group's auditory perception of stimuli 4-9. Performance actually decreased under the visual /ga/ condition, with fewer /da/ percepts for stimulus 4 and fewer /ga/ responses to stimulus 9. However, the overall shape of the control group's responses remained similar to the AA condition. This suggests that while the children were influenced by the visual articulations of /ga/, they were able to use the available transition cues to label the stimuli as /ba/, /da/ or /ga/. Children using a cochlear implant were also influenced by visual /ga/ articulations. In particular, the children responded less often that they perceived /ba/ for stimuli 1-4 under the visual /ga/ condition. The group's mean number of /da/ responses increased for stimulus 1; however, the percentage of /da/ responses in this condition remained below chance level. No difference was identified between the number of /ga/ responses made by the implanted group under the AA and AV visual /ga/ conditions. The implanted group's response rate to the /da/ and /ga/ syllables remained low across the nine stimuli, with a maximum response rate of 53%. Combined, these results suggest that the presentation of congruent /ga/ articulations did not assist either group to improve their auditory perception of the stimuli.
Under the visual /ba/ condition, the control group perceived stimulus 3 as /ba/, whereas under the AA condition they were uncertain of the sound. This suggests that when presented with an ambiguous speech sound, such as stimulus 3, children with normal hearing use congruent visual speech cues to inform their judgement of the sound. Under the visual /ba/ condition, children with normal hearing perceived fewer /da/ syllables for stimuli 3-5 and fewer /ga/ syllables for stimuli 8 and 9. Previous reports indicate that auditory perception of /da/ and /ga/ is greatly assisted by the presence of burst cues (Green & Norrix, 1997). As the present set of stimuli contained only transition cues, it is possible that the control group perceived ambiguity in the auditory signal for /da/, resulting in greater reliance on visual speech cues to supplement their auditory perception of these sounds (Massaro, 1984). However, whilst the control group gave fewer /da/ and /ga/ responses under this AV condition, the shape of their /da/ and /ga/ responses remained comparable to their auditory-alone responses. In contrast, when presented with visual /ba/, children using a cochlear implant consistently perceived the syllable /ba/ for all nine stimuli. Under the visual /ba/ condition, the implant group perceived more /ba/ qualities in stimuli 3 and 4, suggesting that the congruent visual speech cues assisted the children in their perception of these blended sounds. Their remaining responses to stimuli 5-9 suggest that
the children were strongly influenced by the visual /ba/ articulations, as all /da/ and /ga/ responses fell below chance level. This could be attributed to the hearing-impaired group's lower responses to /da/ and /ga/ in the auditory-alone condition, indicating difficulty in perceiving the sounds as either syllable. Ambiguity in the auditory signal may result in increased reliance on the strong visual /ba/ articulations to create a percept of the stimuli (Massaro, 1984). Further investigation is warranted to determine the reason behind the children's responses. Ambiguity in the signal could be attributed to the synthetic tokens used and the limited speech features made available to the children. A pilot study with post-lingually hearing-impaired adult cochlear implant users indicated that /da/ was perceived at a response rate of 52% for stimulus 5 and /ga/ was perceived at 67% for stimuli 8 and 9. Combined, these results suggest that the speech processor may not encode and present the synthetic stimuli to the auditory nerve with sufficient features to create /da/ and /ga/ percepts. Comparison between the current sample of implanted children and language-experience-matched children could determine the extent to which listening experience contributed to the findings. Overall, these findings suggest that strong visual articulations such as /ba/ assist children to perceive speech sounds, in particular when those sounds are ambiguous. In the current study, ambiguity was introduced by creating synthetic stimuli containing only transition cues. In everyday listening situations, ambiguity is added to the auditory signal when the signal is degraded, when speech is presented in noise, or when listening monaurally. In such situations, all children, in particular those using a cochlear implant, are likely to benefit from seeing the talker's articulations.

REFERENCES

Arnsten, A. F. T. (1998).
Catecholamine modulation of prefrontal cortical cognitive function. Trends in Cognitive Sciences, 2(11): 436-447.
Beebe, H. H., Pearson, H. R., & Koch, M. E. (1984). The Helen Beebe Speech and Hearing Center. In D. Ling (Ed.), Early intervention for hearing-impaired children: Oral options. College-Hill Press: San Diego, pp. 15-54.
Erber, N. P. (1969). Interaction of audition and vision in the recognition of oral speech stimuli. Journal of Speech & Hearing Research, 12: 423-425.
Grant, K. W., Ardell, L. H., et al. (1985). The contribution of fundamental frequency, amplitude envelope, and voicing duration cues to speech-reading in normal-hearing subjects. Journal of the Acoustical Society of America, 77: 67.
Green, K. & Norrix, L. (1997). Acoustic cues to place of articulation and the McGurk effect: The role of release bursts, aspiration and formant transitions. Journal of Speech, Language, and Hearing Research, 40(3): 646-665.
Green, K. P. & Kuhl, P. K. (1989). The role of visual information in the processing of place and manner features in speech perception. Perception & Psychophysics, 45(1): 34-42.
Hockley, S. N. & Polka, L. (1994). A developmental study of audio-visual speech perception using the McGurk paradigm. Journal of the Acoustical Society of America, 96: 339.
Massaro, D. W. (1984). Children's perception of visual and auditory speech. Child Development, 55: 1777-1788.
McGurk, H. & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264: 746-748.
Pichora-Fuller, M. K. (1996). Working memory and speechreading. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by humans and machines: Models, systems, and applications. Springer-Verlag: Berlin, pp. 257-274.
Sumby, W. G. & Pollack, I. (1954). Visual contributions to speech intelligibility in noise. Journal of the Acoustical Society of America, 26: 212-215.
Vaughan, P. (Ed.) (1981). Learning to listen: A book by mothers for mothers of hearing-impaired children. Beaufort Books: Toronto.
Walden, B., Montgomery, A., et al. (1990). Visual biasing of normal and impaired auditory speech perception. Journal of Speech and Hearing Research, 33: 163-173.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children (3rd ed.). The Psychological Corporation: Texas.