
Lexical Tone Development, Music Perception and Speech Perception in Noise with Cochlear Implants: The Effects of Spectral Resolution and Spectral Mismatch

A dissertation presented to the faculty of the College of Health and Human Services of Ohio University

In partial fulfillment of the requirements for the degree Doctor of Philosophy

Ning Zhou

June 2010

© 2010 Ning Zhou. All Rights Reserved.

This dissertation titled Lexical Tone Development, Music Perception and Speech Perception in Noise with Cochlear Implants: The Effects of Spectral Resolution and Spectral Mismatch by NING ZHOU has been approved for the School of Hearing, Speech and Language Sciences and the College of Health and Human Services by

John M. Zook
Professor of Biological Science

Randy Leite
Interim Dean, College of Health and Human Services

ABSTRACT

ZHOU, NING, Ph.D., June 2010, Hearing Sciences

Lexical Tone Development, Music Perception and Speech Perception in Noise with Cochlear Implants: The Effects of Spectral Resolution and Spectral Mismatch (195 pp.)

Director of Dissertation: John M. Zook

This work examines lexical tone development, speech perception, and music perception with cochlear implants, in relation to the effects of spectral resolution (i.e., number of channels) and of monaural and binaural frequency-place mismatch (i.e., spectral shift and compression) in cochlear implants. The motivation for studying the effects of spectral resolution is that contemporary cochlear implant systems deliver only the temporal envelope information of acoustic signals, which is sufficient for speech perception in quiet but, because the fine structure is not represented, provides too little spectral resolution for adequate perception in noisy environments. Poor spectral resolution also causes poor representation of the pitch information that is critical for music appreciation and for lexical tone perception in tonal languages. Chapter 2 discusses the development of novel, objective measures for the assessment of tone production in pediatric cochlear implant users. The measures include acoustic analyses of F0 distribution, cross-correlation of F0 contours, and a neural network approach. Chapter 3 reports lexical tone development in a large group of Mandarin-speaking children with cochlear implants; the factors contributing to the children's tone perception and production development were separately identified in multivariable regression analyses. Cochlear implants with shallow insertion present a unique phenomenon where the

content of the speech signal is delivered to higher-frequency regions of the cochlea than the more mid-range regions commonly used in speech recognition, resulting in an upward (i.e., basal) spectral shift. Chapter 4 discusses the effects of such implant-related frequency-place mismatch on lexical tone perception and consonant confusion, using a unilateral cochlear implant simulation with a noise-excited vocoder. Chapter 5 presents a novel speech processing strategy that uses dichotic stimulation to improve spectral resolution in bilateral implants. The dichotic stimulation assigns odd- and even-numbered spectral channels, which carry complementary information, to the two implants in an interleaved manner. The effects of binaural frequency-place mismatch, as well as the effects of the increased spectral resolution delivered via dichotic stimulation, were examined for music perception, lexical tone perception, frequency discrimination, and speech perception in noise. Results of the study indicated that dichotic stimulation significantly improved speech perception in noise due to better auditory segregation. Customized frequency-to-electrode maps that minimize spectral mismatch may provide better speech perception in monaural or dichotic stimulation.

Approved: John M. Zook

Professor of Biological Science

ACKNOWLEDGMENTS

The writing of a dissertation can be a lonely and isolating experience, yet it is obviously not possible without the personal and practical support of numerous people. My sincere gratitude thus goes to those who have made this dissertation possible and because of whom my graduate experience has been one that I will cherish forever. I owe my gratitude to my dissertation committee members, who offered help, understanding, and encouragement in many of the difficult times in completing this dissertation work. I would like to sincerely thank Dr. Li Xu for his guidance during my PhD studies at Ohio University; it was paramount in providing a well-rounded experience consistent with my long-term career goals. I have been amazingly fortunate to be given the freedom to explore on my own and, at the same time, the guidance to recover when my steps faltered. I would also like to acknowledge my peer PhD students, my lab members, and my friends, with whom I worked closely and puzzled over many of the same problems. They have always been there, supporting me unconditionally. I would like to especially thank Dr. Zhen Zhu for helping me learn programming with Matlab and offering me many problem-solving skills. Most importantly, none of this would have been possible without the love and patience of my family. I am grateful in particular to my father, who passed away two years ago, for his faith in me. I wish he had lived to see me complete my PhD degree.

TABLE OF CONTENTS

Abstract
Acknowledgments
List of Tables
List of Figures
Chapter 1: An Overall View of Musical and Voice Pitch Perception with Cochlear Implants
    Pitch Perception with Cochlear Implants
    Music Perception with Cochlear Implants
    Acoustic Characteristics and Correlates of Lexical Tones
    Tone Perception and Vocoder Simulation of Cochlear Implants
    Methods to Improve Music and Lexical Tone Perception With Cochlear Implants
    Frequency-Place Mismatch
    Rationale and Outline of the Dissertation
Chapter 2: Development of Methods to the Assessment of Tone Production in Cochlear Implant Users
    Introduction
    Acoustic Analysis Based on Tonal Ellipses
        Methodology
            Subjects and recording procedures
            F0 extraction and F0 onset-offset plotting
            Definition of the three indices
        Results and Discussion
    Acoustic Analysis of F0 Contours
        Methodology
            Standard F0 contour from normal hearing children
            Comparing F0 contours using cross-correlation
        Results and Discussion
    Neural Network Analysis of the F0 Contours
        Methodology
            Neural network and its structure
            Evaluation of tone production of children with cochlear implants with the neural network
        Results and Discussion
    Perceptual Study
        Methodology
        Results and Discussion
    Discussion and Conclusion
Chapter 3: Lexical Tone Development in Children with Cochlear Implants
    Introduction
    Methods
        Subjects
        Tone Perception Test
        Tone Production Test
            Speech recording
            Acoustic and perceptual analysis
    Results
        Relationship between tone perception and production
        Predictor variables to tone perception and production performances
    Discussion and Conclusion
Chapter 4: Effects of Frequency-Place Mismatch on Speech Perception with Cochlear Implants
    Introduction
    Frequency-Place Mismatch and Tone Perception
        Methods
            Speech material and signal processing
            Subjects and procedure
        Results
        Discussion and Conclusion
    Frequency-Place Mismatch and Consonant Confusion
        Methods
        Results
            Effects of spectral shift and spectral resolution
            Confusion analysis
        Discussion and Conclusion
Chapter 5: Dichotic Stimulation Improves Pitch Perception with Bilateral Cochlear Implants
    Introduction
    Acoustic Simulation Using Dichotic Stimulation
        Methods
            Melody recognition test
            Music interval discrimination test
            Frequency discrimination test
            Tone recognition test
            Speech recognition in noise
        Results
            Tone recognition test
        Discussion and Conclusion
    Dichotic Stimulation in Bilateral Cochlear Implant Users
        Methods
            Music interval discrimination test
            Frequency discrimination test
            Speech reception threshold test
        Results
        Discussion and Conclusion
Chapter 6: Conclusion
References

LIST OF TABLES

Table 1: Demographic Information of the Cochlear Implant Subjects
Table 2: Chronological Ages for the Cochlear Implant and Normal Hearing Groups
Table 3: Corner Frequencies of the Carrier Bands for Eight Insertion Conditions
Table 4: Compressed Frequency Allocation for a Four-Band Processor
Table 5: Corner Frequencies of the Carrier Bands for Seven Mismatch Conditions
Table 6: Coding of Consonants
Table 7: Frequency Ranges of the Melodies
Table 8: Channel Distribution
Table 9: Frequency Allocation
Table 10: Demographic Information of the Bilateral Implant Subjects
Table 11: Display of the Activated Electrodes in Six Conditions/Maps

LIST OF FIGURES

Figure 1: Spectrograms and F0 contours of the four tone patterns
Figure 2: Tonal ellipses
Figure 3: Averaged tonal ellipse size (Ae) as a function of age of the normal hearing children
Figure 4: Plots of the values of three indices
Figure 5: Correlation between each pair of the three indices
Figure 6: Contour correlation
Figure 7: Box plots of the contour correlation (R values)
Figure 8: Tone recognition error patterns
Figure 9: Correlation among measures
Figure 10: Rank-ordered tone perception scores
Figure 11: Confusion matrices
Figure 12: Tonal ellipses of four normal hearing children and four children with cochlear implants
Figure 13: Box plots of the indices of the acoustic analysis
Figure 14: Box plots of the tone production
Figure 15: Correlation between the recognition scores
Figure 16: Correlation between tone perception and production
Figure 17: Contributing factors for tone perception
Figure 18: Contributing factors for tone production
Figure 19: Schematic representation of the electrode location
Figure 20: Performance as a function of simulated insertion depth
Figure 21: Open versus nasal syllables
Figure 22: Performance as a function of frequency compression
Figure 23: Overall amplitude contours of all syllables
Figure 24: Percent correct and information transmission scores for articulatory features
Figure 25: Information transmission for particular manner and place of articulation
Figure 26: Confusion matrices using data pooled across channel conditions
Figure 27: Correlation between the confusion matrices
Figure 28: A schematic representation of dichotic distribution of channels with bilateral implants of four electrodes
Figure 29: Frere Jacques
Figure 30: Schematic plot of the stimulus structure with reference to time
Figure 31: Familiar melody recognition scores
Figure 32: Exponential fit of the interval discrimination threshold
Figure 33: Interval discrimination thresholds for the normal hearing group in three different stimulation modes
Figure 34: Frequency discrimination thresholds for the normal hearing group in three different stimulation modes
Figure 35: Tone recognition scores
Figure 36: CUNY sentence recognition scores
Figure 37: Music interval discrimination test scores
Figure 38: Frequency discrimination test scores
Figure 39: Speech reception thresholds
Figure 40: Dichotic advantage

CHAPTER 1: AN OVERALL VIEW OF MUSICAL AND VOICE PITCH PERCEPTION WITH COCHLEAR IMPLANTS

Pitch Perception with Cochlear Implants

For the past 20 years, over 130,000 people with severe to profound hearing loss have experienced a partial restoration of hearing using cochlear implants (CIs), most notably enhanced speech perception in quiet. Designed primarily for western languages, current CI systems deliver the temporal aspects of speech through electric pulses. It has been shown that temporal envelope information carried by a few spectral channels is sufficient for English phoneme recognition in quiet (e.g., Fishman, Shannon, & Slattery, 1997; Friesen, Shannon, Baskent, & Wang, 2001; Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995; Xu, Thompson, & Pfingst, 2005). One of the most significant problems with current CI systems, however, is the inadequate representation of pitch. Music perception and lexical tone perception both require accurate pitch extraction from complex signals, but neither place pitch nor temporal pitch transmitted through CIs proves to be sufficient. Place pitch is associated with the place where the electrical stimulation is delivered. Typical CI systems use only 12 to 22 electrodes to transmit a wide range of frequencies. With such poor frequency selectivity, the fundamental frequency (F0) and its harmonics are unlikely to be resolved. The primary mechanism of temporal pitch perception with CIs relies on rapid temporal fluctuation in electric stimulation. Varying the stimulation rate of the pulse trains on a single electrode results in a change in perceived pitch (e.g., Moore & Carlyon, 2005). Modern cochlear implants, however, typically do not vary stimulation rate in their speech processing strategies; instead, constant high-rate pulse trains are modulated with the temporal envelopes of the signals. A weak pitch percept has

been shown to be related to amplitude modulation frequencies up to 300 Hz (Eddington, Dobelle, Brackmann, Mladejovsky, & Parkin, 1978; McKay, McDermott, & Clark, 1994; Shannon, 1983; Tong & Clark, 1985; Townshend & White, 1987; Zeng, 2002). Nonetheless, temporal pitch extracted from the fluctuation in the temporal envelopes is unlikely to be reliable in electric stimulation. Factors such as electrode position and degeneration of spiral ganglion cells can affect the accuracy of temporal pitch extraction (Moore, 2003).

Music Perception with Cochlear Implants

Music perception with CIs depends greatly on resolving place pitch, which current CI systems do not adequately provide. As a result, postlingually deafened implant users consistently show impairment in identifying familiar songs (Fujita & Ito, 1999; Leal et al., 2003). Among the three aspects of music perception (i.e., rhythm, pitch, and timbre), rhythm is the aspect best perceived by CI users, since timing is relatively well preserved in CI stimulation. It has been shown that CI users were not able to identify melodies without distinctive rhythmic cues (Gfeller et al., 2002, 2007; Gfeller, Olszewski, Turner, Gantz, & Oleson, 2006). For example, Kong, Cruz, Jones, and Zeng (2004) demonstrated that when presented with two versions of a melody, one with the original rhythm and the other with notes of equalized duration, CI users were only able to recognize melodies with the original rhythm. CI users appear to have great difficulties perceiving the other two aspects of music, timbre and pitch. The pitch discrimination thresholds of CI users ranged from four semitones to two octaves and were much higher than those of NH listeners (Fujita & Ito, 1999). Not surprisingly, the impaired pitch discrimination in CI users was highly

correlated with their impaired melody recognition (Gfeller et al., 2002). Sucher and McDermott (2007) demonstrated that detecting the direction of pitch change with CI systems was also challenging. Thus, CI users' difficulty with music perception can be attributed primarily to their impaired pitch discrimination ability. Our recent study (Xu et al., 2009) objectively examined vocal singing in young children with implants and found that the pitch-related aspects of their singing differed significantly from those of NH children. The implanted children's pitch range was significantly compressed. While singing, they produced random pitch contours and showed significantly greater deviation from the target compared with the NH controls. Interestingly, the timing or rhythm of their singing did not differ from that of the normal control group. The literature thus suggests that CI users are impaired mainly in the pitch-related aspects of music perception.

Acoustic Characteristics and Correlates of Lexical Tones

The lack of pitch information in current CI systems impairs lexical tone development in tone-language users. Tone languages are spoken by over one quarter of the world's population. In these languages, the meaning of a word is determined not only by its segmental structure but also by pitch variation. In Mandarin Chinese, for example, there are four tones, commonly known as tone 1, tone 2, tone 3, and tone 4. The Mandarin tones are named for their F0 variation patterns as well as their absolute frequency heights (Howie, 1976). Tone 1 is known as "high flat" and tone 2 as "mid-low rising." Tone 3 is considered to have a dipping and rising contour, with possible loss of voicing in the middle that corresponds to a glottal stop; sometimes the F0 contour of tone 3 falls from a moderate to a low level but then does not rise. Tone 4 is considered to have a high falling contour. Spectrograms of the four Mandarin tones using a sample syllable are

shown in Figure 1. Chao (1968) introduced a nomenclature that delineates the F0 patterns for the four Mandarin Chinese tones: the four tones are named tone 55, tone 35, tone 214, and tone 51. The numbers reflect the relative levels of the starting and ending points of the F0 contours; in the case of tone 3, the numbers reflect the starting, middle, and ending points of the F0 contour. Although there are reasons to consider tone a prosodic feature given its mobility, tone is not functionally different from segmental features and, therefore, has been regarded as phonemic (Duanmu, 2000).

Figure 1. Spectrograms and F0 contours of the four tone patterns (tones 1-4; frequency in kHz versus time in msec). The F0 contours extracted with the autocorrelation method are plotted with black symbols.

The F0 height and contour are the primary intrinsic cues for tone perception, although the contour information may be distributed unevenly, with certain parts of the pattern being better indicators of the tone than others (Liu & Samuel, 2004). In addition, when the contour of the F0 becomes ambiguous, the absolute F0 height can be of use to

tone perception (Lee, 2009). The F0 height cue can be an even more important acoustic correlate for Cantonese tones, given that 4 of the 6 Cantonese tones are contrasted by F0 height alone (Ciocca, Francis, Aisha, & Wong, 2002). Despite the fact that F0 constitutes the most important acoustic characteristic of tones, there are secondary acoustic cues, well documented in the literature, that support tone recognition when the F0 information is compromised. They include duration, amplitude variation or temporal envelope, and the spectral envelope of the speech signal (e.g., Whalen & Xu, 1992). Mandarin tones differ in duration. Tone 3 is produced with the longest duration, particularly in its canonical [214] form. Durations of the other three tones reported in the literature, however, did not always agree (e.g., Howie, 1976). Fu and Zeng (2000) recorded tone production from ten speakers across six syllables and found that the average length for tone 3 was the longest (463.3 ms) and tone 4 the shortest (334.4 ms); tone 2 was modestly long (374.7 ms) and tone 1 relatively short (339.5 ms). Liu and Samuel (2004) also reported that tone 3 responses to whispered speech, which contains no voice source, were strongly correlated with longer stimuli, whereas tone 1 and tone 4 responses were correlated with shorter stimuli. Furthermore, Xu, Tsai, and Pfingst (2002) showed in a maximum likelihood model that tone recognition could be as high as 56.5% correct based on the duration cue alone. A handful of studies have examined the use of temporal envelope cues. Acoustic signals are composed of a temporal envelope, which is a slowly varying amplitude contour, and a temporal fine structure, which is the fast-fluctuating component of the signal.
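This envelope/fine-structure decomposition is commonly computed with the Hilbert transform. The sketch below uses a synthetic amplitude-modulated tone standing in for a speech signal; the sampling rate, modulation rate, and the 50 Hz smoothing cutoff are illustrative choices, not parameters from the studies cited above:

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 16000                     # sampling rate in Hz (illustrative)
t = np.arange(0, 0.5, 1 / fs)
# Synthetic "voiced" signal: a 200 Hz carrier with a slow 3 Hz amplitude contour
signal = (1 + 0.8 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 200 * t)

analytic = hilbert(signal)
envelope = np.abs(analytic)                  # slowly varying amplitude contour
fine_structure = np.cos(np.angle(analytic))  # fast-fluctuating carrier

# Envelopes are often further smoothed with a lowpass filter (e.g., 50 Hz cutoff)
b, a = butter(4, 50 / (fs / 2), btype="low")
smoothed_envelope = filtfilt(b, a, envelope)

# By construction, envelope * fine_structure reconstructs the original signal;
# chimera-style processing instead pairs the envelope of one tone with the
# fine structure of another.
reconstruction = envelope * fine_structure
```

Note that the reconstruction identity is exact here only because the two components come from the same analytic signal; the perceptual question in the studies above is what happens when they are mismatched or when the fine structure is discarded.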

As early as 1988, Lin demonstrated that the temporal envelope of a broad-band speech signal did not contribute much to tone perception compared with its fine structure. The dependence of tone perception on fine structure was demonstrated even more clearly by Xu and Pfingst (2003). They used an auditory chimera signal processing technique (Smith, Delgutte, & Oxenham, 2002) to create chimeric signals that have the temporal envelope of one tone and the fine structure of another tone. When presented with this conflicting acoustic information, NH listeners responded 90% of the time based on the fine structure of the chimeric stimuli. When the fine structure was removed, tone recognition using signal-correlated noise with controlled duration remained above chance (Whalen & Xu, 1992). Whalen and Xu attributed the unexpected performance to the energy distribution in the amplitude contour of the tones, demonstrating a significant correlation between the amplitude contour and the F0 variation. Fu and Zeng (2000) further explored the roles of amplitude contour, duration, and periodicity in tone perception. The three cues were presented in isolation, in pairs, or all together. Tone perception was best, around 70% correct, when all three temporal cues were available. Either the amplitude contour or the periodicity cue resulted in approximately 55% correct recognition, and the duration cue alone provided the lowest recognition score, about 35% correct. Kong and Zeng (2006) pointed out that the duration and overall amplitude cues alone cannot account for the significant recognition performance for whispered tones (i.e., 60-70% correct). They argued that a 1-band signal that contained 50 Hz temporal envelope information was recognized far more poorly than the whispered tone tokens, and the reduction in performance was even more apparent in noise conditions. They reasoned

that the formant frequencies represented in the spectral envelope can be used by listeners to match the voice pitch of the speaker. Kuo, Rose, and Faulkner (2008) examined the contributions of both the primary F0 cue and the secondary temporal cues to tone perception. Place or temporal pitch was conveyed either by an F0-controlled sawtooth carrier or by a sinusoidally modulated noise carrier. The amplitude contour of the signal was extracted to modulate the F0-controlled or random carriers. Tone perception was found to be most robust with explicit F0 information, compared to any combination of the other acoustic cues. The discussion of secondary cues for tone perception becomes particularly relevant when it comes to pitch perception with cochlear implants. The vocoder-based (i.e., voice coder) speech processing strategies of cochlear implants virtually eliminate the fine structure of speech signals but faithfully represent the time-related aspects of the signals. The secondary cues for tone perception, which are primarily temporal, have therefore become more important in electric hearing.

Tone Perception and Vocoder Simulation of Cochlear Implants

Tone perception has been examined in many studies that used a vocoder to simulate multichannel CIs (e.g., Fu, Zeng, Shannon, & Soli, 1998; Xu et al., 2002; Kong & Zeng, 2006). In vocoder processing, speech signals are divided into spectral bands. The temporal envelope of each band is extracted and used to modulate wide-band noise spectrally limited by the same bandpass filter. The amplitude-modulated narrowband noises from all channels are then summed to form a reconstructed signal (see Xu & Pfingst, 2008). In CI simulations using vocoders, the temporal fine structure of the speech signal is therefore replaced by noise in each band. The band-specific temporal envelope

cue is well represented. It is possible to control the spectral resolution of the output signal by varying the number of spectral channels. The amount of temporal envelope detail is typically controlled by varying the cutoff frequency of the lowpass filters used to extract the envelopes. The relative contribution of spectral and temporal information to tone perception has been examined in studies using noise-excited vocoders. Fu et al. (2002) showed that an increase of spectral channels from 1 to 4 did not improve Mandarin tone recognition. Tone recognition performance started to improve as spectral channels were increased beyond 4 (Xu et al., 2002). As many as 32 spectral channels were needed to achieve performance close to that with unprocessed tone stimuli (Kong & Zeng, 2006). Fu et al. (2002) also found that in quiet, tone recognition performance in the 8-band, 50 Hz lowpass cutoff condition was worse than that in the 1-band, 500 Hz lowpass cutoff condition, but this pattern was reversed in noise. This indicates that coarse spectral information (e.g., 8 bands) may not be very useful for tone perception in quiet but is of great importance for perception in noise, because the temporal envelope cues may be more susceptible to noise than the spectral cues. An interesting interaction between the spectral and temporal cues, later known as the trade-off relation between the two cues, was reported by Xu et al. (2002). Tone performance with higher spectral resolution (i.e., more spectral channels) but less detailed temporal envelopes was equivalent to that with lower spectral resolution but more detailed temporal envelopes. Xu et al. (2002) concluded that reduced information in the spectral domain can be compensated for by increased information in the time domain for tone perception, and vice versa.
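The noise-excited vocoder processing described above (bandpass analysis, envelope extraction, noise modulation, summation) can be sketched as follows. The channel edges, filter orders, and 50 Hz envelope cutoff are illustrative choices, not the parameters of any particular study cited here:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocoder(signal, fs, n_channels=4, f_lo=100.0, f_hi=6000.0,
                  env_cutoff=50.0, seed=None):
    """Noise-excited vocoder: bandpass -> envelope -> modulate noise -> sum."""
    rng = np.random.default_rng(seed)
    # Logarithmically spaced channel edges (an illustrative choice)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    env_sos = butter(4, env_cutoff / (fs / 2), btype="low", output="sos")
    output = np.zeros_like(signal, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo / (fs / 2), hi / (fs / 2)],
                          btype="band", output="sos")
        band = sosfiltfilt(band_sos, signal)
        # Temporal envelope via rectification + lowpass; the cutoff frequency
        # controls how much temporal envelope detail survives
        envelope = np.clip(sosfiltfilt(env_sos, np.abs(band)), 0, None)
        # Replace the fine structure with noise limited to the same band
        noise = sosfiltfilt(band_sos, rng.standard_normal(len(signal)))
        output += envelope * noise
    return output

fs = 16000
t = np.arange(0, 0.3, 1 / fs)
test_signal = np.sin(2 * np.pi * 440 * t)
vocoded = noise_vocoder(test_signal, fs, n_channels=4, seed=0)
```

Raising `n_channels` increases spectral resolution, while lowering `env_cutoff` removes temporal envelope detail, which is exactly the two-way manipulation behind the trade-off studies described above.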

Taken together, the literature suggests that the temporal or spectral fine structure of the signal is the most dominant cue for tone perception (Kong & Zeng, 2006; Xu & Pfingst, 2003). In the absence of explicit F0, such as in CI stimulation or its vocoder simulation, temporal information (including the temporal envelopes in local spectral channels, the overall amplitude contour, and duration) contributes to tone perception (e.g., Whalen & Xu, 1992).

Methods to Improve Music and Lexical Tone Perception With Cochlear Implants

Much effort has been made to enhance pitch coding in CI systems in order to improve music and lexical tone perception. Quite a few studies have attempted to explicitly code F0 in the temporal envelopes of CI stimulation to better represent temporal pitch (e.g., Geurts & Wouters, 2001; Green, Faulkner, & Rosen, 2004; Hamilton, Green, & Faulkner, 2007; Laneau, Wouters, & Moonen, 2006; Luo & Fu, 2004; Vandali et al., 2005). Methods that have been used include explicitly modulating noise or pulse train carriers with an additional component that follows F0, using a single channel dedicated to transmitting just the F0, and modifying envelope shapes to make them better resemble the F0 contours. These efforts yielded only modest success; after all, human ears can only extract temporal pitch up to about 300 Hz (see Xu & Pfingst, 2008, for a review). Place pitch is elicited according to the physical location of the electrodes and provides the dominant cue, especially for music perception (McDermott & McKay, 1997). The number of spectral channels sufficient for detecting relative pitch movement, such as in the perception of lexical tones, is roughly 30 (Kong & Zeng, 2006). Good music perception takes even more spectral channels, beyond what is

available in current CI systems (Smith et al., 2002). Simply adding electrodes to the implant will not necessarily increase the number of functioning spectral channels, because electric stimulation often creates broad current spread, resulting in channel interaction. A few studies have investigated the possibility of enhancing place pitch in CI systems. Geurts and Wouters (2004) designed a new filtering strategy in which the first harmonic of a complex signal is always resolved in two adjacent filters. Pitch discrimination thresholds improved significantly with the new filter design compared to the traditional filters used in clinical processors. A more direct approach, the speech processing strategy HiRes 120 developed by Advanced Bionics, has sought to enhance place pitch with virtual channels. The virtual channels are created by adjusting the ratio of the electric current delivered to adjacent electrode pairs. The adjustment can be made in eight steps for each adjacent electrode pair, to elicit excitation patterns centered at eight different places between the electrodes of the pair. Therefore, 120 possible virtual channels are made available in the HiRes 120 strategy. The new strategy was used in 20 children with CIs who speak Mandarin Chinese (Han et al., 2009). Tone perception performance was measured longitudinally for 6 months. While the results showed wide individual variability, some children did show a trend of improvement in tone perception after their speech processing strategy had been converted to HiRes 120 for 6 months. The utility of the virtual channels, however, may depend on many other factors, including neural survival patterns and the unequal distances between the electrodes and their excitation targets.
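The arithmetic behind the 120 virtual channels can be sketched as follows. The 16-electrode array assumed here (giving 15 adjacent pairs, so 15 × 8 = 120 steering sites) is an illustrative assumption consistent with the description above, which states only that eight steering steps are available per adjacent electrode pair:

```python
# Current steering between adjacent electrodes: a fraction alpha of the
# current goes to one electrode of the pair and (1 - alpha) to the other,
# shifting the excitation centroid between the two physical contacts.
N_ELECTRODES = 16     # assumed array size for illustration
STEPS_PER_PAIR = 8    # eight steering steps per adjacent pair (from the text)

virtual_channels = []
for pair in range(N_ELECTRODES - 1):       # 15 adjacent pairs
    for step in range(STEPS_PER_PAIR):     # 8 steering steps per pair
        alpha = step / STEPS_PER_PAIR      # 0, 1/8, ..., 7/8
        # Excitation centroid in "electrode units" between pair and pair + 1
        virtual_channels.append(pair + alpha)

print(len(virtual_channels))  # 120 distinct stimulation sites
```

The sketch only counts nominal sites; as the text notes, whether each site yields a distinct percept depends on neural survival and electrode-to-neuron geometry.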

Frequency-Place Mismatch

In normal hearing, the frequency components of an acoustic signal excite particular places in the cochlea in a tonotopic fashion. In electric hearing with CIs, ideally, speech signals should be delivered so as to excite the places in the cochlea that match the frequency content of the acoustic signal. However, as a result of shallow insertion of the electrode array, unequal electrode-to-neuron distances, or compression of frequency maps, a number of frequency-place mismatch situations, including spectral shift, warping, or compression, may occur in cochlear implant stimulation. One type of frequency-place mismatch takes place when the implanted ear has localized losses of auditory neurons, which result in "holes" in hearing. In this case, higher stimulation levels on the corresponding electrodes will be needed for those bands of information to be received. The increased signal level will likely result in spread of electric current to neural fibers that are not intended to be activated, producing frequency warping around the holes in the cochlea. Another type of frequency-place mismatch involves an overall shift of the spectrum as a result of shallow insertion of a CI. Consider the case where the implant electrode array is not fully inserted into the cochlea, so that the location of the electrode array does not match the analysis bands. Typically, the output of a low-frequency analysis band will be delivered to an electrode that rests at a higher-frequency place, resulting in a basal shift of the spectrum. Matching the analysis bands to the location of the electrode array, however, sacrifices frequency coverage, especially in the low-frequency region. In addition, due to the limited length of the CI electrode array, the frequency range stimulated by a CI cannot cover the entire speech spectrum. As a consequence, more commonly encountered cases of frequency-place mismatch involve

frequency compression. Clinically used maps usually allocate, compressively, a frequency range wide enough for speech understanding to electrodes that cover a narrower cochlear region, regardless of the position of the electrode array.

Rationale and Outline of the Dissertation

The dissertation consists of five studies covering a variety of topics: acoustic analysis of lexical tone production in child CI users (Chapter 2), regression analysis of individual variability in tone development of child CI users (Chapter 3), the effects of frequency-place mismatch on lexical tone perception (Chapter 4.1) and consonant confusion (Chapter 4.2), and a novel method for improving spectral resolution via dichotic stimulation and its efficacy for pitch perception and speech perception in noise (Chapter 5). The few studies that have examined tone perception and production in prelingually deafened children found that the degree of deficits in tone perception varied greatly from individual to individual (e.g., Ciocca et al., 2002; Han et al., 2007; Lee, Van Hasselt, Chin, & Cheung, 2002; Peng, Tomblin, Cheung, Lin, & Wang, 2004; Wei et al., 2000; Wong & Wong, 2004; Xu et al., 2004). However, the methods used to evaluate tone development in children with CIs who speak tone languages have been limited and largely subjective. Chapter 2 discusses the development of a few objective measures, including acoustic analysis of the F0 distribution, correlational analysis of the F0 contour, and a neural network approach. Individual variability has always been observed in studies that examined speech outcomes in CI users. Although there have been a handful of studies that examined tone perception and production in tone-language speakers with CIs, the studies were limited to small samples; therefore, the variables accounting for

performance variation could not be examined. Chapter 3 discusses tone perception and production development in a large group of Mandarin-speaking children with CIs (N = 110). Regression analyses were performed to identify the most important factors contributing to tone perception and production performance in CI users. Chapter 4 discusses the effect of frequency-place mismatch on Mandarin tone perception and consonant confusion using information transmission analysis. Basal upward spectral shift and frequency compression were simulated in a noise-excited vocoder. The effects of such spectral distortion were examined in lexical tone recognition and in perception of the place-of-articulation features of consonants. Chapter 5 introduces a novel speech processing strategy that uses dichotic stimulation in bilateral implants. This new strategy was designed to improve spectral resolution in CI systems, so as to improve pitch perception and speech perception in noise with CIs. The effects of possible binaural frequency-place mismatch were discussed in terms of how the mechanisms of music and tone perception differ from those of speech perception.

CHAPTER 2: DEVELOPMENT OF METHODS FOR THE ASSESSMENT OF TONE PRODUCTION IN CI USERS

Introduction

It is believed that Mandarin Chinese tone is acquired by normally developing children at a very young age (e.g., Li & Thompson, 1977), probably even earlier in development than the mastery of segmentals (i.e., vowels and consonants). Research has shown that children with CIs who speak tonal languages have deficits in tone perception (e.g., Wong & Wong, 2004) because of the lack of F0 information in the electrical stimulation of current CI technology (see Moore [2003] for a review). The extent of the deficits found in tone perception varied tremendously from individual to individual. With the limited tone information that current CIs provide, prelingually deafened children may also experience difficulties in tone production due to the lack of acoustic feedback. Studies have shown that a majority of prelingually deafened children with CIs do not master Mandarin Chinese tones. The literature on tone perception and production outcomes in tone-language-speaking children with CIs will be discussed in detail in the following chapter. The main purpose of this part of the study was twofold. The first aim was to develop methods to assess Mandarin Chinese tone production in children with CIs; methods for assessing tone production in tone-language-speaking patients are limited. The second aim was to validate these methods and evaluate their relative utility in the assessment of tone production in children with CIs. The current methods used for evaluating Mandarin tone production involve mainly acoustic analyses and perceptual judgments by adult native speakers. Methods based on acoustic analyses for evaluating the Mandarin tone production of either NH or CI

users are still limited. Xu et al. (2004), in their preliminary report of Mandarin tone production in children with CIs, described the children's Mandarin Chinese tone production by presenting the F0 contour of each tone produced by individual children. Wang, Jongman, and Sereno (2003) studied non-native speakers' tone production by comparing their F0 contours to an averaged native form. They normalized the duration and height of the speakers' F0 contours to fit their F0 range into levels ranging from 1 to 5, corresponding to the 5-point pitch scale for Mandarin tone proposed by Chao (1968). Five evenly spaced F0 values were chosen for comparison with the native forms. Analysis of variance was performed to evaluate whether the mean deviation between the native and non-native productions differed across positions on an F0 contour. Although the pitch contours were examined, neither study quantified the acoustic properties of the tone productions. In contrast, a more sophisticated method of acoustic analysis was developed by Barry and Blamey (2004) for Cantonese tones. This approach studied the tonal ellipses generated over scatter plots of the onset versus offset values of the F0 contours. The spread of, and degree of overlap between, the tonal ellipses were quantified to indicate the degree of differentiation among Cantonese tones. A more commonly used method of assessing Mandarin tone production involves perceptual judgments, which are widely used to evaluate speech outcomes for normal-hearing (NH) subjects, and especially for CI users. We used a four-alternative forced-choice perceptual test (Han et al., 2007; Xu et al., 2004) in which native adult speakers of Mandarin were presented with audio recordings of the subjects' tone productions and were required to identify the tones they heard. Methods using auditory judgments

have also involved rating the correctness of tone production on equal-interval scales (e.g., Peng et al., 2004). The problem with the perceptual studies is the limited information they offer about the properties of tone production beyond percentage scores. In summary, existing methods for evaluating tone production in Mandarin-speaking implant users are limited and are largely restricted to judgments of perceptual intelligibility. We aimed to develop additional methods for evaluating the tone production of Mandarin-speaking CI users and to assess the efficacy of these methods. To begin, we followed Barry and Blamey's (2004) approach for assessing Cantonese tone production, adjusting and expanding it to accommodate Mandarin Chinese. The two indices of differentiation first developed by Barry and Blamey were modified based on the differences between the two languages, i.e., the number of tones as well as their acoustic properties (see Ciocca et al. [2002] for a review). In addition, a new index that allows examination of the degree of differentiation of an individual tone from the others was proposed. While examination of the tonal ellipses can reflect the degree of differentiation among tones, it does not provide a complete description of the Mandarin Chinese tone contours. The approach used by Wang et al. (2003), which attempted to examine the tone contours of non-native speakers by calculating their deviation from the native forms, also involved only five positions on the contours. We proposed a cross-correlation method, a common technique in random signal processing, to study the correlation of two contours. It makes possible a direct comparison of the F0 contours produced by each child

with a CI with the normative contours and yields a correlation coefficient to indicate the degree of similarity of the contours. In addition to the acoustic analyses, we also developed an artificial neural network to classify the tones produced by CI users. Training a neural network enables it to learn by adjusting its parameters and to generalize the acquired knowledge to new situations, much as humans do. A neural network bases its working mechanism on a defined mathematical algorithm and classifies tones into categories much as perception does. Several studies have used artificial neural networks to classify Mandarin or Cantonese tones (Chang, Sun, & Chen, 1990; Lan, Nie, Gao, & Zeng, 2004; Lee, Ching, Chan, Cheng, & Mak, 1995; Wang & Chen, 1994; Xu et al., 2006, 2007; Zhou, Zhang, Lee, & Xu, 2008). In our recent study, a feedforward multilayer perceptron was used to recognize the tones produced by a group of Mandarin-speaking children with NH (Xu et al., 2007). In that study, it was found that, in terms of recognition accuracy, the performance of the multilayer perceptron was comparable to human listeners and was even superior to human ears when the classification task involved a large number of speakers and tokens (Xu et al., 2007). The rationale for using a neural network in addition to the acoustic analysis in the present study was that a neural network not only produces human-perception-like performance but also operates automatically and economically. To examine the relative utility of the proposed acoustic-analysis and neural-network approaches, a traditional tone perception task was carried out with native Mandarin-speaking listeners. Furthermore, a correlational analysis was conducted for

each pair of the measures, including the perceptual results. The information that these measures provide and their strengths and limitations are further discussed.

Acoustic Analysis Based on Tonal Ellipses

Methodology

Subjects and recording procedures. Two groups of native Mandarin-Chinese-speaking children were recruited. One group consisted of 14 prelingually deafened children who had received CIs from the Cochlear Implant Center of Beijing Tongren Hospital, China. There were no specific recruitment criteria in terms of age, device experience and type, or other factors contributing to tone production. Our sampling provided cross-sectional data on tone development. Chronological ages ranged from 2.9 to 8.3 years (5.2 ± 1.8, mean ± SD). Five of the 14 children had used hearing aids before implantation. The implants they had received included Clarion CII (3), N24M (7), and N24R (4). The duration of CI use ranged from 0.3 to 2.6 years (1.7 ± 0.8, mean ± SD). All of the children had received rehabilitation at professional rehabilitation centers in Beijing after implantation. The amount of rehabilitation varied among the 14 children. Their demographic information is summarized in Table 1.

Table 1

Demographic Information for the Cochlear Implant Subjects

Subject  Gender  Age at Implantation (years)  Chronological Age (years)  Duration of Implant Use (years)  CI Device
1        F                                                                                                N24M
2        M                                                                                                Clarion CII
3        M                                                                                                N24M
4        F                                                                                                N24M
5        M                                                                                                N24M
6        M                                                                                                N24M
7        M                                                                                                Clarion CII
8        M                                                                                                N24R
9        F                                                                                                Clarion CII
10       F                                                                                                N24M
11       F                                                                                                N24R
12       F                                                                                                N24R
13       M                                                                                                N24M
14       M                                                                                                N24R

The control group consisted of 61 NH children from kindergartens and elementary schools in Beijing, China. The ages of the NH children ranged from 3.1 to 9.0 years (6.2 ± 1.7, mean ± SD). All 61 NH children had a pure-tone average (at 500, 1000, and 2000 Hz) of 20 dB HL or better. To collect tone production samples, both groups of children were instructed to produce tones 1 through 4 for the following monosyllables: ai, bao, bi, can, chi, du, duo, fa, fu, ge, hu, ji, jie, ke, la, ma, na, pao, pi, qi, qie, shi, tu, tuo, wan, wen, wu, xian, xu, ya, yan, yang, yao, yi, ying, you, yu, yuan, zan, zhi. These syllables were chosen because, when associated with the four tones, they all produce real words in Mandarin Chinese. The children heard each of the syllables in its high flat tone and were asked to produce the syllable in all four tones. The children's familiarity with the words was not a concern, because language education in kindergartens and elementary schools in Beijing greatly emphasizes tone drills, which require producing the four tones for any syllable even when the children might not know the words. The elicited productions of the monosyllabic words were digitally recorded at a sampling rate of 44.1 kHz with 16-bit resolution in quiet rooms with ambient noise typically around 40 dB SPL. In total, 9760 tone tokens were obtained from the NH control group (40 syllables × 4 tones × 61 speakers), and 2240 tone tokens were obtained from the CI group (40 syllables × 4 tones × 14 speakers). The mean durations of the tokens from the NH and CI groups differed significantly [t(73) = 10.75, p < 0.001]. It was evident that these typically developing children sometimes produced inaccurate tones. One legitimate

approach would be to exclude those inaccurate tone tokens from the normative data. However, we regarded the errors as part of the developmental process in typically developing children. Therefore, all the tone tokens from the NH children were included as normative data in the analysis.

F0 extraction and F0 onset-offset plotting. The F0 contours of the vowel portion of the monosyllabic words were extracted using an autocorrelation algorithm. Although other cues contribute to Mandarin tone perception, the focus of this research was F0, since the major problem in the tone development of CI users has been the unresolved F0. The update rate of the F0 extraction was 8 ms, with a frame size of 24 ms. The lower and upper boundaries for the extraction were 50 and 500 Hz. The extracted F0 contours were plotted onto and compared with narrowband spectrograms (window size = … ms) generated in MATLAB (MathWorks, Natick, MA) to verify the accuracy of the F0 extraction. The extraction might sometimes include a small part of a preceding voiced consonant, but the voicing of the consonant usually does not produce reliable F0 (Duanmu, 2000). The unreliable F0 data points from the voiced consonant could easily be identified and manually deleted with reference to the narrowband spectrograms. Occasional halving or doubling errors were also manually corrected with reference to the narrowband spectrograms. The offset values of the F0 contours were plotted against the onset values for each tone token. To summarize the distribution of the onset-offset plots for each child, tonal ellipses around each tone type were plotted in a manner similar to that introduced by Barry and Blamey (2004). First, the inclination of an ellipse (i.e., the direction of the

semi-major axis of the ellipse) was determined by the positive angle of the linear fit to the data points of a particular tone. The ellipse center was located using the mean of the F0 onset values along the fitting line. A line perpendicular to the fitting line defined the direction of the semi-minor axis. The lengths of the semi-major and semi-minor axes were set to two standard deviations of the data points from the center along the linear fitting line and the perpendicular line, respectively. Thus, each tonal ellipse encompassed approximately 95% of the onset-offset data points of a tone category. The F0 onset and offset values capture the extreme pitch levels that a speaker produces in most cases, except for the dip in tone 3. The plot of the onset and offset values was defined by Barry and Blamey (2004) as the tonal space of a Cantonese speaker. Although the onset and offset values do not capture the entire F0 range of Mandarin Chinese tones, for convenience we use tonal space to describe such a plot for Mandarin Chinese tones as well.

Definition of the three indices. The four tonal ellipses that outlined the F0 onset-offset data points were used to study the differentiability of the tones. Following Barry and Blamey (2004), Index 1 was defined as the ratio of two areas, At over Ae. At was the area of the triangle formed by joining the centers of the three most differentiated tonal ellipses in Cantonese, referred to as the tonal area; Ae was the averaged area of these three ellipses. Based on the lengths of the semi-major (a) and semi-minor (b) axes of an ellipse, the ellipse area was computed as π × a × b.
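The ellipse construction just described (a linear fit for the inclination, 2-SD semi-axes along and perpendicular to the fitting line, and area π × a × b) can be sketched as follows. This is an illustrative NumPy implementation under our own naming, not the original analysis code; for simplicity the center is taken as the data mean, a close equivalent of the mean-onset-along-the-line placement described in the text.

```python
import numpy as np

def fit_tonal_ellipse(onset, offset):
    """Fit a 2-SD tonal ellipse to the F0 onset/offset pairs of one tone.

    Returns the center (cx, cy), the inclination theta (radians) of the
    semi-major axis, and the semi-axis lengths (a, b) along and
    perpendicular to the linear fit, respectively.
    """
    onset = np.asarray(onset, dtype=float)
    offset = np.asarray(offset, dtype=float)
    # Inclination: angle of the least-squares line through the points.
    slope, _ = np.polyfit(onset, offset, 1)
    theta = np.arctan(slope)
    # Center: mean of the data (simplified from the text's placement).
    cx, cy = onset.mean(), offset.mean()
    # Unit vectors along the fitting line and perpendicular to it.
    u = np.array([np.cos(theta), np.sin(theta)])
    v = np.array([-np.sin(theta), np.cos(theta)])
    d = np.column_stack([onset - cx, offset - cy])
    a = 2.0 * np.std(d @ u)   # semi-major: 2 SD along the fit line
    b = 2.0 * np.std(d @ v)   # semi-minor: 2 SD perpendicular to it
    return (cx, cy), theta, (a, b)

def ellipse_area(a, b):
    """Area of an ellipse with semi-axes a and b."""
    return np.pi * a * b
```

With perfectly collinear input along y = x, the fit recovers a 45-degree inclination and a vanishing semi-minor axis, which is a quick sanity check on the projection logic.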

Only three tonal ellipses were chosen for the Cantonese analysis because Cantonese has six tones and their tonal ellipses are relatively crowded in the tonal space. In contrast, the tonal ellipses for the Mandarin Chinese tones are more separable, in that each of the four tonal ellipses is located in one of the four quadrants of the tonal space. Specifically, the flat tone 1, with an F0 onset-offset pair that is high and similar in level, is located in the upper right quadrant; the rising tone 2, with a low onset and high offset, is in the upper left quadrant; the low dipping tone 3, with both onset and offset at a median level, is in the lower left quadrant; and the high falling tone 4, with a high onset and low offset, is in the lower right quadrant (see Figure 1, top left panel). Thus, connecting the centers of the four tonal ellipses forms a quadrangle whose area measures the span of the F0 range for Mandarin Chinese tones. Barry and Blamey's Index 1 was then modified for Mandarin Chinese as:

Index 1 = Aq / Ae,    (1)

where Aq is the area of the quadrangle and Ae is the average area of the four ellipses. Index 2, also proposed by Barry and Blamey (2004), was the ratio of the averaged distance of the centers of the six Cantonese tonal ellipses from each other to the average of the lengths of the two axes of the six ellipses. In the present study, this index was adopted to calculate the corresponding parameters for the four Mandarin Chinese tones:

Index 2 = Ave. Dist. / Ave. Ax.    (2)
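Given the centers and semi-axis lengths of the four tonal ellipses, Equations (1) and (2) reduce to a polygon area and a mean pairwise distance. The sketch below is an assumed implementation (not the dissertation's code) that uses the shoelace formula for the quadrangle area; the centers are assumed to be listed in tone order 1-4, which traces the quadrangle because each ellipse sits in its own quadrant.

```python
import numpy as np
from itertools import combinations

def shoelace_area(pts):
    """Area of a simple polygon whose vertices are given in order."""
    x, y = np.asarray(pts, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def index1(centers, semi_axes):
    """Index 1 = Aq / Ae: quadrangle area over mean ellipse area."""
    Aq = shoelace_area(centers)
    Ae = np.mean([np.pi * a * b for a, b in semi_axes])
    return Aq / Ae

def index2(centers, semi_axes):
    """Index 2 = mean pairwise center distance / mean axis length."""
    dists = [np.hypot(*np.subtract(p, q)) for p, q in combinations(centers, 2)]
    # "Ave. Ax." taken here as the mean of the two semi-axis lengths.
    ax = [(a + b) / 2.0 for a, b in semi_axes]
    return np.mean(dists) / np.mean(ax)
```

For both indices, larger values mean ellipses that are farther apart relative to their size, i.e., better-differentiated tones.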

Because Index 1 and Index 2 use the parameters of all four tonal ellipses to measure the overall differentiability of the four tones, these indices cannot reflect the separability of a particular tone from the other three. A new index was therefore developed to examine the false detections of any individual tone, which makes it possible to infer the degree of differentiation of that tone from the others. As a result of the overlap of tonal ellipses, a given tonal ellipse may enclose not only data points from that tone category but also data points from the other tone categories. Data points from the target tone were considered hits, while data points from the other three tones were categorized as false detections. To determine whether a data point was enclosed by a given tonal ellipse, the position of the data point relative to the ellipse was evaluated using the ellipse function x²/a² + y²/b² − 1 = 0, where a and b are the semi-major and semi-minor axes of the ellipse, and x and y are the coordinates of the data point. The coordinates of the data point were first adjusted by taking the center of the ellipse as the origin and rotating into the directions of the ellipse axes. If x²/a² + y²/b² − 1 > 0, the data point lay outside the ellipse; if x²/a² + y²/b² − 1 < 0, the data point was counted as being within the ellipse. All the data points falling within a particular ellipse were then categorized as hits if they came from the target tone, or as false detections if they came from other tones. The false detection probability (P_fd) of the target tone was defined as the probability of other tones being falsely recognized as the target tone. It was calculated as the number of false

detections divided by the total number of data points from the other three tones. Index 3 was thus defined as:

Index 3 = 1 − Ave(P_fd).    (3)

Results and Discussion

Tonal ellipses were generated for each tone for the individual children. The tonal ellipses of one typical NH child (NH15, 6.2 years old) and of the 14 children with CIs are shown in Figure 2. The tonal ellipses of the NH child demonstrated an easily separable pattern, with the four tonal ellipses located in the four quadrants of the child's tonal space. The ellipses of tone 1 and tone 2 were completely separable from those of tone 3 and tone 4. Although the ellipses of certain tone pairs overlapped, they were still spatially separable. The data points for tone 1 and tone 2 were relatively densely clustered and restricted to one location in the tonal space; as a result, these tonal ellipses were small. The small size of the ellipses suggests small variance in the data points, which in turn indicates that the child used the F0 range consistently for tone 1 and tone 2. This easily separable pattern of ellipses was compromised to various degrees in the ellipse plots of the children with CIs. For some children with CIs, the data points scattered away from the centers of the ellipses, leading to larger tonal ellipses. This suggests that they lacked a consistent use of F0 range to produce a given tone. For those children with CIs whose tonal ellipses were comparatively small, the centers of the ellipses nevertheless shifted toward each other from the locations observed in the NH children. Consequently, the tonal ellipses largely overlapped with each other.
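The point-in-ellipse test and false-detection tally behind Index 3 can be sketched as below. This is a hypothetical NumPy version with our own function names; it includes the rotation of each data point into the ellipse's own axes (needed because the tonal ellipses are inclined) before applying the x²/a² + y²/b² − 1 test.

```python
import numpy as np

def inside_ellipse(points, center, theta, a, b):
    """Boolean mask: which (onset, offset) points lie inside a tonal ellipse.

    Each point is shifted so the ellipse center is the origin, rotated by
    -theta into the ellipse's axes, then tested with x^2/a^2 + y^2/b^2 - 1 < 0.
    """
    p = np.atleast_2d(np.asarray(points, dtype=float)) - np.asarray(center, float)
    c, s = np.cos(-theta), np.sin(-theta)
    x = p[:, 0] * c - p[:, 1] * s   # coordinate along the semi-major axis
    y = p[:, 0] * s + p[:, 1] * c   # coordinate along the semi-minor axis
    return x**2 / a**2 + y**2 / b**2 - 1.0 < 0.0

def false_detection_prob(other_pts, center, theta, a, b):
    """P_fd: fraction of the other tones' points caught by the target ellipse."""
    return float(inside_ellipse(other_pts, center, theta, a, b).mean())

def index3(p_fd_per_tone):
    """Index 3 = 1 - average false-detection probability over the four tones."""
    return 1.0 - float(np.mean(p_fd_per_tone))
```

For an axis-aligned ellipse (theta = 0) with a = 2 and b = 1, a point such as (1, 0) falls inside and (3, 0) falls outside, which matches the sign convention of the ellipse function above.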

Figure 2. Tonal ellipses. The data from a representative NH child are shown in the upper left panel. The remaining panels are organized in the order of the tone production accuracy scores provided by human listeners for each CI child. The tone production accuracy score (percent correct) is shown in the upper right corner of each panel. Each data point represents a pair of F0 onset-offset values for a monosyllabic word. Different symbols represent different tones, as indicated by the legend in the top left panel.

The loss of contrast among tones in their F0 onset and offset characteristics could lead to overlapping ellipses, and thus a decreased degree of differentiation.

Two components of Index 1 quantify the size and separation of the tonal ellipses. Recall that Ae is the averaged area of the four tonal ellipses. Ae for the NH children did not differ from that of the children with CIs (Wilcoxon rank sum test, z = 1.54, p > 0.05), which suggests that the two groups had comparable variability in F0 use for individual tones. Interestingly, Ae for the NH children showed a nonlinear decrease with age (see Figure 3), a trend of development toward smaller variation in F0 use for individual tones. The development appeared to saturate at around 6 years of age. Data analysis revealed that it was mainly the ellipses of tone 3 and tone 4 that decreased in size with age. It is worth noting that Barry and Blamey (2004) found that the tonal ellipse size of their NH children was even larger than that of their CI group. This discrepancy might be due to the younger age of their NH group (all 6 years old). The majority of their group might still have been in the process of developing tone normalization skills (Barry & Blamey, 2004). No correlation was found between ellipse size and chronological age or duration of implant use for the CI group (p > 0.05). It is possible that the children with CIs need more device experience to develop tone production skills. Although the F0 use for individual tones by the NH children did not show a more confined pattern than that of the children with CIs, Aq (i.e., the tonal area that connects the centers of the four ellipses) for the NH children was on average three times larger than that of the CI group, and this difference was statistically significant (Wilcoxon rank sum test, z = 4.11, p < 0.001). The broader tonal area of the NH children compensated for their diffuse tonal ellipses, such that the tonal ellipses were still differentiated from each other. In contrast, the equally diffuse tonal ellipses coupled with the small tonal area of the children with

CIs resulted in their poorly differentiable tones. The shrunken tonal area of the children with CIs was also observed in Barry and Blamey's study of native Cantonese speakers.

Figure 3. Averaged tonal ellipse size (Ae) as a function of age of the normal-hearing children. An exponential fit is plotted with the solid line. Each symbol represents one normal-hearing child.

For all three indices, the greater the value, the more differentiated the tones. Values of the three indices obtained from the 14 children with CIs were compared with those of the 61 NH children (see Figure 4). The NH data were pooled for comparison with the CI group because, although there were age differences in pitch range use (i.e., Ae) among the NH children, their Aq and Index 1 scores did not correlate with age (p > 0.05). Moreover, the number of children with CIs was too small for further division into age or hearing-age subgroups. The results of the

nonparametric two-sample tests showed that the two groups differed on all three index comparisons (Wilcoxon rank sum test, Index 1: z = −4.78, p < 0.001; Index 2: z = −4.96, p < 0.001; Index 3: z = −4.98, p < 0.001). The tones produced by the CI group were significantly less differentiable than those produced by the NH group, as indicated by the index comparisons.

Figure 4. Box plots of the values of the three indices. Each box depicts the lower quartile, median, and upper quartile. The outliers are data points that fall more than 1.5 box-lengths (Q3 − Q1) away from the 25th or 75th percentile. The whiskers show the range of the rest of the data.

The mean value of the probabilities of false detection (i.e., P_fd) for each tone category was obtained for both groups of children. For the CI group, the probabilities of other tones being falsely recognized as tone 1 through tone 4 were 0.644, 0.711, 0.767, and 0.725, respectively. Although all tones were quite likely to be mistaken for other tones, tone 1 had the lowest probability of false detection. The corresponding averaged values of P_fd for tone 1 through tone 4 for the NH children were 0.228, 0.249, 0.406, and

…, respectively. The t tests confirmed that P_fd for the NH children was significantly lower than that for the children with CIs on all tone comparisons [t₁(73) = …, t₂(73) = −7.89, t₃(73) = −3.83, t₄(73) = −3.82, all p < 0.001]. Indices 1 and 2 from Barry and Blamey (2004) were modified to measure the degree of differentiation for Mandarin Chinese tones. One limitation of Index 1 for Cantonese was that the overall differentiability of the six tones was evaluated on the basis of measures for only three of them. Our Index 1 evaluated all four Mandarin Chinese tones to obtain their overall differentiation, and the evaluation may therefore be more accurate than that provided by Index 1 for Cantonese. Another limitation of Index 1 for Cantonese, as pointed out by Barry and Blamey, was that the centers of the three tonal ellipses that defined the triangle could fall on a line, resulting in a zero tonal area. With four tonal ellipses, Index 1 for Mandarin Chinese used a quadrangle to represent the tonal area, considerably reducing the probability of a zero area. From the point of view of Signal Detection Theory (Green & Swets, 1966), Indices 1 and 2 are essentially analogous to the discrimination index d′ (Barry & Blamey, 2004). The d′ describes how differentiable a signal is from noise by evaluating the separation between the signal-plus-noise distribution and the noise-alone distribution (Green & Swets, 1966); it is defined by dividing the difference between the means of the two distributions by their variance. For Index 1, the tonal area (i.e., Aq) represents the separation among the distributions, whereas the averaged area of the ellipses (i.e., Ae) reflects the pooled variance of the distributions. For Index 2, the average distance between any two ellipse centers (i.e., Ave. Dist.) likewise represents the separation

between the distributions, and the averaged lengths of the semi-minor and semi-major axes (i.e., Ave. Ax.) reflect the pooled variance of the distributions. If only two distributions were examined instead of four (i.e., four tonal ellipses), or only one variable instead of two (i.e., onset and offset) were studied, the two indices would be equivalent to the d′ described in classic Signal Detection Theory (Green & Swets, 1966). In essence, Indices 1 and 2 measured the overall separability of the four tones based on the two variables. Index 3 provided a direct description of the overlap of the tonal ellipses in terms of an averaged false detection probability across the four tones. The main strength of this measure is that such a description (P_fd) can be made for each individual tone, as opposed to the overall description of tone differentiation provided by Indices 1 and 2. Furthermore, the ellipse size used for determining the counts of false detections could be optimized according to a desired level of detection and false alarm probabilities. Such optimization can be performed separately for individual tones, allowing greater flexibility in adjusting the detection threshold. To evaluate the relationship among Index 3 and the modified Indices 1 and 2, the correlation coefficients between each pair of indices were computed. Figure 5 shows the scatter plots of pairs of indices obtained from the 14 children with CIs. Each pair was highly correlated, and the correlations were all significant after controlling for family-wise type I error using the Bonferroni correction [r₁₂(12) = 0.94, r₁₃(12) = 0.95, r₂₃(12) = 0.98, all p < 0.001]. The high correlation between each pair of the

indices indicates that they are consistent with one another in evaluating the tone differentiation of the CI children's productions.

Figure 5. Correlation between each pair of the three indices. Each data point represents one CI child.

Acoustic Analysis of F0 Contours

Methodology

Standard F0 contours from normal-hearing children. The F0 contours of the Cantonese tones are either flat, rising, or falling. The use of the onset-offset pair for analyzing Cantonese tones is justified because the two endpoints capture most, if not all, of the F0 contour information of Cantonese tones, such as direction and level. As shown in the previous section, the use of the onset-offset endpoints is also effective in differentiating Mandarin Chinese tones, because the tonal ellipses occupy restricted locations in the tonal space. However, this method does not characterize the F0 contours

for Mandarin Chinese tones that have complex contours, such as tone 3. Because these tones do not have simple rising or falling patterns, their direction and slope cannot be characterized by the two endpoints of the contours. This section explores a way to characterize a complex F0 contour by comparing it to a normative standard. The approach is based on cross-correlation, a measure of the similarity between two signals. The normative contour for each of the four tones was obtained from the 61 NH children by averaging all the contours for each tone produced by these children. The 9760 F0 contours extracted from the 61 NH children were uneven in duration and thus differed in number of samples. To obtain an averaged value at a given time, normalization of F0 contour duration and interpolation of F0 samples were performed. The mean duration of the contours for each tone category was calculated, and the longest among the four (i.e., 334 ms) was chosen as the target duration for normalization. All the tone tokens were then normalized to the target duration. After normalization in duration, that is, stretching or squeezing the contours to the same length, the F0 samples on the contours were no longer aligned on the time axis. In order to average them at any given point on the time axis, linear interpolation, or re-sampling, of the F0 contours was performed. The interpolation rate for a tone category was determined by the greatest number of samples in that tone category divided by the normalization duration. After normalization and interpolation, the contours, with their F0 samples matched time-wise, were averaged for each tone to generate four normative contours (see Figure 6, upper panel). The normative contours of the four tones were then used as the standard for comparison with the F0 contours produced by each CI child.
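The duration normalization and linear interpolation described above amount to resampling every contour onto a common normalized time axis and then averaging point by point. A minimal sketch (illustrative NumPy code under assumed function names, not the original analysis code):

```python
import numpy as np

def resample_contour(f0, n_samples):
    """Linearly resample one F0 contour to n_samples points.

    Stretching or squeezing to a common duration and then re-sampling is
    equivalent to interpolating on a normalized 0..1 time axis.
    """
    f0 = np.asarray(f0, dtype=float)
    t_old = np.linspace(0.0, 1.0, len(f0))
    t_new = np.linspace(0.0, 1.0, n_samples)
    return np.interp(t_new, t_old, f0)

def normative_contour(contours, n_samples):
    """Average many variable-length contours of one tone, point by point."""
    resampled = np.vstack([resample_contour(c, n_samples) for c in contours])
    return resampled.mean(axis=0)
```

In practice n_samples would be set by the interpolation rate described in the text (the greatest sample count in the tone category over the normalization duration).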

Figure 6. Contour correlation. Upper panel: normative contours of the four tones, plotted from left to right in solid lines. The contours were averaged from the 61 NH children. The dashed lines represent one standard deviation from the mean. Middle and lower panels: mean F0 contours of two children with CIs (CI2 and CI12) plotted over the normative contours for tones 1 through 4. The normative contours are shown with dashed lines and the mean F0 contours of the children with CIs with solid lines. The R values of the contour correlation are shown for each comparison. Both the durations of the tones and the F0 levels of all the contours were normalized.

Comparing F0 contours using cross-correlation. To compare the contours produced by the CI children with the normative ones, four averaged contours for each CI child were generated using the same methods of normalization and linear interpolation described above. The normalization duration and interpolation rate were matched to those used for generating the normative contours, to ensure the same length and the same number of samples for any pair of contours compared. Because the contours of children with CIs and the normative contours may differ in F0 level, they

were all normalized in level by forcing the mean F0 sample value of each contour to zero. The overall similarity in pattern of the F0 contours produced by children with CIs to the normative ones was evaluated by cross-correlation with a zero time shift. Cross-correlation is commonly used to find features in an unknown signal by comparing it to a known one. It entails shifting the unknown signal in time and multiplying it by the known one; it is thus a function of the relative time between the two signals (Couch, 1996). In our case, however, the similarity between two F0 contours was measurable only when they were aligned in time. Therefore, we were interested only in the value of the cross-correlation function at zero time shift, which was referred to as the contour correlation and was calculated as

R(0) = (1/N) Σ_{n=1}^{N} F0_T(n) · F0_S(n),   (4)

where F0_T(n) stands for the normalized F0 contour of a child, F0_S(n) stands for the normative F0 contour, and N denotes the total number of samples. Note that F0_T(n) and F0_S(n) are two zero-mean time sequences. If the two contours show opposite patterns, multiplication of the two [i.e., F0_T(n) · F0_S(n)] will result in a negative R value. If a contour deviates randomly from the normative contour, the result of the multiplication will have properties similar to a zero-mean random sequence, which in turn causes the R value to approximate zero. On the other hand, if F0_T(n) and F0_S(n) show similar patterns, multiplication of the two will result in a positive R value. Note that for the above-described multiplication to work, the two sequences, or contours, must be normalized to have zero mean. The F0 height

normalization procedure developed by Rose (1987, 1993) transforms F0 values to Z scores based on the overall mean F0 value of a particular speaker. Rose's method reduces between-subject variation and clusters speakers' F0 contours to better summarize the acoustic characteristics of a given tonal language. Because this normalization method does not produce zero-mean F0 contours, it did not suit our purpose here. Results and Discussion The overall similarity of each child's contours to the normative contours was examined. Two examples of the comparisons are plotted in Figure 6 (lower panels), where the contours from two children with CIs (CI2 and CI12) are plotted over the normative contours. The tone 4 contour of subject CI2 followed the normative contour closely, and the calculation of R yielded a large positive value. In contrast, the tone 3 contour of this child and the normative contour showed quite different patterns, so the R value was negative. The contours of all tones produced by subject CI12 were essentially flat; consequently, the R values for all four tones approximated zero. Contour correlation was very sensitive when comparing two F0 contours that had noticeable F0 change over time, which includes the contours of tones 2, 3, and 4. The R value can clearly indicate the similarity in the patterns of these tones; an example can be seen in the comparison of tone 4 for subject CI2 (see Figure 6). On the other hand, this approach did not show comparable sensitivity when comparing tone 1, because F0 values of tone 1 fluctuate only slightly around a constant level. The similarity in patterns of tone 1 contours cannot be adequately evaluated by the contour correlation, because multiplication of two near-zero vectors produces an R value approaching zero (see Figure 6).
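The zero-lag contour correlation of Equation 4 can be illustrated compactly. The dissertation's computation was done in MATLAB; this NumPy sketch (function name hypothetical) folds the zero-mean level normalization into the calculation, and shows why similarly shaped contours give a positive R while opposite shapes give a negative one.

```python
import numpy as np

def contour_correlation(f0_target, f0_standard):
    """Zero-lag cross-correlation between two equal-length F0 contours (Eq. 4).

    Both contours are level-normalized to zero mean before multiplication,
    so similar shapes give a positive R, opposite shapes a negative R, and
    random deviations an R near zero.
    """
    t = np.asarray(f0_target, dtype=float)
    s = np.asarray(f0_standard, dtype=float)
    assert t.shape == s.shape, "contours must have matched sample counts"
    t = t - t.mean()   # force zero mean (level normalization)
    s = s - s.mean()
    return np.mean(t * s)  # R(0) = (1/N) * sum of F0_T(n) * F0_S(n)
```

Because the F0 values are in Hz, R is not bounded to [-1, 1]; this is consistent with the R values in the hundreds reported below.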

Note, though, that if the contour of tone 1 from a CI child was abnormal and deviated from a flat pattern, R could still be a large negative value indicating dissimilarity. However, this type of abnormality was not observed in our data from the 14 children with CIs; their R values for tone 1 all approximated zero (5.2 ± 16.7, mean ± SD). Although the R values could not precisely estimate the similarities between the contours of tone 1, an averaged R value across tones could still successfully reflect the quality of tone production of a CI child. Another drawback of the measure was that, to a certain degree, the normative contours lose salience in F0 shape, especially for tone 3. This could be due simply to the averaging across so many speakers; it could also be due to the fact that outliers in the NH subject group were not excluded from the averaging. To examine whether the contours produced by the NH children had patterns more similar to the normative contours than those of children with CIs, the contour correlation was also calculated individually for the NH children. The averaged R values of children with CIs ranged from to 388.9, with a median of . The R values of the NH children ranged from 92.3 to , with a median of (see Figure 7). The two distributions were easily differentiable: the R values from the NH children were all positive, while those of children with CIs were spread across zero. A two-sample independent t test confirmed that the difference between the two groups was statistically significant (t(73) = 5.49, p < 0.001). The significant difference suggests that the contour correlation was effective in detecting the differences between the two groups in terms of the overall similarity of their F0 contours to a normative standard. The R values for the CI group, averaged for each tone, were 5.2, 83.7, -70.9, and 394.2 for tones 1 through 4, respectively.

While the contour correlation was not particularly indicative of the quality of production of tone 1, tone 4 was singled out as the best-produced tone by the CI children. Figure 7. Box plots of the contour correlation (R values). Each box depicts the lower quartile, median, and upper quartile. Outliers are data points that fall more than 1.5 box-lengths (Q3 − Q1) away from the 25th or 75th percentile. The whiskers show the range of the rest of the data. Neural Network Analysis of the F0 Contours Methodology Neural network and its structure. A feedforward back-propagation multilayer perceptron (MLP) was implemented in MATLAB with the Neural Network Toolbox and was used to recognize the tone productions of the children with CIs. The neural network had three layers: an input layer, a hidden layer, and an output layer. Inputs to the MLP were F0 contours. Based on our previous studies on optimization of neural-network configurations for Mandarin Chinese tone recognition (Xu et al., 2006, 2007; Zhou et al., 2008), the number of inputs was set at 12. The F0 contour was evenly segmented into 12

parts, and the averaged F0 value of each part made up the inputs to the neural network. The number of neurons in the hidden layer was set at 16. The output layer of the neural network consisted of four neurons representing the four Mandarin Chinese tones. Levenberg-Marquardt optimization was used as the training algorithm (Hagan & Menhaj, 1994). Our previous study (Xu et al., 2007) indicated that this neural network was very tolerant of between- and within-subject variation in the F0 of child speakers; therefore, no normalization procedure was necessary for adjusting the height of the input F0s. Evaluation of tone production of children with cochlear implants with the neural network. The neural network was trained with all of the tone tokens from the 61 NH children. Training was stopped when the sum of squared errors became less than 0.01 Hz². Given the large amount of training data compared to the testing data, the number of iterations for the training was set at 50 to avoid over-fitting. Over-fitting refers to the situation in which the neural network starts to learn patterns in the noise, due to too many training iterations or too many hidden neurons, which results in poor performance in real test conditions. The tone tokens of each CI child were tested with the neural network upon completion of training with the tone tokens from the NH children. Half of the tone tokens of each CI child were tested 10 times with different randomizations of the inputs. A tone confusion matrix was used to describe the error patterns of the CI group. Results and Discussion The mean recognition rate for each CI child was obtained. The recognition rates ranged from 13.5% to 69.6% correct (41.1% ± 15.8% correct, mean ± SD). The averaged

recognition rates for tones 1 through 4 by the children with CIs were 66.4%, 27.4%, 23.5%, and 46.8% correct, respectively. Note that the level tone 1 was more accurately recognized than any of the contour tones, and the falling tone 4 was better recognized than the rising tone 2. The confusion matrix revealed that 30 to 40 percent of the time, the contour tones (i.e., tones 2, 3, and 4) produced by the children with CIs were recognized as tone 1 (see Figure 8). The same neural network was used for recognizing the tones of the 61 NH children in our recent work (Xu et al., 2007). The corresponding recognition rates for the four tones by the NH children were 91.3%, 88.5%, 71.7%, and 83.9% correct, respectively. A group of t tests indicated that the recognition rates for the NH children were significantly higher than those of the children with CIs for all tones [t1(73) = 5.38, t2(73) = 13.87, t3(73) = 7.95, t4(73) = 6.28, all p < 0.001]. Figure 8. Tone recognition error patterns. The value in the cell in row j and column k is the percentage of times stimulus tone j was recognized as tone k (j = 1, 2, 3, or 4; k = 1, 2, 3, or 4). The gray scale in each cell reflects its value with reference to the color bar on the right.
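The MLP's input representation described in the Methodology above (12 averaged F0 values from 12 equal segments of the contour) can be sketched as follows. This NumPy helper is illustrative only; the dissertation's feature extraction was implemented in MATLAB.

```python
import numpy as np

def mlp_input_features(f0_contour, n_segments=12):
    """Segment an F0 contour into n_segments equal parts and average each.

    The 12 segment means form the input vector of the multilayer
    perceptron described in the text (12 inputs -> 16 hidden -> 4 outputs).
    """
    f0 = np.asarray(f0_contour, dtype=float)
    # np.array_split tolerates contour lengths not divisible by n_segments
    return np.array([seg.mean() for seg in np.array_split(f0, n_segments)])
```

Because each segment mean discards within-segment detail, this representation also makes the network tolerant of small sample-count differences between tokens, consistent with the text's note that no F0-height normalization was needed.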

Perceptual Study Methodology Eight NH adult native Mandarin Chinese speakers were recruited from Ohio University for the tone perception tests. The group included 7 female and 1 male listeners, with ages ranging from 26 to 43 years. A hearing test was performed for each adult listener to confirm that his or her pure tone thresholds were 20 dB HL or better at octave frequencies from 250 to 8000 Hz. A custom graphical user interface was developed in MATLAB to present the tone tokens and to collect listeners' responses. The tone perception tests were conducted in an IAC double-walled sound booth. All the tone tokens (i.e., = ) from both the NH and CI groups were randomized, and half of them (i.e., 6160 tokens) were presented at a comfortable level to the listeners via circumaural headphones (Sennheiser HD 265). The listeners were instructed to use a computer mouse to click on a button labeled 1, 2, 3, or 4 after each presentation to indicate the tone of the speech token that they had heard. Clicking on any of the buttons triggered the presentation of the next tone token. It took approximately 5 to 6 hours for each listener to complete the tone perception test. Results and Discussion The tone perception scores were sorted for each CI and NH child, and the mean scores across all 8 adult listeners were used to represent tone production accuracy. The percent correct scores for the children with CIs ranged from 17.4% to

78.3% correct (48.5% ± 19.2% correct, mean ± SD). Scores for the NH children ranged from 57.2% to 96.7% correct (79.9% ± 7.9% correct, mean ± SD), which was significantly higher than that of the CI group (t(73) = 9.8, p < 0.001). Percent correct scores varied across individual tones: 71.7%, 21.6%, 46.5%, and 55.2% correct for tones 1 through 4, respectively, for the children with CIs. For comparison, the percent correct scores for individual tones produced by the NH children were 94.6%, 86.6%, 45.1%, and 93.2% correct for tones 1 through 4, respectively. Both groups showed better scores on the level tone 1 and the falling tone 4. The rising tone 2 was perceived to be particularly poorly produced by the CI group. Consistent with the confusion matrix from the neural network, the confusion matrix from the perception tests also showed that the contour tones intended by the children with CIs were perceived as tone 1 about 30 to 40 percent of the time (see Figure 8). It is worth noting that the adult listeners' preference for tone 1 reflects the monotonic character of these children's productions, suggesting that they either attempt to manipulate pitch in vain or that the limited tone information they receive through the implants hinders them from developing satisfactory tone production. Except for the perception score on tone 3, which was 20 percentage points higher, the error patterns from the perception tests were in accordance with those from the neural network analysis. Discussion and Conclusion Several approaches were proposed and used for evaluating Mandarin Chinese tone production by both NH and CI Mandarin-speaking children. The measures included three indices based on acoustic analysis of the F0 onset-offset pair, contour correlation based on examination of the patterns of F0 contours, and perception-like

recognition results based on the neural network analysis. These methods measure different aspects of tone production. The three indices evaluate the differentiation of tones: Indices 1 and 2 measure the overall differentiation of tones, while Index 3 breaks down into false detection probabilities for each tone category, thereby measuring the differentiation of one particular tone from the others. The two components of Index 1 provide further information about a speaker's pitch use and pitch range: the tonal ellipse size measures the variation of F0 use for individual tones, while the tonal area measures the F0 span used across all tones. The coefficients (i.e., R values) of the contour correlation evaluate the degree of similarity between two F0 contours. While the above acoustic measures describe certain acoustic properties of tone production, the neural network provides direct classification results, from which recognition percent correct scores as well as a tone confusion matrix can be generated. Together, these measures allow analysis of different components of tone production. Pearson's correlation was computed for every pair of approaches, including the results of the perceptual tests. All pairs of approaches were significantly correlated after controlling the family-wise type I error using the Holm correction (i.e., Bonferroni step-down correction, family-wise alpha = 0.05, df = 12). In descending order of strength of correlation, neural-network recognition and the perception test correlated most highly, followed by neural-network recognition with contour correlation, contour correlation with the perception test, the neural network with Indices 1 and 2, and the neural network and perception test with Index 3 (see Figure 9).
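The Holm (Bonferroni step-down) correction used above controls the family-wise error rate by sorting the p-values, comparing the smallest against alpha/m, the next against alpha/(m − 1), and so on, stopping at the first failure. A minimal plain-Python sketch of the procedure (illustrative; the dissertation does not give its implementation):

```python
def holm_correction(p_values, alpha=0.05):
    """Holm (Bonferroni step-down) correction for a family of m tests.

    The i-th smallest p-value (0-based rank) is compared with
    alpha / (m - i). Testing stops at the first non-significant result;
    all larger p-values are then also declared non-significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    significant = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # step-down: stop at the first failure
    return significant
```

Compared with a plain Bonferroni correction (every p compared with alpha/m), the step-down procedure is uniformly more powerful while still controlling the family-wise error rate.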

The differences in the level of correlation relate to the nature of the analysis each measure entails. The three indices using the F0 onset-offset scatter plot mainly concern the level and slope of the F0 contour, whereas the contour correlation examines the pattern of the F0 contour. Because the contour correlation uses all the available samples from an F0 contour, it showed greater consistency with human perception and with the neural network than did the indices. Regarding the strength of correlation with the perception test, the neural network correlated better with the performance of human listeners than did the contour correlation. Given the nature of its input, the neural network takes into account both dimensions (i.e., endpoints and pattern) of the F0 contours in the classification.

Figure 9. Correlation among measures. Top panels: Pearson's linear correlation between the perceptual scores and the three indices. Middle panels: Pearson's linear correlation between the neural-network scores and the three indices. Bottom panels: Pearson's linear correlation between the perceptual scores and the neural-network scores, between the contour correlation and the perceptual scores, and between the neural-network scores and the contour correlation. Each symbol represents one CI child. The solid line represents the linear fit of the data in each panel. Since perception is also likely to involve the use of both level and contour cues, it is not surprising that the performance of the neural network has a higher correlation with the perceptual performance of human listeners than do the other

approaches. Even though we consider auditory judgments by adult listeners to be the fairest test of tone production, we do not intend to imply that the usefulness of the other measures should be scaled by the strength of their correlation with the perception test. These measures emphasize different aspects of tone production, whereas perception involves a weighted total of these aspects. We should also use caution when interpreting the correlations, because the results of the perception tests may be influenced by other cues such as duration, a variable that was not examined in the other measures. These measures were quite successful and consistent with each other in scoring individual children with CIs. As for the averaged scores for different tone categories, again, the results of the neural network and the perception tests were the closest. Tones 1 and 4 were shown to be the relatively better-produced tones. The confusion matrices of the two approaches also closely matched, except for tone 3: the human ear identified tone 3 more accurately than the neural network did, probably because tone 3 has a fairly reliable duration cue that was available to human ears but was normalized away in the neural network input. The contour correlation also indicated that tone 4 was the best-produced tone, although it failed to evaluate tone 1. The production of tone 1, however, was better estimated with the false detection probabilities; in fact, tone 1 was less likely to be falsely recognized than any of the other tones. Each measure has its strengths and limitations. In general, the neural network demonstrated performance comparable to that of human listeners, with a high correlation of percent correct scores for individuals and very similar tone confusions. This occurs because the neural network works on a complex mathematical model with high error

tolerance that emulates a biological perception system (Arbib, 1995; Bishop, 1995). A neural network is therefore well suited to obtaining perception-like classification results. Nonetheless, the underlying process of function approximation that the neural network relies on is hardly transparent. By contrast, the acoustic analysis, built on examination of the level and pattern of the F0 contours, provides a more straightforward account of tone production performance. More importantly, the acoustic analysis offers more detailed and useful information about the acoustic properties of tone production than do either the perceptual judgments or the neural network. The acoustic analysis is particularly useful for studying children's development of tone production. The tonal ellipses visualize the tonal space and the pitch used for individual tones by a particular speaker, and the indices quantify these parameters and evaluate the degree of differentiation among tones. The tonal ellipse analysis has particular merit: it greatly simplifies the analysis by using just the onset and offset points of an F0 contour. Nonetheless, this simplification, as pointed out, prevents the method from giving a comprehensive representation of the contour tones. The contour correlation is a useful complementary analysis to the tonal ellipses: it quantifies the similarity of a tone contour to a standard form, so the contour tones of Mandarin Chinese can be more adequately evaluated with this method. In conclusion, the proposed methods can be used to evaluate effectively the tone production of children with CIs. They can also be used to evaluate tone production in people with hearing impairment, or to study tone acquisition in developmental research. All of the methods, however, focus on different aspects of the F0s. It will be interesting in

future studies to address aspects other than F0 (e.g., duration) in the assessment of tone production.

CHAPTER 3: LEXICAL TONE DEVELOPMENT IN CHILDREN WITH COCHLEAR IMPLANTS Introduction Cochlear implants have successfully provided speech perception for profoundly deaf patients. Recent research has addressed several remaining challenges for modern cochlear implant technology. In particular, current cochlear implants fall short in the explicit coding of F0 (see Moore, 2003, for a review). The speech processing strategies commonly used in modern CIs involve the analysis of speech signals in a number of frequency channels. The temporal envelopes of these frequency channels are extracted to modulate electrical pulse trains, which are then delivered to the corresponding electrodes. Given the low frequency selectivity provided by 12- to 22-electrode arrays, F0 cannot be resolved; place pitch in cochlear implants is thus limited. Temporal pitch is further constrained by the ability of implant users to extract temporal modulations (e.g., Zeng, 2002). Because of the limited transmission of F0 in current devices, cochlear implant users continue to have difficulties in music perception (e.g., Fujita & Ito, 1999; Galvin et al., 2007; Gfeller et al., 2002, 2006, 2007; Leal et al., 2003; Looi et al., 2008) and in vocal singing (e.g., Xu et al., 2009). In addition, users who speak tone languages are challenged when attempting to capture the pitch variation in lexical tones. Tone perception in children with CIs who speak tone languages has only been reported for small samples. Children's performance is highly variable, with a range of deficits in tone perception. Data from Mandarin-speaking as well as Cantonese-speaking children with CIs generally indicate that prelingually deafened children with

CIs have a deficit in perceiving the tones of tone languages. Wei et al. (2000), in their rehabilitation study, measured tone perception accuracy in 28 Cantonese-speaking children who were 2-12 years old. Tone perception improved significantly after implantation, but performance plateaued at 65% after 2 years of training. Ciocca et al. (2002) tested tone perception in 17 native Cantonese-speaking children aged 4-9 years. The subjects identified tones in a monosyllabic target /ji/ in a two-alternative forced-choice test. Performance ranged from chance (i.e., 50%) to 61% across the eight tone contrasts. The group mean for three contrasts (i.e., high level vs. mid level, high level vs. low level, and high level vs. high rising) out of the eight was found to be significantly above chance, but individually, only a few subjects scored significantly higher than chance on those contrasts. Lee et al. (2002) tested three target Cantonese tones, the high rising, high level, and low falling tones, in 15 child CI subjects. The three tones were presented in three contrasts, each using an identical segmental structure. The subjects were able to correctly identify the high level vs. high rising, high level vs. low falling, and high rising vs. low falling contrasts 53%, 69%, and 68% of the time, respectively. It appears that tones that contrast in contour were more poorly identified than those that contrast in level or in both contour and level. Wong and Wong (2004) measured tone identification and discrimination in 17 children with CIs between 4 and 9 years of age. The tone discrimination task required the subjects to judge whether two tones were the same or different, while the tone identification task required the children to identify the target tone among four choices. The group mean performance on the discrimination test was 59%. Thirteen of the 17 subjects were able to discriminate the

Cantonese tones significantly above chance. The mean score on the identification test was 31.2%, which was slightly above chance (25%). Individually, 10 of the 17 subjects achieved identification scores significantly higher than chance. Peng et al. (2004) reported a relatively higher average score of approximately 73% for a group of 30 implanted children years old who spoke Mandarin Chinese (chance = 50%). Among the 6 Mandarin Chinese tone contrasts, Peng et al. reported that the implanted children identified the contrasts involving tone 4 better than the other contrasts. Accompanying the difficulties in perceiving tones, implant users also experience difficulty in tone production, probably due to the lack of accurate auditory input of the tonal targets. Tone production in tone-language-speaking children with CIs has been investigated using different methods (Barry & Blamey, 2004; Han et al., 2007; Peng et al., 2004; Xu et al., 2004; Zhou & Xu, 2008a). Children with CIs performed at highly variable levels in both perception and production tasks. Previous studies have reported mixed findings on demographic predictors of tone development in children with CIs. Lee et al. (2002) reported that tone perception performance was related to the duration of CI use and age at implantation. Han et al. (2009) found a consistent relationship between tone perception performance and age at implantation. Nonetheless, Peng et al. (2004) and Wong and Wong (2004) did not find tone perception performance to correlate with any of the candidate predictors. As for tone production predictors, Han et al. (2007) showed that tone production was correlated with age at implantation, as also reported by Peng et al. (2004). Han et al. (2007) also found that duration of CI use was correlated with tone production, a finding

not supported by the data reported by Peng et al. (2004). The limited sample sizes of these studies probably explain the discrepancies. Some studies indicate that tone perception with CIs could be related to the speech processing strategy. Fu, Hsu, and Horng (2004) speculated that the high stimulation rates provided by ACE and CIS could potentially increase the temporal sampling resolution of the signal, whereas the SPEAK strategy (typically using a 250-pps rate) could not adequately sample the voice periodicity. The advantage of high-rate stimulation strategies for tone perception, however, was not found in Cantonese-speaking children with implants (Barry et al., 2002). Based on the psychophysical evidence that simultaneous stimulation of two adjacent electrodes can produce pitch percepts distinct from those elicited by stimulation of the two electrodes individually (see Bonham & Litvak, 2008, for a review), Advanced Bionics introduced HiResolution with Fidelity 120 (HiRes 120). Built on its predecessor, the 16-channel HiRes strategy, HiRes 120 uses current steering to create multiple virtual spectral channels (120 virtual channels) through simultaneous stimulation of adjacent electrodes. Han et al. (2009) studied whether HiRes 120, which presumably provides much finer spectral resolution than the traditional strategies, would benefit lexical tone perception. Eight of the 20 subjects significantly improved their tone perception performance after wearing the new strategy for 3 months, and some for 6 months. The purpose of this study was to investigate both tone perception and production in a large sample of Mandarin-speaking children with CIs. We hypothesized that Mandarin-speaking children with CIs would show tone perception and production deficits

to different degrees and that tone perception and production would be correlated with each other. More importantly, the large sample base of the study allowed us to perform a multivariable regression analysis to identify significant predictors of tone development among demographic, device, and educational variables. Methods Subjects Prelingually deafened children with CIs were recruited from three CI rehabilitation centers in Beijing, China. In total, 110 children with CIs participated in our study. Among the 110 CI subjects, 107 children completed the tone perception test and 76 completed the tone production test. We were able to collect demographic information for 68 children with CIs [age: 5.23 ± 3.13 yrs; age at implantation: 3.96 ± 2.70 yrs; duration of CI use: 1.27 ± 1.03 yrs]. As controls, 125 typically developing NH children were also recruited from Beijing, China [age: 6.74 ± 1.79 yrs]. Detailed information for both groups is listed in Table 2. We were able to obtain tone production samples from all the NH children, while only 112 of them completed the tone perception test. The protocol for the use of human participants was reviewed and approved by the Institutional Review Board of Ohio University. Tone Perception Test There are four lexical tones in Mandarin Chinese, resulting in 6 possible tone contrasts (i.e., 1 vs. 2, 1 vs. 3, 1 vs. 4, 2 vs. 3, 2 vs. 4, and 3 vs. 4). The tone perception test involved identifying tones from the six tone contrasts and followed the procedure used in Han et al. (2009). Three pairs of words were provided for each tone contrast, of

which the children were allowed to pick the one pair most familiar to them. The tone perception test used a two-alternative forced-choice paradigm.

Table 2
Chronological Ages for the Cochlear Implant and Normal Hearing Groups

Group   2-3 yrs   3-4 yrs   4-5 yrs   5-6 yrs   6-7 yrs   7-8 yrs   >8 yrs    Total
CI      17.65%    36.76%    8.82%     7.35%     7.35%     10.29%    11.76%    100%
NH                %         16.00%    15.20%    15.20%    16.80%    28.80%    100%

In each presentation, a tone token was played via a loudspeaker while two pictures of the tone contrast being tested were shown on the computer screen. Children were instructed to point to the picture that corresponded to the meaning of the word that they heard. The test consisted of 48 presentations [12 words (i.e., 6 tone contrasts) × 2 speakers × 2 repetitions] without feedback. Training with feedback was provided before the test. Tone Production Test Speech recording. The tone production test used a picture-naming procedure. The same 36 words chosen for the tone perception test were the targets of tone production. To elicit the tone production of each subject, a picture was presented on the computer screen and the child was asked to describe its meaning. The researcher was not

allowed to say the word for the participants to mimic; all productions were spontaneously elicited speech. The elicited speech was recorded at a sampling rate of Hz with 16-bit resolution in quiet rooms. Acoustic and perceptual analysis. The acoustic analysis of the tone production followed the tonal ellipse analysis discussed in Chapter 2, and the acoustic characteristics of the produced tones were quantified by the three indices. A feedforward back-propagation multilayer perceptron (MLP) was used to recognize the tones produced by the subjects. The neural network was structured to have three layers (i.e., input, hidden, and output layers). The input to the neural network was the 12 averaged F0 values of the 12 parts evenly segmented from the F0 contour, and the number of hidden neurons was set at 16. The neural network was trained with all the tone tokens from the NH children. The tone tokens produced by each child with a CI were then tested by the neural network; testing was repeated 10 times for each subject, and an averaged percent correct score was derived for each subject. The neural network analysis was repeated for the NH subjects as a control. Tone productions of the NH children and the children with CIs were also judged by native speakers of Mandarin Chinese. Four NH native speakers of Mandarin Chinese participated in the evaluation. They were screened for normal hearing with thresholds < 20 dB HL at octave frequencies from Hz. All tone tokens to be presented to the listeners were low-pass filtered at 600 Hz with a 4th-order Butterworth filter (24 dB/octave). The segmental information was removed by the low-pass filtering (see also Wong, Schwartz, & Jenkins, 2005), thereby reducing any bias on tone identification possibly caused by the syllable itself. The processed tone tokens from the deaf and NH speakers

were mixed and presented in blocks, one speaker at a time, with the order of presentations fully randomized. A four-alternative forced-choice paradigm was used, in which the listeners were instructed to determine which of the four tones they heard. The tone stimuli were presented to the listeners bilaterally via circumaural headphones in a sound booth. A percent correct score was obtained for each child. Interjudge reliability was measured by correlating the perceptual scores between any two of the four listeners.

Results

Tone perception scores of the NH children (N = 112) and children with CIs (N = 107) were rank ordered and are shown in Figure 10. Performance of the children with CIs ranged from around chance to perfect (mean ± SD: ± % correct). Performance of the NH children was less variable (mean ± SD: ± 2.67% correct) and was significantly better than that of the children with CIs [t (217) = , p < 0.001]. For the CI group, percent correct scores for the six tone contrasts, in the order of tones 1-2, 1-3, 1-4, 2-3, 2-4, and 3-4, were 69.90%, 68.20%, 69.54%, 63.15%, 63.59%, and 67.72%, respectively. No statistical differences were found among the six contrasts [F (5,530) = 1.89, p = 0.09]. Therefore, a confusion matrix (see Figure 11, left) was derived to summarize the overall error patterns from the pooled group data.
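The segmental-cue removal used in preparing the production stimuli for the listening judgments (a 600-Hz, 4th-order Butterworth lowpass, as described in the Methods above) can be sketched as follows. This is a Python illustration (the original processing was not necessarily implemented this way); the function name, sampling rate, and test frequencies are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def remove_segmental_cues(tone_token, fs, cutoff_hz=600.0, order=4):
    """Lowpass a tone token so that F0 movement survives while most
    segmental (consonant/vowel identity) information is removed."""
    sos = butter(order, cutoff_hz, btype='low', fs=fs, output='sos')
    return sosfilt(sos, tone_token)

# Example: a 200-Hz component (near typical child F0) passes,
# while a 3-kHz component (segmental/formant region) is strongly attenuated.
fs = 16000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 200 * t)
high = np.sin(2 * np.pi * 3000 * t)
filt_low = remove_segmental_cues(low, fs)
filt_high = remove_segmental_cues(high, fs)
```

A 4th-order Butterworth rolls off at 24 dB/octave, matching the filter slope stated in the Methods.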

Figure 10. Rank-ordered tone perception scores. Rank-ordered tone perception scores (% correct) of the 112 normal-hearing children and 107 children with cochlear implants.

Figure 11. Confusion matrices. Error patterns of children with cochlear implants in tone perception and tone production. The error patterns for tone production were based on the recognition results from the native Mandarin-speaking adult listeners.

Examples of tonal ellipses of four NH children and four children with CIs are shown in Figure 12. Four children from each group were pseudo-randomly selected from the four quartiles (i.e., 0-25th, 25-50th, 50-75th, and 75-100th percentiles) of the tone production accuracy scores judged by adult native Mandarin-speaking listeners. The degree of differentiation between the produced tones is reflected by the size and overlap of the tonal ellipses: the sizes of the tonal ellipses reflect the variability in the use of F0 when producing the tone contours, whereas the degree of overlap between the ellipses indicates how separable the F0 distributions of the four tones are in the tonal space. Tone production accuracy of all four NH children was above 90% correct as judged by the native listeners. The NH children differed from the CI group in that their ellipses were well separated in the tonal space regardless of their sizes. Even though subject CI 1 had very small ellipses, the ellipses were highly overlapped, giving rise to a low degree of differentiation; in fact, the small size and large overlap of the ellipses indicate that the tones produced by this subject had almost no variation in F0. Subjects CI 9 and CI 10 achieved relatively better scores because their diffuse ellipses were offset by better separation.

Figure 12. Tonal ellipses of four normal-hearing children and four children with cochlear implants. The four children from each group were pseudo-randomly selected from those with tone production scores (judged by the native adult listeners) in the 0-25th, 25-50th, 50-75th, and 75-100th percentile ranges of each respective group.

Tonal ellipses were generated for all NH children (N = 125) and children with CIs (N = 76). Significant differences were found between the CI group and the NH group on all three indices (all p < 0.001) (see Figure 13). Index 1 was based on the average ellipse size, Index 2 on the average between-ellipse distance, and Index 3 was a composite of the two (see Chapter 2 for the formal definitions). For both groups, the values of the three indices were highly correlated with each other (all p < 0.05 after Bonferroni correction). Moreover, the average between-ellipse distance of the NH group was eight times greater than that of the CI group. As revealed by independent t tests, the degree of separation of the tonal ellipses of the NH children was significantly greater than that of the CI group [t (199) = 11.79, p < 0.001].
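The ellipse-based quantities underlying these indices can be sketched numerically. The sketch below is illustrative only: it assumes each tone token is reduced to a 2D point (for example, mean F0 of the first versus second half of the contour), which is not necessarily the exact Chapter 2 definition, and the function names are hypothetical.

```python
import numpy as np

def ellipse_area(points, n_sd=1.0):
    """Area of the n-SD covariance ellipse for a cloud of 2D points,
    each point summarizing one tone token. Larger area = more variable
    F0 use for that tone category."""
    cov = np.cov(np.asarray(points, dtype=float).T)  # 2x2 covariance
    eigvals = np.linalg.eigvalsh(cov)                # squared semi-axes
    return np.pi * n_sd ** 2 * np.sqrt(eigvals[0] * eigvals[1])

def mean_between_distance(tone_points):
    """Average Euclidean distance between the centroids of the tone
    categories (larger = better-separated tonal space)."""
    cents = [np.mean(np.asarray(p, dtype=float), axis=0) for p in tone_points]
    dists = [np.linalg.norm(cents[i] - cents[j])
             for i in range(len(cents)) for j in range(i + 1, len(cents))]
    return float(np.mean(dists))
```

With four well-separated centroids the between-ellipse distance is large even if individual ellipses are diffuse, which is the pattern described for the better CI producers.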

The average ellipse size of the NH group was significantly smaller than that of the CI group [t (199) = -3.26, p = 0.001].

Figure 13. Box plots of the indices of the acoustic analysis. Each box depicts the lower quartile, median, and upper quartile. The whiskers show the range of the rest of the data. Outliers are plotted with filled circles.

Tone production of both groups was recognized by four native Mandarin-speaking listeners. Percent correct scores are shown in Figure 14 (right column). Interrater reliability was measured by correlating the perceptual results produced by any two of the listeners. Highly significant correlations (all r > 0.95, p < 0.001) indicated that the perceptual judgments of the four listeners agreed closely with each other. The recognition scores from the four listeners were therefore averaged to represent the tone production accuracy for each child speaker. Based on the averaged recognition scores by the native listeners, tone production accuracy of the

children with CIs ranged from 18.75% to 95.14% (mean ± SD: ± % correct). Tone production accuracy of the NH children was in the range of % to 100% (mean ± SD: ± 9.02% correct). The group difference was found to be highly significant [t (199) = , p < 0.001]. Tone production was also recognized by an artificial neural network (see Figure 14, left column). The recognition scores from the neural network were found to be highly correlated with those from the native listeners (see Figure 15).

Figure 14. Box plots of the tone production. Box plots of the tone production scores evaluated by the neural network and by the adult listeners. Each box depicts the lower quartile, median, and upper quartile. The whiskers show the range of the rest of the data. Outliers are plotted with filled circles.

Figure 15. Correlation between the recognition scores. Correlation between the recognition scores (% correct) from the neural network and from the adult listeners.

Relationship between tone perception and production. The error patterns in the tone perception and production of children with CIs are summarized in Figure 11. Tone 1 was both perceived and produced most accurately among the four tones by the children with CIs. In contrast to the error patterns in perception, the rising tone (i.e., tone 2) and the contour tone (i.e., tone 3) were the most difficult for the children with CIs to produce. Even though the error patterns in perception and production were rather different, the overall accuracy rates of tone perception and production in the children with CIs were significantly correlated (r = 0.56, p < 0.001) (see Figure 16).

Figure 16. Correlation between tone perception and production.

Predictor variables for tone perception and production performance. A number of predictor variables were studied as potential contributors to tone perception and production performance in children with CIs. The variables included demographic variables (e.g., age at implantation, duration of CI use), family variables (e.g., family size and household income), CI variables (e.g., implant type and speech processing strategy), and educational variables (e.g., communication mode and duration of speech therapy). The outcome variables were the percent correct scores achieved on the tone perception and tone production tests, the latter given by the native listeners' recognition scores. The predictor variables were entered stepwise into a linear regression model for analysis. Duration of CI use and age at implantation were the two significant predictor variables for tone perception. The marginal correlations between the two predictor variables and the tone perception outcome are shown in Figure 17. Although age at implantation was only marginally correlated with the perception outcome, when entered with

duration of CI use, it explained a significant amount of unique variance in tone perception (p = 0.003). Jointly, the two variables explained 53.5% of the total variance in the tone perception outcome. Age at implantation was found to be the only significant predictor of tone production performance (see Figure 18).

Figure 17. Contributing factors for tone perception: correlations between duration of cochlear implant use, age at implantation, and tone perception performance in children with cochlear implants.

Figure 18. Contributing factors for tone production: correlations between age at implantation and tone production in children with cochlear implants.
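The unique-variance logic of the stepwise analysis above can be illustrated with ordinary least squares: the unique contribution of age at implantation is the increment in R² when it is added to a model that already contains duration of CI use. The sketch below uses simulated data (the coefficients, ranges, and noise level are invented for illustration and are not the study's values).

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary-least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(0)
n = 67                                   # sample size of the regression analysis
duration = rng.uniform(0.5, 5.0, n)      # years of CI use (simulated)
age_at_ci = rng.uniform(1.0, 8.0, n)     # age at implantation (simulated)
# Simulated outcome: better with longer use, worse with later implantation.
tone_score = 50 + 8 * duration - 3 * age_at_ci + rng.normal(0, 5, n)

r2_duration = r_squared(duration[:, None], tone_score)
r2_both = r_squared(np.column_stack([duration, age_at_ci]), tone_score)
unique_age = r2_both - r2_duration       # unique variance of age at implantation
```

Because the models are nested, `r2_both` can never be smaller than `r2_duration`; a meaningfully positive increment is what "explained a significant amount of unique variance" refers to.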

Discussion and Conclusion

Tone perception performance of the children with CIs showed large variability, ranging from chance level to 100% accuracy. Performance of the NH group was much less variable and significantly better. The NH group, however, was older (see Table 2) than the implanted children, who were 5.23 years old on average and had used their CIs for only 1.27 years. Presumably the NH children had already acquired the tones, whereas the CI group could still have been learning them. Among the six tone contrasts, whenever tone 1 was contrasted with the others, the children with CIs performed relatively well. This was also revealed by the pattern of the perception confusion matrix (see Figure 11), in which recognition of tone 1 was superior. In contrast, Peng et al. (2004) reported that tone 4 was the most easily identified tone in their tone contrast test. This difference could be due to the fact that different syllables were used, but more importantly, in Peng et al. the stimuli were presented via live voice without controlling for sound level or duration. Tone 4 has a distinctive duration cue (i.e., it is the shortest), which might explain why tone 4 was the best recognized tone when presented via live voice as described in Peng et al. (2004), but not in the present study, in which the duration cues of the stimuli were controlled.

Tone production of the children was analyzed acoustically, automatically recognized by a neural network, and evaluated by native Mandarin-speaking listeners. The acoustic analysis used tonal ellipses to quantify the degree of differentiation between the produced tones. Essentially, the indices of the acoustic analysis took two factors into consideration: the variability of F0 use, which is associated with the

tonal ellipse size, and the separation of the F0 distributions, which is reflected by the overlap between the tonal ellipses. The tonal ellipse sizes were significantly smaller for the NH group than for the CI group. In contrast to previous studies that reported even larger ellipse sizes for the normal group than for the CI group (Barry & Blamey, 2004), the NH group in this study was able to use F0 more consistently for each tone than the CI group. As noted before, the NH participants in the present study were older, and a majority of them had acquired the tones; evidence from Zhou and Xu (2008a), who examined tone development in NH children, showed that tonal ellipses decrease in size with age. The relatively large ellipse sizes of the CI group indicated that the F0s produced by the children with CIs were variable overall. This means that the children with CIs were, in fact, able to produce pitch changes. However, the pitch variations that these children produced overlapped between the four targets, as supported by the fact that the separation of the ellipses of the CI group was eight times poorer than that of the NH group. Although some individuals with CIs may have problems producing F0 variations in their speech, as a group they had more difficulty consistently using correct and differentiable F0 contours for the tone targets. Not surprisingly, the two groups of children were found to be significantly different on all three indices. The absolute values of the indices for the CI group were consistent with those reported in Zhou and Xu (2008a).

While the acoustic properties of tone production were measured objectively with tonal ellipses, the overall accuracy of tone production was evaluated by a neural network and by native listeners. The objective measure by the neural network and the subjective perceptual measure were found to be highly correlated (see Figure 15).

This further confirms that the neural network can be used as an automatic recognition tool for tone production assessment in children with CIs in future clinical practice, as suggested by our previous studies (Xu et al., 2006; Zhou et al., 2008). Furthermore, the children with CIs seemed to have particular difficulty producing the rising tone, tone 2. Peng et al. (2004) and Han et al. (2007) also reported that the production of tone 2, which requires effortful manipulation of the vocal cords, was the most difficult for implanted children. This finding is consistent with a more recent study that examined the ability of implanted children to produce rising intonation in English (Peng, Tomblin, Spencer, & Hurtig, 2007). In the present study, the overall error patterns in tone production suggested that the implanted children tended to neutralize the tone patterns. The confusion matrix (see Figure 11, right panel) showed that thirty to forty percent of the time, the intended contour tones (i.e., tones other than tone 1) were perceived by the listeners as tone 1. This error pattern was similar to that reported in our previous study of tone production in children with CIs (Han et al., 2007).

The errors that the implanted children made in tone perception and production did not seem to be the same. The implanted children seemed to have particular difficulty producing tone 2: the accuracy of producing tone 2 was only 26.5%. This difficulty is probably not attributable to the ability to perceive this tone (an accuracy of 69.1%), but rather to the effort needed to produce the rising pitch. Similarly, perception ability cannot fully explain the difficulty experienced by children with CIs when producing tone 3, which has the most sophisticated pitch contour. Before further discussion, it should be noted that the perception confusion was

generated from a tone contrast test, which would overestimate the overall accuracy relative to the production confusion matrix. Nonetheless, the lack of accurate pitch representation in the degraded signals, which led to overall poor perception accuracy, may be one of the reasons why the tones produced by these children were considerably neutralized in pitch; their neutralized tones were quite often perceived as flat tones by the native adult listeners. It remains to be tested whether a lack of auditory feedback (i.e., poor perception) was the primary cause of poor production performance, or alternatively, whether the production of some of the Mandarin tones was a physical challenge for these children. Interestingly, although perception of individual tones did not seem to predict the children's production of those tones, overall perception accuracy predicted overall production performance, and vice versa. This correlation suggests that tone perception and production in children with CIs are two related aspects of language development that facilitate each other.

Previous studies that examined the factors of age at implantation and duration of CI use in the tone development of children with CIs have produced mixed results. For tone perception, Peng et al. (2004) found that duration of CI use was a significant predictive factor only for the Nucleus users. Han et al. (2009) reported age at implantation to be a significant factor for tone perception performance measured at different time intervals in 20 CII or 90K users. Lee et al. (2002), however, reported that both factors were related to tone perception performance. The results of the regression analysis of the present study, which used a much larger sample (N = 67), indicated that both factors were

significant predictors of tone perception. Duration of CI use was the stronger predictor, as it had a significant marginal relationship with tone perception. Age at implantation alone, in contrast, could not explain a significant amount of the total variance in tone perception; its marginal relationship with the outcome was not significant. When both variables were entered into the regression model, however, age at implantation explained a significant amount of unique variance in tone perception in the presence of duration of CI use. These results indicate that although perception performance is bound to improve as experience with the device increases, early implantation may contribute to this improvement as well.

As for tone production, age at implantation was consistently identified in previous studies as a significant factor (Peng et al., 2004; Han et al., 2007). Implantation age was also found to be a significant predictor of tone production in the present study. Although duration of CI use was suggested by Han et al. (2007) as another significant predictor of tone production ability, age at implantation was found to be the only factor related to tone production in the present study. Taken together, the results of the present study indicate that children with CIs are expected to perceive tones better as their duration of device use increases or if they are implanted early; for production, however, early implantation, rather than longer duration of CI use, predicts better performance.

CHAPTER 4: EFFECTS OF FREQUENCY-PLACE MISMATCH ON SPEECH PERCEPTION WITH COCHLEAR IMPLANTS

Introduction

The ability to recognize speech is surprisingly robust and resistant to many forms of distortion or reduction of information (e.g., Baer and Moore, 1993). With the channel vocoder, it has been shown that good speech recognition can be maintained in quiet even when spectral resolution is degraded to as few as four frequency channels (Shannon et al., 1995). However, speech recognition is severely affected if the speech information in the channels is not matched tonotopically to cochlear place (e.g., Dorman, Loizou, and Rainey, 1997; Fu & Shannon, 1999). In cochlear implant systems, a number of forms of frequency-place mismatch are found in implant patients, resulting from the pathologies of hearing loss, shallow insertion, or the frequency mapping of the device. One type of frequency-place mismatch occurs when patients have localized losses of auditory neurons, which result in holes in hearing. In this case, the corresponding electrodes often have elevated electrical thresholds, and higher signal levels are required for those bands of information to be received. The increased signal level will likely spread electric current to neural fibers that are not intended to be activated, producing frequency warping around the functional gaps in the cochlea.

Frequency-place warping has been studied using acoustic simulations with noise-excited vocoders and in cochlear implant patients. Shannon and colleagues (Shannon, Galvin, & Baskent, 2002; Baskent & Shannon, 2006) reported that holes in the apical region appeared to affect speech recognition more than holes in either the middle

or basal regions. Further, redistributing the missing frequency content around the holes did not improve speech recognition.

Another type of frequency-place mismatch involves an overall shift of the spectrum as a result of shallow insertion of a cochlear implant. Consider the case where the implant electrode array is not fully inserted into the cochlea, so that the electrodes do not match the places corresponding to the frequency map of the speech processor. Typically, the output of a low-frequency analysis channel is delivered to an electrode that rests at a higher-frequency place, resulting in a basal shift of the spectrum. Dorman et al. (1997) used a 5-channel tone vocoder to simulate various insertion depths of a cochlear implant to produce varying degrees of basal shift. They found that the basal shift of the spectrum progressively deteriorated the recognition of sentences and vowels. Consonant recognition, in contrast, was less affected by the shift; the place-of-articulation feature of consonants, however, was transmitted particularly poorly (Dorman et al., 1997). In one of the experiments by Shannon, Zeng, and Wygonski (1998), an 8-mm basal shift was simulated and compared with a full-insertion condition in a 4-channel noise-excited vocoder: vowel recognition accuracy was significantly reduced, and sentence recognition accuracy dropped to nearly zero. Fu and Shannon (1999) further investigated the effects of frequency-place mismatch on vowel recognition with 4-, 8-, or 16-channel processors. They first fixed the analysis bands of the input signal while changing the simulated insertion depth of the implant, and found that vowel recognition scores decreased significantly when the tonotopic places of the carrier bands were shifted by 3 mm or more. In their second experiment, Fu and Shannon (1999) simulated a fixed

electrode array positioned at two relatively shallow places and varied the frequency allocation of the analysis bands. They found that the best performance occurred when the analysis bands transmitted more low-frequency information with a small amount of mismatch to the carrier bands. The effects of mismatch between the analysis band frequencies and the electrode locations have also been evaluated in patients with cochlear implants (Fu & Shannon, 1999), with similar results: the best performance struck a balance between the loss of low-frequency coverage resulting from the basal shift of the analysis bands and the degree of mismatch between the analysis and carrier bands.

To avoid frequency-place shift, shallowly inserted electrodes have to be mapped with tonotopically matched speech content. Faulkner, Rosen, and Stanton (2003) showed that in cases of shallow insertion, tonotopic mapping does not necessarily restore speech recognition, because a large part of the low-frequency speech content is lost. However, the acute effects of spectral shift can be compensated for to a certain extent by training (Faulkner, 2006; Fu & Galvin, 2003; Fu, Shannon, & Galvin III, 2002; Rosen, Faulkner, & Wilkinson, 1999); that is, human brains are able to learn and adapt to frequency-shifted speech signals.

Other studies have investigated the effects of compressing the entire speech spectrum into the limited stimulation range of the electrode array that typically results from shallow insertion (e.g., Baskent and Shannon 2003, 2004, 2005). In their 2005 study, the most apical electrodes were consecutively turned off to simulate partially inserted implants, while the analysis frequency range was either held constant or reduced to match the stimulation frequency range. The compressed stimulation avoided truncating the low

frequency range commonly seen in partially inserted implants, but it did result in a distorted frequency-place mapping. Baskent and Shannon (2005) found that while tonotopic mapping started to lose its advantage as more apical electrodes were turned off, compressing a wider analysis frequency range into the few remaining basal electrodes contributed to better speech recognition.

In the first section of this chapter, we adopted a noise-excited vocoder to simulate the effects of two forms of spectral distortion on Mandarin Chinese tone recognition. We examined the acute effects of basal spectral shift and frequency compression in an attempt to provide reasonable implications for CIs. Previously, the recognition of English phonemes, words, or sentences has been studied in relation to spectral shift (e.g., Dorman et al. 1997; Fu and Shannon 1999; Shannon et al. 1998) or spectral distortion (e.g., Baskent and Shannon 2003, 2004, 2005). The effect of spectral distortion on lexical tone recognition, however, has not been closely studied; to our knowledge, a simulation study of these effects on lexical tone recognition in normal-hearing subjects has not been reported. The two studies that examined the effects of frequency compression on tone recognition in cochlear implant patients reported mixed findings. Chu, Au, Hui, Chow, and Wei (2005) tested Cantonese tone recognition in three patients implanted with short electrode arrays to which a normal speech spectrum was compressively assigned. In comparison with performance in the tonotopically matched condition, they found no effect of compression. Liu, Chen, and Lin (2004), however, showed that Mandarin tone recognition in six patients decreased as a result of limiting the number of active electrodes. The reason for this discrepancy is not clear, as neither study

described the frequency range allocated to the electrodes. Further, many factors other than the variable under study could have influenced performance in those CI patients, for example, etiology, neural survival, implant device, speech processing strategy, and current spread.

Lexical tone is quite different from English phonemes in terms of its recognition mechanism. The most important acoustic feature for Mandarin Chinese tones is the fundamental frequency (F0). Temporal information that covaries with the F0 contour, including vowel duration and envelope amplitude, also contributes as a secondary cue (e.g., Whalen & Xu, 1992). Vocoder studies have shown that, compared with English phoneme recognition, Mandarin tone recognition saturates at a higher spectral resolution (Xu et al., 2002, 2005). Only when very detailed spectral resolution is provided by more than 30 spectral channels does tone recognition performance approach that for unprocessed stimuli (Kong & Zeng, 2006). Kong and Zeng also found that 500-Hz envelope information carried by only one channel provided better tone recognition in quiet than 8 channels with 50-Hz envelope information. Evidence from these studies shows that, when place pitch cannot be resolved, tone recognition depends on temporal envelope information containing the periodicity cue more than English phoneme recognition does (see Xu & Pfingst [2008] for a review). It has been suggested that speech processing strategies that use high stimulation rates favor tone recognition, probably because more detailed temporal envelope information can be represented at higher stimulation rates. Given these differences, spectral resolution and frequency-place mismatch may exert different effects on Mandarin Chinese tone

recognition than on English phoneme recognition. We hypothesized that overall tone recognition performance should be lower than English phoneme recognition performance because of the low spectral resolution, but that it should be less affected by spectral distortion than English phoneme recognition because of the use of temporal cues.

A majority of the studies that examined consonant recognition have reported degradation in performance as a result of spectral shift, though the effects are smaller than those for vowels and sentences. Dorman et al. (1997) reported that consonant recognition scores underwent smaller decreases than vowel and sentence recognition, but the decrease was still significant for shifts of 2 mm or more. Consistent with the findings of Dorman et al. (1997), Rosen et al. (1999) reported significantly reduced consonant recognition as a result of basal shift, although the decrease was again smaller than that for vowels and sentences. Rosen et al. (1999) reasoned that consonants can be recognized using temporal information and gross spectral contrast, which renders them more immune to frequency-place shifts. Shannon et al. (1998) found that while an 8-mm basal shift greatly degraded vowel recognition and caused sentence recognition accuracy essentially to drop to zero, consonant recognition was nearly intact.

In the second section of this chapter, spectral resolution (i.e., number of channels) and spectral shift (i.e., frequency-place shift) were examined for their effects on consonant recognition and consonant features. We assumed that distortion of the frequency spectrum and varying spectral resolution could have differential effects on the information transmission of different articulatory features of consonants. Data were analyzed in terms

of detailed articulatory features within the broad categories of manner, place, and voicing, which have not been reported in previous studies. It was hypothesized that features transmitted predominantly by temporal information would be less affected by spectral resolution or spectral shift, whereas features that rely more on spectral information would be increasingly affected as the amount of spectral shift increases or the spectral resolution becomes poorer. Further, none of the previous studies have reported consonant error patterns under conditions that vary the degree of spectral resolution and the amount of spectral shift in combination. The primary question addressed was whether consonant confusions would vary in systematic patterns with spectral resolution and spectral shift.

Frequency-Place Mismatch and Tone Perception

Methods

Speech material and signal processing. One female and one male native Mandarin Chinese speaker produced the following ten syllables in each of the four tones: /fu/, /ji/, /ma/, /qi/, /wan/, /xi/, /xian/, /yan/, /yang/, and /yi/. Care was taken so that the four tones produced for the same syllable were equal in duration (see Xu et al., 2002). The rms values of all tone tokens were equalized to control for overall loudness differences; the amplitude contour cue was left intact. The recordings of the 80 tone tokens (2 speakers × 10 syllables × 4 tones) were stored at a sampling rate of Hz with 16-bit resolution. Signal processing for the acoustic simulations was performed in MATLAB (MathWorks, Natick, MA).

Experiment 1 measured tone recognition under basal shift conditions. The basal shifts of the spectrum simulated an implant with varying insertion

depths (see Figure 19A). The raw tone tokens were pre-emphasized by high-pass filtering at 1200 Hz and were then bandpass filtered into 4, 8, 12, or 16 frequency bands. The frequency range of the analysis bands was Hz, corresponding to a tonotopic place between 28 mm and 13 mm from the basal end of the cochlea. The bandwidths and corner frequencies of the analysis bands were determined using the formula from Greenwood (1990), which estimates equal spacing along the basilar membrane of the cochlea, F = 165.4(10^(0.06x) - 1), where x is the distance in mm of the estimated place from the apical end of the cochlea, assuming the length of the basilar membrane to be 35 mm. The temporal envelope of each band was extracted by half-wave rectification followed by lowpass filtering at 160 Hz (2nd-order Butterworth, 12 dB/octave). Each temporal envelope was used to modulate a wideband noise, and the modulated signal was then band-limited to what is hereafter referred to as the carrier band. The frequency allocation of the carrier bands was varied to simulate various insertion depths of the electrode array; this frequency allocation was also estimated with the Greenwood (1990) formula. The analysis bands and the carrier bands did not necessarily match, which resulted in a tonotopic shift. Matched analysis bands and carrier bands simulated a fully inserted electrode array. The electrode location was manipulated to shift from full insertion (i.e., 28 mm) to 21 mm into the cochlea in steps of 1 mm. The frequency allocation of the carrier bands is provided in Table 3. The shifting procedure was repeated for all four channel conditions (i.e., 4, 8, 12, and 16 channels). The outputs of all bands were summed for acoustic presentation.
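The Greenwood-based band allocation and envelope extraction described above can be sketched as follows. This is a Python illustration (the original processing was performed in MATLAB); the function names are illustrative, and the band-pass filter order is an assumption not stated in the text.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def greenwood_freq(x_mm_from_apex):
    """Greenwood (1990) place-to-frequency map: F = 165.4*(10**(0.06*x) - 1),
    with x the distance (mm) from the apex of a 35-mm basilar membrane."""
    return 165.4 * (10.0 ** (0.06 * np.asarray(x_mm_from_apex)) - 1.0)

def band_edges(apical_mm_from_base, basal_mm_from_base, n_bands, length_mm=35.0):
    """Corner frequencies of n_bands spanning equal cochlear distance
    between two locations, both given in mm from the base."""
    x = np.linspace(length_mm - apical_mm_from_base,
                    length_mm - basal_mm_from_base, n_bands + 1)
    return greenwood_freq(x)

def band_envelope(signal, lo, hi, fs, env_cutoff=160.0):
    """Band-pass one analysis band, then extract its temporal envelope by
    half-wave rectification and 160-Hz lowpass (2nd-order Butterworth)."""
    bp = butter(4, [lo, hi], btype='band', fs=fs, output='sos')
    band = sosfilt(bp, signal)
    lp = butter(2, env_cutoff, btype='low', fs=fs, output='sos')
    return sosfilt(lp, np.maximum(band, 0.0))

# Analysis bands fixed between 28 and 13 mm from the base (4-channel example).
analysis = band_edges(28, 13, n_bands=4)
# Carrier bands for insertion depths from 28 mm (full, matched) down to 21 mm,
# preserving the 15-mm span of the simulated array (1-mm steps, 8 conditions).
carriers = {d: band_edges(d, d - 15, n_bands=4) for d in range(28, 20, -1)}
```

Each band envelope then modulates a wideband noise that is band-limited to the corresponding carrier band, and the band outputs are summed. Note that `greenwood_freq(10)` is about 493 Hz and `greenwood_freq(25)` about 5065 Hz, consistent with the 492-5053 Hz carrier range quoted for Experiment 2.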

Table 3
Corner Frequencies of the Carrier Bands for Eight Insertion Conditions
Carrier bands; insertion depth (mm from base)

Figure 19. Schematic representation of the electrode location. Schematic representation of the tonotopic location of the electrodes in a 4-band processor (i.e., carrier bands) and their frequency allocations (i.e., analysis bands). The upper panel (A) depicts an example of Experiment 1, where the analysis bands are fixed and the carrier bands are shifted basally by 7 mm. The lower panel (B) depicts an example of Experiment 2, where the analysis bands are widened by 5 mm on each frequency end and are compressively assigned to the narrower carriers. Experiment 2 measured tone recognition under the condition of frequency compression (see Figure 19B). Unlike in Experiment 1, the carrier bands in Experiment 2 were fixed in the frequency range of 492 to 5053 Hz to simulate an implant located between 25 mm and 10 mm from the base. The widened analysis bands (i.e., Hz, Hz, and Hz) were assigned compressively to the relatively narrower carrier bands (i.e., 492–5053 Hz). The analysis bands were widened evenly on both the low- and high-frequency sides. The amount of widening was equivalent to 1, 3, and 5

mm on each side, and thus 2, 6, and 10 mm in total (Greenwood, 1990). Frequency compression was repeated for all four channel conditions. The frequency allocations of the analysis bands for the three compression conditions are provided in Table 4. Table 4
Compressed Frequency Allocation for a Four-Band Processor
Compression size (mm); cochlear location of analysis bands (mm); frequency allocation for analysis bands (four channels); (matched)
Subjects and procedure. Nine NH, native Mandarin-Chinese speakers (5 males and 4 females; age 28.4 ± 6.7 years, mean ± SD) participated in the tone recognition tests. All subjects participated in Experiment 1, and five of them continued with Experiment 2. As will be reported in the Results section, the sample size of Experiment 2 appeared to be sufficient to detect statistical significance between the experimental conditions. All subjects were screened for pure-tone thresholds lower than 20 dB HL for octave

frequencies between 250 and 8000 Hz. The use of human subjects was approved by the Ohio University Institutional Review Board. Tone stimuli were presented at a comfortable level to the left ear of the listeners via a circumaural headphone (Sennheiser HD 265) in an IAC double-walled sound booth. A custom graphical user interface (GUI) was developed in MATLAB to present the tone stimuli and to collect the listeners' responses. In order to avoid floor effects in the actual tests, all subjects received training and were required to reach an average performance of 70% correct with spectrally matched tone stimuli processed in a noise-excited vocoder. Each training session contained 1600 stimuli (10 syllables × 4 tones × 2 speakers × 4 channel conditions × 5 repetitions). Each training session started with stimuli processed in a 16-band processor and continued with stimuli processed with progressively fewer bands. Eight subjects took two 3-hour training sessions to reach the criterion. Only one subject needed an extra training session. Tone recognition was measured in a four-alternative forced-choice paradigm. Each stimulus was presented only once in both training and testing. The subjects were instructed to take their best guess even if they were not sure about the response. Subjects responded by pressing one of the four GUI buttons, each labeled with one of the four possible answers. Feedback was provided after each response during training, while no feedback was provided during testing. The test for Experiment 1 consisted of 5120 stimuli (10 syllables × 4 tones × 2 speakers × 8 insertion depths × 4 channel conditions × 2 repetitions) presented in random order, and required about 5.5 hours for each subject to complete. Experiment 2 contained 2560 randomized stimuli (10 syllables × 4 tones × 2

speakers × 4 compression conditions × 4 channel conditions × 2 repetitions) and took about 2.5 hours for each subject to complete. The experiments were scheduled in blocks of 1 to 2 hours. The listeners were encouraged to take breaks within each test session. The training and testing spanned 2 to 3 weeks on average. Results Figure 20 plots the percent-correct scores as a function of simulated insertion depth for the different channel conditions. As revealed by a two-way repeated-measures ANOVA, the effects of insertion depth [F (7, 56) = 22.30, p < ] and number of channels [F (3, 24) = 15.58, p = ] were both significant. Poorer performance was associated with fewer channels or shallower insertion depths. The greatest basal shift (i.e., 21 mm) caused tone recognition performance to decrease by approximately 10 percentage points from the performance in the unshifted condition (i.e., 28 mm). Vowel recognition data from Fu and Shannon (1999) are replotted in Figure 20 for comparison. The interaction between the two main factors (i.e., insertion depth and spectral resolution) was also significant [F (21, 168) = 2.73, p = ]. A post-hoc Cicchetti's test, often used for unconfounded comparisons within an interaction, was performed. Comparisons of insertion depth conditions nested within the factor of number of channels showed that the effects of insertion depth were greater for stimuli with greater numbers of channels than with fewer. None of the 28 comparisons between insertion depth conditions for 4 channels was significant, indicating that there were no shallow-insertion effects at all for 4 channels. For 8 channels, only scores for insertion depths of 22 and 21

mm differed from those of the unshifted condition (p < 0.05). Similarly, for 12 and 16 channels, the effects of shallow insertion did not appear until 22 mm (p < 0.05). In addition, scores for the insertion depth of 21 mm also differed from those for 27, 26, and 24 mm for 12 channels, and differed from those for mm for 16 channels (p < 0.05). Comparisons of the channel conditions nested within the factor of insertion depth revealed that the effects of number of channels diminished as the simulated insertion depth became shallower, and eventually disappeared when the insertion depth became shallower than 24 mm (p > 0.05). The pattern of performance across the different tones was similar for the different channel conditions but differed between syllable types, the types being open syllables and syllables with a nasal coda. Performance patterns of individual syllables were consistent within syllable types (i.e., open syllables or syllables with a nasal coda). Hence, in Figure 21 we show the mean scores for the two types of syllables as a function of simulated insertion depth. For the open syllables (i.e., /fu/, /ji/, /ma/, /qi/, /xi/, and /yi/), the scores for the four tones were relatively consistent across insertion conditions (see Figure 21, upper panel). However, for the syllables with a nasal coda (i.e., /wan/, /xian/, /yan/, and /yang/), a seeming increase in performance as a function of insertion depth was seen for tone 4. In contrast, performance for the other three tones showed a declining pattern as the insertion depth became shallower (see Figure 21, lower panel). We will discuss in detail the idea that a bias toward tone-4 responses led to the increased accuracy for tone 4 as well as false detections of tone 4.

Figure 20. Performance as a function of simulated insertion depth. Tone recognition performance is plotted for the 4-, 8-, 12-, and 16-channel conditions in solid lines with different symbols. The lowest corner frequency of the frequency allocations for the carriers is noted for each simulated insertion depth. Vowel recognition data from Fu and Shannon (1999) (abbreviated as F & S in the legend) are re-plotted with permission from the Acoustical Society of America. A simulated insertion depth of 28 mm corresponds to a full insertion, or tonotopically matched, condition.

Figure 21. Open versus nasal syllables. Performance on individual tones with spectral shift for the two types of syllables. Tones 1-4 are plotted with circles, crosses, triangles, and squares, respectively. A simulated insertion depth of 28 mm corresponds to a full insertion, or tonotopically matched, condition. Experiment 2 measured tone recognition with compressed frequency allocation for the four channel conditions (see Figure 22, upper panel). Wider analysis frequency ranges were assigned to relatively narrower carriers (see Table 4). As shown by a two-way repeated-measures ANOVA, the main effect of compression was significant [F (3, 36) = 20.33, p = ], but the effect of number of channels was not [F (3, 36) = 2.55, p = 0.11]. The interaction between the two factors was also significant [F (9, 36) = 3.82, p = 0.002]. Data collapsed across channel conditions are plotted in the lower panel of Figure 22. Post hoc analysis further showed that scores for the 6- and 10-mm compression conditions improved significantly compared to the tonotopically matched condition

(p < 0.05). The best score was found for the 6-mm compression condition, but it was not significantly better than that of the 10-mm compression condition (p > 0.05). The tonotopically matched condition in this experiment simulated an implant located between 25 and 10 mm from the base (i.e., 492 to 5053 Hz). Note that it was different from the matched condition in Experiment 1 (i.e., 28 to 13 mm, or 269 to 3282 Hz). The recognition score for the matched condition derived from Experiment 1 is plotted as an open circle in the lower panel of Figure 22. A paired t test showed that performance in the tonotopically matched condition at 25 mm was significantly worse than in the matched condition at 28 mm obtained from Experiment 1 (t = 2.91, p = 0.04). Discussion and Conclusion Tone recognition was measured in two tonotopically matched conditions with simulated insertion depths of 28 and 25 mm in Experiments 1 and 2, respectively. In both conditions, tone recognition was poorer than English phoneme recognition measured using similar vocoder simulations, even though the chance level for tone recognition (25%) is much higher than for phoneme recognition (5% for consonants and 8.33% for vowels). Our findings were consistent with previous vocoder studies (e.g., Xu et al., 2002, 2005) and with observations from tone-language-speaking CI users (e.g., Wei et al., 2000; Ciocca et al., 2002; Lee et al., 2002; Wong & Wong, 2004). The poorer tone recognition was not surprising, because the spectral resolution provided by CIs does not allow the transmission of F0 or harmonics, while features of phonemes can be transmitted with temporal envelope information.

Figure 22. Performance as a function of frequency compression. Upper panel: tone recognition performance is plotted for the 4-, 8-, 12-, and 16-channel conditions in solid lines with different symbols. Lower panel: averaged tone recognition scores across channel conditions are plotted as filled circles with error bars (± SD). The score for the tonotopically matched condition at 28 mm, derived from Experiment 1, is plotted as an open circle with an error bar (± SD) for comparison. Recognition performance with an insertion depth of 25 mm was lower than that with an insertion depth of 28 mm (see Figure 22, lower panel). This is probably because more low-frequency information was eliminated at the insertion depth of 25 mm. Tone recognition was much more resistant to basal spectral shift than English phoneme and sentence recognition (Dorman et al., 1997; Fu & Shannon, 1999; Shannon et al., 1998). The deteriorating effects did not appear until the carriers were shifted almost two octaves higher. Data from Fu and Shannon (1999) provide a clear contrast between the effects of spectral shift on vowels and tones (see Figure 20). At the

shallowest insertion depth (i.e., 21 mm), the averaged tone performance across channel conditions decreased from the unshifted condition by only approximately 10 percentage points, whereas a dramatic 60-percentage-point drop was demonstrated in vowel recognition (Fu & Shannon, 1999). Although the 10-percentage-point decrease in performance was statistically significant, it may not noticeably affect communication using lexical tones. However, clinical data related to this are not available. For the 4-channel condition, shifting the spectrum did not affect tone recognition at all. In contrast, Dorman et al. (1997) reported that consonant and vowel recognition with a 5-channel tone vocoder decreased from the unshifted condition by 30 and 60 percentage points, respectively. Given that the subjects in their study received extensive pretest training of 12 to 15 hours and were allowed unlimited presentations of each stimulus, the effects of spectral shift on tones found here were remarkably negligible by comparison. The effects of shift on tones for the other channel conditions were also quite limited, being confined to insertion depths shallower than 23 mm (i.e., 22 mm and 21 mm). The general trend was that stimuli with better spectral resolution were affected by the shifts to a greater degree. This was linked to a diminishing channel effect with greater shift of the spectrum. Such an interaction between spectral resolution and spectral shift was also evident in the vowel recognition scores from Fu and Shannon (1999) before the scores were normalized. We speculate that a stimulus generated with a greater number of channels contains more speech information, and therefore the amount of information vulnerable to the frequency-place mismatch is also greater.

Disrupted frequency-place mapping does not have an equal effect on different aspects of speech perception. Conceivably, vowels are the most prone to spectral shift (Dorman et al., 1997; Shannon et al., 1998; Fu & Shannon, 1999). The most heavily weighted cues for vowel recognition are the formant frequencies. Shifting the spectrum to higher frequencies easily destroys these spectral cues and results in an acute decrease in vowel recognition. Consonant recognition, however, is not as vulnerable to spectral shift as vowel recognition. Information transmitted for place of articulation, which relies greatly on the location of spectral peaks, was considerably affected as a result of spectral shifts (Dorman et al., 1997). Perception of manner of articulation or voicing distinctions defined by temporal features, such as noise rise time or the duration of adjacent vowels (Pickett, 1999), might not be affected by spectral changes as much as that of place of articulation. Tone recognition presumably also relies on temporal information when spectral information is limited to a small number of frequency bands (Xu & Pfingst, 2008). However, the temporal information on which tone recognition relies differs from that for phoneme recognition: it includes the periodicity contained in local channel envelopes and the overall amplitude contour of the signal. As we have shown earlier, tone recognition was the least prone to spectral shift, but there was a difference in performance between the two types of syllables, the open syllables and the syllables with a nasal coda. The differences between the performances for the two types of syllables allowed us to investigate the contributions of the two temporal cues (i.e., envelope cues in local channels and overall amplitude contour cues) to tone recognition in frequency-place mismatched conditions.

Recognition of the four Mandarin tones for open syllables was quite consistent across insertion conditions. In contrast, for syllables with a nasal coda, a bias toward tone-4 responses was observed with increasing shift (see Figure 21, lower panel). This bias caused both the hits for tone 4 and the false detections of other tones as tone 4 to increase. The response tendency could be explained by the differences between the two types of syllables in their overall amplitude contours. Figure 23 shows the overall amplitude contours for all syllables. The overall amplitude contours for each of the four tones are plotted on top of the 16-channel vocoded waveforms in the unshifted condition. Note that the difference in overall amplitude contours between the two types of syllables remains despite the spectral shifts. The amplitude contours of the four tones of the open syllables resembled, to various degrees, the F0 contours of the tones. The amplitude contours of the syllables with a nasal coda, however, were greatly affected by the presence of the nasal coda. Syllables with a nasal coda tend to have low amplitude at the syllable ending, because nasal stops are greatly damped due to the broader-band frequency response of the vocal tract (Fujimura, 1962). As a result, the ending of nasal syllables demonstrates a downward movement regardless of the original tone identity. The downward movement at the syllable ending might be perceived as a drop in F0 and elicit tone-4 responses. Similar perceptual results were found by Whalen and Xu (1992): responses by their listeners primarily depended on the movement of the amplitude segment when no F0 cue was provided. In conditions of moderate basal shift (i.e., >25 mm), when the local channel envelopes stimulated at moderately shifted places could still be perceived, the overall

amplitude contours did not seem to influence tone recognition, possibly because the amplitude contour is a less heavily weighted cue. Evidence shows that the amplitude contour cue has hardly any effect on tone recognition when it serves as a secondary cue (Lin, 1988; Xu & Pfingst, 2003). With increasing shift, however, the mismatch between the envelopes and the places being stimulated becomes increasingly large. In these cases, the periodicity information in the envelopes was coded at places with higher characteristic frequencies. It is possible that periodicity information processed in mismatched auditory filters may not be perceived well. In fact, Oxenham and colleagues (2004) demonstrated that dissociation of temporal information from cochlear place affects temporal pitch coding. They showed that frequency discrimination of transposed tones, which are higher-frequency carriers modulated with low-frequency F0s, is much worse than frequency discrimination of pure tones. Their findings indicate that tonotopic representation is a necessary element for temporal pitch coding. In our case, with increasing shift, the periodicity information was delivered to neurons that have mismatched characteristic frequencies and presumably could not be well represented. The derivation of the overall amplitude contour, however, requires only the summation of neural firing rates across channels and is therefore independent of spectral manipulation.

Figure 23. Overall amplitude contours of all syllables. The overall amplitude contours were extracted by detecting and smoothing the amplitude peaks of the waveform. The overall amplitude contours of the four tones for each syllable are plotted in black lines on top of the waveforms, which are plotted in grey.
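The extraction described in the caption (peak detection followed by smoothing) can be sketched roughly as follows. The window lengths and all names are our own assumptions, not the dissertation's actual parameters:

```python
import numpy as np

def amplitude_contour(x, fs, peak_win_ms=10.0, smooth_win_ms=50.0):
    """Rough overall amplitude contour: sliding-window maxima of |x|,
    then moving-average smoothing of the resulting peak track."""
    n = max(1, int(fs * peak_win_ms / 1000))        # samples per peak window
    m = max(1, int(smooth_win_ms / peak_win_ms))    # windows per smoothing kernel
    ax = np.abs(x)
    # local amplitude peaks in consecutive non-overlapping windows
    n_win = len(ax) // n
    peaks = ax[: n_win * n].reshape(n_win, n).max(axis=1)
    # smooth the peak track with a moving average
    kernel = np.ones(m) / m
    return np.convolve(peaks, kernel, mode="same")

fs = 16000
t = np.arange(int(0.4 * fs)) / fs
ramp = np.linspace(0.1, 1.0, t.size)  # rising amplitude, loosely like a rising-F0 tone
contour = amplitude_contour(ramp * np.sin(2 * np.pi * 200 * t), fs)
```

Because the contour is derived only from the waveform's amplitude, it survives any reassignment of envelopes to shifted carrier bands, which is the point made in the surrounding discussion.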

The amplitude contour cue became perceptually more dominant with increasing shift. For syllables with a nasal coda, a bias toward tone-4 responses was therefore introduced, the consequences being enhanced recognition of tone 4 together with increased false detections of tone 4. It remains to be tested whether tonotopic mapping is critical for perceiving temporal pitch, and whether the overall amplitude contour cue comes into play because of the distorted periodicity. Our results suggest that the transition in dominance of the two cues occurred at a simulated insertion depth of 25 mm (see Figure 21, lower panel). The amplitude contour of tone 3 was somewhat preserved even in the presence of the weak nasal energy (see Figure 21, lower panel). Hence, recognition of tone 3 was less biased than that of tones 1 and 2. The increased scores for tone 4 at shallower depths offset the decreasing recognition accuracy of the other tones, so that the overall score did not change much with increasing shift. The amplitude contours of open-syllable tones always provided relatively reliable cues for tone recognition, regardless of the degree to which the envelopes in the local channels were distorted as a result of the spectral shift. Therefore, the performance on open-syllable tones did not change with insertion conditions. Even though the overall amplitude contours are independent of the manipulation of the carriers in the channels, the above observations suggest that this cue is not equally reliable for all Chinese syllables. It is reliable for open syllables, the vocalic portion of which contains only a monophthong vowel. The amplitude contours of tones are prone to change for other types of syllables, depending on the relative energies of the nucleus and the coda. The system of Chinese vowels consists of monophthongs and diphthongs.

Further, a nasal or a nasal cluster is the only possible syllable coda (Duanmu, 2002). In natural speech, the overall amplitude contours rarely work on their own to provide cues for tone recognition. Therefore, tones with nasal codas should not be more easily confused than tones carried by other types of syllables. Spectral shift resulting from shallow insertion, a situation potentially faced by many CI users, creates a unique condition in which the overall amplitude contour serves as the primary temporal cue for tone recognition. The results indicate that, although not reliable for certain syllable types, the overall amplitude contour cue contributes to the resistance of tone recognition to spectral modifications. Luo and Fu (2004) also showed that tone recognition can be enhanced by modifying the overall amplitude contours according to the F0 contours. Due to the limited length of the CI electrode array, the frequency range stimulated by a CI is also limited. Assigning a correspondingly tonotopic frequency map to the electrodes would sacrifice frequency coverage, especially in the low-frequency region, if the electrodes are inserted shallowly or short electrode arrays are used. Maps used clinically usually allocate, compressively, a wider frequency range sufficient for speech understanding to electrodes that cover a narrower cochlear region. Experiment 2 simulated such a scenario. The implant was simulated to have a length of 15 mm and an insertion depth of 25 mm from the base. Consistent with previous studies on English phoneme recognition (e.g., Faulkner et al., 2003; Fu & Shannon, 1999), the tonotopically matched condition resulted in the truncation of a fair amount of low-frequency information, which in turn resulted in decreased tone recognition performance compared with that at the insertion depth of 28 mm

obtained from Experiment 1 (see Figure 22, lower panel). Compression of a wider frequency range, by 3 mm or 5 mm at both frequency ends, produced better performance than no compression (see Figure 22, lower panel). A small amount of compression (i.e., 1 mm) did not provide an acoustic range wide enough, particularly at the low-frequency end, to compensate for the frequency-place distortion resulting from the compression. However, neither did optimal performance occur with the largest amount of compression. The best performance was found with a moderate compression that enhanced low-frequency information for tone recognition. This is consistent with the findings of Baskent and Shannon (2003, 2005) that, in shallow insertion conditions, a moderate amount of compression was better than tonotopic mapping with low-frequency truncation. These results suggest that a wider frequency allocation that includes low-frequency information critical for pitch perception may benefit tone recognition. However, the degree of compression should also be controlled relative to insertion depth to provide maximum benefit for implant patients. In conclusion, tone recognition was fairly resistant to spectral mismatch because it uses the overall amplitude contours, which are independent of spectral alteration. The amplitude contour cue was not reliable for all Chinese syllables, however, especially for syllables with a nasal coda. Tone recognition with moderate frequency compression improved as a result of the extended low-frequency end provided by the compression. Frequency mapping for modern CI devices often involves the two forms of frequency-place mismatch examined in the present study. Although technically difficult, it would be interesting to confirm the findings of the present study in CI users.

Frequency-Place Mismatch and Consonant Confusion Methods From the database recorded by Shannon et al. (1999), a set of digitized, naturally produced consonants from one female (#3) and one male (#3) speaker was drawn. The set contained 20 consonants (ʧ, ʤ, t, d, k, g, p, b, n, m, s, z, ʃ, f, v, ð, l, r, j, w) produced in a /Ca/ context, resulting in 40 speech tokens in total (20 consonants × 2 speakers). The speech tokens were subjected to noise-excited vocoder processing. Signal processing was performed in MATLAB (MathWorks, Natick, MA). The speech signals were pre-emphasized by highpass filtering at 1200 Hz (1st-order Butterworth filter, 6 dB/octave) and divided into 4, 8, 12, or 16 frequency bands. The frequency range of the analysis bands was Hz, covering a tonotopic location between 28 mm and 16 mm from the basal end of a 35-mm-long cochlea (Greenwood, 1990). The Greenwood (1990) formula, F = 165.4(10^(0.06x) − 1), where x is the distance in mm from the apex, was used to determine the bandwidth and corner frequencies of the analysis bands. The output of each analysis band was half-wave rectified and then low-pass filtered at 160 Hz (2nd-order Butterworth, 12 dB/octave) to extract the temporal envelope. Each temporal envelope was used to amplitude-modulate a white noise. The modulated signal was then bandpass filtered into either the same band from which the envelope was extracted, to simulate a tonotopically matched condition, or into higher frequency bands, to simulate frequency-place shifted conditions. The cut-off frequencies of the carrier bands were also estimated by the Greenwood (1990) formula. The frequency allocation of the carrier bands was systematically manipulated to simulate a basal shift from the analysis

bands over a tonotopic distance of 6 mm with a step size of 1 mm. Thus, seven frequency-place matching conditions (i.e., one unshifted and six shifted) were created. The manipulation was repeated for all four channel conditions (i.e., 4, 8, 12, and 16 channels). The frequency allocation for the carrier bands of the 16 channels is provided in Table 5. Finally, the outputs of all bands were summed and stored on a computer for acoustic presentations. Ten native English-speaking subjects recruited from the Ohio University student population participated in the study. The subjects were screened for normal hearing (≤ 20 dB HL) at octave frequencies between 250 and 8000 Hz. The use of human subjects was reviewed and approved by the Ohio University Institutional Review Board. The consonant recognition test was conducted in an IAC sound booth. A graphical user interface was developed in MATLAB to present stimuli and to collect responses from the subjects. The consonant stimuli were presented to the left ear of the listeners via a circumaural headphone (Sennheiser HD 265) at a comfortable level.
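The per-channel processing described above (envelope extraction followed by noise-carrier band-limiting) can be sketched in Python as follows. The idealized FFT filters below stand in for the Butterworth filters of the original MATLAB processing, and all names are our own:

```python
import numpy as np

def fft_bandpass(x, fs, lo, hi):
    """Idealized band-pass filter via FFT masking (a stand-in for the
    Butterworth filters used in the original processing)."""
    spec = np.fft.rfft(x)
    f = np.fft.rfftfreq(x.size, 1.0 / fs)
    spec[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(spec, n=x.size)

def vocode_channel(x, fs, band, carrier_band, rng):
    """One channel of the noise-excited vocoder: band-pass analysis,
    envelope extraction (half-wave rectification + 160-Hz low-pass),
    white-noise modulation, then band-limiting to the carrier band."""
    analysis = fft_bandpass(x, fs, *band)
    env = fft_bandpass(np.maximum(analysis, 0.0), fs, 0.0, 160.0)
    return fft_bandpass(env * rng.standard_normal(x.size), fs, *carrier_band)

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 500 * t)
rng = np.random.default_rng(0)
matched = vocode_channel(x, fs, (300, 700), (300, 700), rng)   # matched carrier
shifted = vocode_channel(x, fs, (300, 700), (900, 1800), rng)  # basally shifted carrier
```

Choosing a carrier band higher than the analysis band, as in `shifted`, is exactly the frequency-place mismatch manipulated in these experiments: the envelope is unchanged, but its energy is delivered to a more basal cochlear place.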

Table 5
Corner Frequencies of the Carrier Bands for Seven Mismatch Conditions
Carrier bands; shift (mm)

The subjects adjusted the soundcard output levels to their respective most comfortable levels before each session of training or testing. The presentation level was approximately 65 ± 5 dB SPL. The task of the subjects was to identify the consonant they had heard by clicking one of the 20 buttons labeled with the CV strings (e.g., Ba, Da, Ga, …). The subjects were trained with the tonotopically matched stimuli processed with 4, 8, 12, and 16 channels. Each training session lasted about 30 minutes and always started with presumably the easiest stimuli (i.e., the 16-channel condition), followed by progressively more difficult stimuli. During training, within each channel condition, the presentation of the stimuli was randomized. After the subject gave a response, the button of the correct consonant flashed to provide feedback. Given the reduced bandwidth of the stimuli, and based on experience from our previous studies and the literature, training was considered adequate when recognition of the tonotopically matched stimuli reached 60% correct. On average, each subject took approximately 10 hours of training before continuing with the test. In the test, the presentation of 5600 stimuli (i.e., 20 consonants × 2 speakers × 4 channel conditions × 7 frequency-place shift conditions × 5 repetitions) was completely randomized. The test was divided into a number of sessions and took each subject approximately 5-6 hours to complete. The 20 consonants were coded according to their articulatory features, which included voicing (i.e., voiced and voiceless), place (i.e., labial, alveolar, palatal, and velar), and manner of articulation (i.e., stop, fricative, affricate, nasal, and glide) (Ladefoged, 1975). Since /ð/ was the only dental consonant, it was grouped as an alveolar. The consonants were coded as shown in Table 6.
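One plausible realization of this coding, based on standard articulatory descriptions, is sketched below. The dictionary is our illustration only; the actual Table 6 assignments may differ in detail (e.g., in the manner grouping of /l/ and /r/):

```python
# Hypothetical feature codes (our illustration, not the actual Table 6 entries).
# voicing: 1 = voiced, 0 = voiceless
# manner:  1-5 = stop, fricative, affricate, nasal, glide
# place:   1-4 = labial, alveolar, palatal, velar
features = {
    "ʧ": (0, 3, 3), "ʤ": (1, 3, 3),
    "t": (0, 1, 2), "d": (1, 1, 2), "k": (0, 1, 4), "g": (1, 1, 4),
    "p": (0, 1, 1), "b": (1, 1, 1),
    "n": (1, 4, 2), "m": (1, 4, 1),
    "s": (0, 2, 2), "z": (1, 2, 2), "ʃ": (0, 2, 3),
    "f": (0, 2, 1), "v": (1, 2, 1),
    "ð": (1, 2, 2),  # dental, grouped as alveolar per the text
    "l": (1, 5, 2), "r": (1, 5, 2), "j": (1, 5, 3), "w": (1, 5, 1),
}
```

A mapping of this form is all that the feature-based information transmission analysis in the Results requires: each response is collapsed from the 20-way consonant identity to its value on one feature at a time.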

Table 6
Coding of Consonants
Consonants: ʧ, ʤ, t, d, k, g, p, b, n, m, s, z, ʃ, f, v, ð, l, r, j, w; Voicing; Manner; Place
Manner of articulation is coded 1-5 as: stop, fricative, affricate, nasal, and glide. Place of articulation is coded 1-4 as: labial, alveolar, palatal, and velar. Voicing is coded 1 vs. 0 for voiced vs. voiceless consonants. Results Effects of spectral shift and spectral resolution. Figure 24A summarizes the percent-correct scores for the various channel and spectral shift conditions. A two-way repeated-measures ANOVA indicated significant effects of both number of channels [F (3, 27) = 22.84, p < ] and spectral shift [F (6, 54) = , p < ]. Post hoc analysis of the main factors revealed that a spectral shift of 3 mm caused the performance to decrease significantly from the tonotopically matched condition (i.e., 28 mm) (p < 0.05). Performance with 4 channels was significantly lower than in all other channel conditions (p < 0.05), while the performance with 8, 12, and 16 channels did not differ significantly (p > 0.05). The ANOVA showed that the interaction between the two factors was also statistically significant [F (18, 162) = 13.39, p < ]. Post-hoc analysis of the interaction revealed that when the spectral shift was > 3 mm, the spectral

resolution of the signals no longer seemed to play a role, and therefore the performance across channel conditions did not differ (p > 0.05). Information transmission scores (%) are shown in Figure 24B-D for the features of voicing, manner, and place of articulation in all experimental conditions (Miller & Nicely, 1955). Three sets of two-way repeated-measures ANOVAs were conducted to examine the effects of number of channels as well as the effects of frequency-place shift for all three features. For the feature of voicing, the effects of number of channels [F (3, 27) = 5.38, p = 0.005] and spectral shift [F (6, 54) = 4.64, p = ] were both statistically significant. For the feature of manner of articulation, only the effects of spectral shift [F (6, 54) = 4.25, p = 0.001], but not of number of channels [F (3, 27) = 1.31, p = 0.3], were significant. For the feature of place of articulation, again, both main factors were significant [F (3, 27) = 17.83, p < ; F (6, 54) = 9.08, p < ]. For all three features, the interactions between the two main effects were not statistically significant [voicing: F (18, 162) = 1.03, p = 0.43; manner: F (18, 162) = 1.14, p = 0.32; place: F (18, 162) = 1.22, p = 0.25].
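The information transmission analysis of Miller and Nicely (1955) computes the mutual information between stimulus and response categories of a feature, normalized by the stimulus entropy. A minimal sketch for a binary feature such as voicing (function and variable names are our own):

```python
import numpy as np

def relative_info_transmitted(confusions):
    """Relative transmitted information T / H(x) from a confusion matrix
    (rows = stimulus categories, columns = response categories),
    following Miller and Nicely (1955)."""
    p = confusions / confusions.sum()
    px = p.sum(axis=1)   # stimulus probabilities
    py = p.sum(axis=0)   # response probabilities
    t = 0.0
    for i in range(p.shape[0]):
        for j in range(p.shape[1]):
            if p[i, j] > 0:
                t += p[i, j] * np.log2(p[i, j] / (px[i] * py[j]))
    hx = -np.sum(px[px > 0] * np.log2(px[px > 0]))  # stimulus entropy
    return t / hx

# Voicing as a binary feature: perfect transmission vs. chance responding
perfect = np.array([[50.0, 0.0], [0.0, 50.0]])
chance = np.array([[25.0, 25.0], [25.0, 25.0]])
```

The score is 1 when responses are fully predictable from the stimulus feature and 0 when they are independent of it, which is why it separates the voicing, manner, and place results even when overall percent correct is at floor.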

Figure 24. Percent correct and information transmission scores for articulatory features (N = 10). A: Averaged percent correct scores are plotted as a function of frequency-place shift. Scores of the 4 channel conditions are plotted as solid lines with different symbols. B-D: Averaged information transmission scores are plotted as a function of frequency-place shift for the features of voicing, manner of articulation, and place of articulation. Scores of the 4 channel conditions are plotted as solid lines with different symbols. Error bars represent SDs.

Figure 25 shows the information transmitted as a function of frequency-place shift for each specific manner of articulation (i.e., stop, fricative, affricate, nasal, and glide) and each specific place of articulation (i.e., labial, alveolar, palatal, and velar). Each

specific manner and place of articulation category (e.g., stop) was treated as a binary feature (e.g., stop vs. non-stop) in the derivation of its information transmission score. Voicing, already a binary feature, was described in Figure 24. The effect of spectral resolution is not shown in Figure 25, because it was not a significant factor for the transmission of manner, and there was no interaction between number of channels and spectral shift for the feature of place of articulation. Figure 25 indicates that the overall non-monotonic function for place of articulation (see Figure 24D) is accounted for by the labial feature.

Figure 25. Information transmission for particular manner and place of articulation categories (N = 10). Left panel: Information transmitted for sub-categories of manner of articulation is plotted as a function of frequency-place shift with different symbols. Right panel: Information transmitted for sub-categories of place of articulation is plotted as a function of frequency-place shift with different symbols.
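The binary-feature treatment described above (e.g., stop vs. non-stop) amounts to collapsing a multi-valued feature to a 1-vs-0 coding before computing information transmission. A minimal sketch, with illustrative manner labels:

```python
def binarize(feature, target):
    """Collapse a multi-valued feature into target (1) vs. non-target (0)."""
    return {c: int(v == target) for c, v in feature.items()}

# Illustrative manner labels for a few consonants
manner = {"t": "stop", "p": "stop", "s": "fricative", "n": "nasal"}
stop_vs_nonstop = binarize(manner, "stop")  # {"t": 1, "p": 1, "s": 0, "n": 0}
```

The resulting 0/1 mapping can then be fed to the same information transmission computation used for voicing.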

Confusion analysis

Confusion matrices of data pooled across all channel conditions are shown in Figure 26, with different panels for different spectral shift conditions. Note that the quantitative analysis of the error patterns described below was conducted for each spectral shift and each number-of-channels condition in order to elucidate the effects of both factors on the confusions. In fact, this analysis revealed that the confusion error patterns were similar across spectral resolutions (see below). In Figure 26, the confusion matrices are organized by the features of manner and place of articulation. First, consonants sharing the same manner of articulation were grouped together; boundaries between manners of articulation are indicated by the white squares. Within each manner group, the consonants were then sorted, where applicable, in the order alveolar, palatal, velar, and labial. This place order reflects the frequency region of the acoustic correlates that are most relevant to the perception of place of articulation (see also Johnson, 2003; Pickett, 1999; Stevens, 1998).
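The matrix ordering described above, consonants grouped by manner and then sorted by place in the order alveolar, palatal, velar, labial, can be written as a two-level sort key. The consonants and their classifications below are illustrative, not the full stimulus set:

```python
# Ordering conventions stated in the text
MANNER_ORDER = ["stop", "fricative", "affricate", "nasal", "glide"]
PLACE_ORDER = ["alveolar", "palatal", "velar", "labial"]

def sort_key(consonant, manner, place):
    """Primary sort by manner group, secondary sort by place within the group."""
    return (MANNER_ORDER.index(manner[consonant]),
            PLACE_ORDER.index(place[consonant]))

# Illustrative subset of consonants
manner = {"t": "stop", "k": "stop", "p": "stop", "s": "fricative"}
place = {"t": "alveolar", "k": "velar", "p": "labial", "s": "alveolar"}
ordered = sorted(manner, key=lambda c: sort_key(c, manner, place))
# stops come first (alveolar, velar, labial), then fricatives
```

Applying this key to the rows and columns of a confusion matrix produces the block structure marked by the white squares in Figure 26.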

Figure 26. Confusion matrices using data pooled across channel conditions (N = 10). The white squares indicate boundaries of manner of articulation. Within each manner of articulation, the consonants were sorted, whenever appropriate, by place of articulation in the order alveolar, palatal, velar, and bilabial.

Specifically, one of the primary acoustic correlates of place distinction for stops is the spectral dominance of the short-term spectrum at consonantal release (Fant, 1973;


More information

Psychoacoustical Models WS 2016/17

Psychoacoustical Models WS 2016/17 Psychoacoustical Models WS 2016/17 related lectures: Applied and Virtual Acoustics (Winter Term) Advanced Psychoacoustics (Summer Term) Sound Perception 2 Frequency and Level Range of Human Hearing Source:

More information

PATTERN ELEMENT HEARING AIDS AND SPEECH ASSESSMENT AND TRAINING Adrian FOURCIN and Evelyn ABBERTON

PATTERN ELEMENT HEARING AIDS AND SPEECH ASSESSMENT AND TRAINING Adrian FOURCIN and Evelyn ABBERTON PATTERN ELEMENT HEARING AIDS AND SPEECH ASSESSMENT AND TRAINING Adrian FOURCIN and Evelyn ABBERTON Summary This paper has been prepared for a meeting (in Beijing 9-10 IX 1996) organised jointly by the

More information

Auditory Scene Analysis

Auditory Scene Analysis 1 Auditory Scene Analysis Albert S. Bregman Department of Psychology McGill University 1205 Docteur Penfield Avenue Montreal, QC Canada H3A 1B1 E-mail: bregman@hebb.psych.mcgill.ca To appear in N.J. Smelzer

More information

Hearing Research 231 (2007) Research paper

Hearing Research 231 (2007) Research paper Hearing Research 231 (2007) 42 53 Research paper Fundamental frequency discrimination and speech perception in noise in cochlear implant simulations q Jeff Carroll a, *, Fan-Gang Zeng b,c,d,1 a Hearing

More information

Lexical Tone Perception with HiResolution and HiResolution 120 Sound-Processing Strategies in Pediatric Mandarin-Speaking Cochlear Implant Users

Lexical Tone Perception with HiResolution and HiResolution 120 Sound-Processing Strategies in Pediatric Mandarin-Speaking Cochlear Implant Users Lexical Tone Perception with HiResolution and HiResolution 1 Sound-Processing Strategies in Pediatric Mandarin-Speaking Cochlear Implant Users Demin Han, 1 Bo Liu, 1 Ning Zhou, Xueqing Chen, 1 Ying Kong,

More information

Hearing Lectures. Acoustics of Speech and Hearing. Auditory Lighthouse. Facts about Timbre. Analysis of Complex Sounds

Hearing Lectures. Acoustics of Speech and Hearing. Auditory Lighthouse. Facts about Timbre. Analysis of Complex Sounds Hearing Lectures Acoustics of Speech and Hearing Week 2-10 Hearing 3: Auditory Filtering 1. Loudness of sinusoids mainly (see Web tutorial for more) 2. Pitch of sinusoids mainly (see Web tutorial for more)

More information

Frequency refers to how often something happens. Period refers to the time it takes something to happen.

Frequency refers to how often something happens. Period refers to the time it takes something to happen. Lecture 2 Properties of Waves Frequency and period are distinctly different, yet related, quantities. Frequency refers to how often something happens. Period refers to the time it takes something to happen.

More information

J Jeffress model, 3, 66ff

J Jeffress model, 3, 66ff Index A Absolute pitch, 102 Afferent projections, inferior colliculus, 131 132 Amplitude modulation, coincidence detector, 152ff inferior colliculus, 152ff inhibition models, 156ff models, 152ff Anatomy,

More information

Speech, Hearing and Language: work in progress. Volume 11

Speech, Hearing and Language: work in progress. Volume 11 Speech, Hearing and Language: work in progress Volume 11 Periodicity and pitch information in simulations of cochlear implant speech processing Andrew FAULKNER, Stuart ROSEN and Clare SMITH Department

More information

We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists. International authors and editors

We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists. International authors and editors We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists 3,900 116,000 120M Open access books available International authors and editors Downloads Our

More information

Cochlear Implants. What is a Cochlear Implant (CI)? Audiological Rehabilitation SPA 4321

Cochlear Implants. What is a Cochlear Implant (CI)? Audiological Rehabilitation SPA 4321 Cochlear Implants Audiological Rehabilitation SPA 4321 What is a Cochlear Implant (CI)? A device that turns signals into signals, which directly stimulate the auditory. 1 Basic Workings of the Cochlear

More information

SOLUTIONS Homework #3. Introduction to Engineering in Medicine and Biology ECEN 1001 Due Tues. 9/30/03

SOLUTIONS Homework #3. Introduction to Engineering in Medicine and Biology ECEN 1001 Due Tues. 9/30/03 SOLUTIONS Homework #3 Introduction to Engineering in Medicine and Biology ECEN 1001 Due Tues. 9/30/03 Problem 1: a) Where in the cochlea would you say the process of "fourier decomposition" of the incoming

More information

Effects of speaker's and listener's environments on speech intelligibili annoyance. Author(s)Kubo, Rieko; Morikawa, Daisuke; Akag

Effects of speaker's and listener's environments on speech intelligibili annoyance. Author(s)Kubo, Rieko; Morikawa, Daisuke; Akag JAIST Reposi https://dspace.j Title Effects of speaker's and listener's environments on speech intelligibili annoyance Author(s)Kubo, Rieko; Morikawa, Daisuke; Akag Citation Inter-noise 2016: 171-176 Issue

More information

Variability in Word Recognition by Adults with Cochlear Implants: The Role of Language Knowledge

Variability in Word Recognition by Adults with Cochlear Implants: The Role of Language Knowledge Variability in Word Recognition by Adults with Cochlear Implants: The Role of Language Knowledge Aaron C. Moberly, M.D. CI2015 Washington, D.C. Disclosures ASA and ASHFoundation Speech Science Research

More information

Speech Cue Weighting in Fricative Consonant Perception in Hearing Impaired Children

Speech Cue Weighting in Fricative Consonant Perception in Hearing Impaired Children University of Tennessee, Knoxville Trace: Tennessee Research and Creative Exchange University of Tennessee Honors Thesis Projects University of Tennessee Honors Program 5-2014 Speech Cue Weighting in Fricative

More information

Systems Neuroscience Oct. 16, Auditory system. http:

Systems Neuroscience Oct. 16, Auditory system. http: Systems Neuroscience Oct. 16, 2018 Auditory system http: www.ini.unizh.ch/~kiper/system_neurosci.html The physics of sound Measuring sound intensity We are sensitive to an enormous range of intensities,

More information

Bilaterally Combined Electric and Acoustic Hearing in Mandarin-Speaking Listeners: The Population With Poor Residual Hearing

Bilaterally Combined Electric and Acoustic Hearing in Mandarin-Speaking Listeners: The Population With Poor Residual Hearing Original Article Bilaterally Combined Electric and Acoustic Hearing in Mandarin-Speaking Listeners: The Population With Poor Residual Hearing Trends in Hearing Volume 22: 1 13! The Author(s) 18 Reprints

More information

Binaural Hearing. Steve Colburn Boston University

Binaural Hearing. Steve Colburn Boston University Binaural Hearing Steve Colburn Boston University Outline Why do we (and many other animals) have two ears? What are the major advantages? What is the observed behavior? How do we accomplish this physiologically?

More information

THE ROLE OF VISUAL SPEECH CUES IN THE AUDITORY PERCEPTION OF SYNTHETIC STIMULI BY CHILDREN USING A COCHLEAR IMPLANT AND CHILDREN WITH NORMAL HEARING

THE ROLE OF VISUAL SPEECH CUES IN THE AUDITORY PERCEPTION OF SYNTHETIC STIMULI BY CHILDREN USING A COCHLEAR IMPLANT AND CHILDREN WITH NORMAL HEARING THE ROLE OF VISUAL SPEECH CUES IN THE AUDITORY PERCEPTION OF SYNTHETIC STIMULI BY CHILDREN USING A COCHLEAR IMPLANT AND CHILDREN WITH NORMAL HEARING Vanessa Surowiecki 1, vid Grayden 1, Richard Dowell

More information

Simulation of an electro-acoustic implant (EAS) with a hybrid vocoder

Simulation of an electro-acoustic implant (EAS) with a hybrid vocoder Simulation of an electro-acoustic implant (EAS) with a hybrid vocoder Fabien Seldran a, Eric Truy b, Stéphane Gallégo a, Christian Berger-Vachon a, Lionel Collet a and Hung Thai-Van a a Univ. Lyon 1 -

More information

INTRODUCTION J. Acoust. Soc. Am. 103 (2), February /98/103(2)/1080/5/$ Acoustical Society of America 1080

INTRODUCTION J. Acoust. Soc. Am. 103 (2), February /98/103(2)/1080/5/$ Acoustical Society of America 1080 Perceptual segregation of a harmonic from a vowel by interaural time difference in conjunction with mistuning and onset asynchrony C. J. Darwin and R. W. Hukin Experimental Psychology, University of Sussex,

More information