ACOUSTIC AND PERCEPTUAL PROPERTIES OF ENGLISH FRICATIVES

Similar documents
Acoustic and Spectral Characteristics of Young Children's Fricative Productions: A Developmental Perspective

ACOUSTIC MOMENTS DATA

The effects of intonation on acoustic properties of fricatives

Effect of Consonant Duration Modifications on Speech Perception in Noise-II

Perception of clear fricatives by normal-hearing and simulated hearing-impaired listeners

Speech Cue Weighting in Fricative Consonant Perception in Hearing Impaired Children

Speech Spectra and Spectrograms

Overview. Acoustics of Speech and Hearing. Source-Filter Model. Source-Filter Model. Turbulence Take 2. Turbulence

Consonant Perception test

2/25/2013. Context Effect on Suprasegmental Cues. Supresegmental Cues. Pitch Contour Identification (PCI) Context Effect with Cochlear Implants

Gick et al.: JASA Express Letters DOI: / Published Online 17 March 2008

Jitter, Shimmer, and Noise in Pathological Voice Quality Perception

Changes in the Role of Intensity as a Cue for Fricative Categorisation

Production of Stop Consonants by Children with Cochlear Implants & Children with Normal Hearing. Danielle Revai University of Wisconsin - Madison

ACOUSTIC ANALYSIS AND PERCEPTION OF CANTONESE VOWELS PRODUCED BY PROFOUNDLY HEARING IMPAIRED ADOLESCENTS

FREQUENCY COMPRESSION AND FREQUENCY SHIFTING FOR THE HEARING IMPAIRED

HCS 7367 Speech Perception

ACOUSTIC CHARACTERISTICS OF ARABIC FRICATIVES

Feedback and feedforward control in apraxia of speech: Noise masking effects on fricative production.

Fricative spectra vary in complex ways that depend. Toward Improved Spectral Measures of /s/: Results From Adolescents.

Effects of noise and filtering on the intelligibility of speech produced during simultaneous communication

Identification and classification of fricatives in speech using zero time windowing method

Speech (Sound) Processing

Proceedings of Meetings on Acoustics

A psychoacoustic method for studying the necessary and sufficient perceptual cues of American English fricative consonants in noise

Cue weighting in the perception of Dutch sibilants

Best Practice Protocols

Prelude Envelope and temporal fine. What's all the fuss? Modulating a wave. Decomposing waveforms. The psychophysics of cochlear

Juan Carlos Tejero-Calado 1, Janet C. Rutledge 2, and Peggy B. Nelson 3

Acoustics of Speech and Environmental Sounds

Although considerable work has been conducted on the speech

Bark and Hz scaled F2 Locus equations: Sex differences and individual differences

Sound Texture Classification Using Statistics from an Auditory Model

Recognition of fricatives in normal hearing and simulated hearing-impaired conditions. Examination Number: B001855

Effects of Multi-talker Noise on the Acoustics of Voiceless Stop Consonants in Parkinson's Disease

Computational Perception /785. Auditory Scene Analysis

The Salience and Perceptual Weight of Secondary Acoustic Cues for Fricative Identification in Normal Hearing Adults

Twenty subjects (11 females) participated in this study. None of the subjects had

Linguistic Phonetics Fall 2005

Demonstration of a Novel Speech-Coding Method for Single-Channel Cochlear Stimulation

The Effects of Simultaneous Communication on Production and Perception of Speech

Effects of speaker's and listener's environments on speech intelligibili annoyance. Author(s)Kubo, Rieko; Morikawa, Daisuke; Akag

Topics in Linguistic Theory: Laboratory Phonology Spring 2007

HCS 7367 Speech Perception

THE IMPACT OF SPECTRALLY ASYNCHRONOUS DELAY ON THE INTELLIGIBILITY OF CONVERSATIONAL SPEECH. Amanda Judith Ortmann

Speech Adaptation to Electropalatography in Children's Productions of /s/ and /ʃ/

Combination of Bone-Conducted Speech with Air-Conducted Speech Changing Cut-Off Frequency

Perceptual Effects of Nasal Cue Modification

Linguistic Phonetics. Basic Audition. Diagram of the inner ear removed due to copyright restrictions.

It is important to understand as to how do we hear sounds. There is air all around us. The air carries the sound waves but it is below 20Hz that our

Age and hearing loss and the use of acoustic cues in fricative categorization

Ambiguity in the recognition of phonetic vowels when using a bone conduction microphone

The role of periodicity in the perception of masked speech with simulated and real cochlear implants

Lecture Outline. The GIN test and some clinical applications. Introduction. Temporal processing. Gap detection. Temporal resolution and discrimination

11 Music and Speech Perception

Limited available evidence suggests that the perceptual salience of

An MRI study of vocalic context effects and lip rounding in the production of English sibilants

Visi-Pitch IV is the latest version of the most widely

Study of the voicing contrasts in English affricates Due 11/12

Temporal offset judgments for concurrent vowels by young, middle-aged, and older adults

Research Article The Acoustic and Peceptual Effects of Series and Parallel Processing

Speech and Intelligibility Characteristics in Fragile X and Down Syndromes

Speech conveys not only linguistic content but. Vocal Emotion Recognition by Normal-Hearing Listeners and Cochlear Implant Users

Production of word-initial fricatives of Mandarin Chinese in prelingually deafened children with cochlear implants

Hearing in the Environment

The development of a modified spectral ripple test

Automatic Judgment System for Chinese Retroflex and Dental Affricates Pronounced by Japanese Students

Study of perceptual balance for binaural dichotic presentation

Temporal Location of Perceptual Cues for Cantonese Tone Identification

Speech, Hearing and Language: work in progress. Volume 11

The perception of high frequency sibilants in Hungarian male speech

Integral Processing of Visual Place and Auditory Voicing Information During Phonetic Perception

PHONETIC AND AUDITORY TRADING RELATIONS BETWEEN ACOUSTIC CUES IN SPEECH PERCEPTION: PRELIMINARY RESULTS. Repp. Bruno H.

Speech Generation and Perception

The role of low frequency components in median plane localization

WIDEXPRESS. no.30. Background

PERCEPTUAL MEASUREMENT OF BREATHY VOICE QUALITY

Individual Variability of Hearing-Impaired Consonant Perception

Experimental Analysis of Voicing Contrast in Igbo Linda Chinelo Nkamigbo* DOI:

Voice Detection using Speech Energy Maximization and Silence Feature Normalization

Optimal Filter Perception of Speech Sounds: Implications to Hearing Aid Fitting through Verbotonal Rehabilitation

Research Article Measurement of Voice Onset Time in Maxillectomy Patients

Production of contrast between sibilant fricatives by children with cochlear implants a)

SoundRecover2 the first adaptive frequency compression algorithm More audibility of high frequency sounds

SLHS 1301 The Physics and Biology of Spoken Language. Practice Exam 2. b) 2 32

Evaluating the role of spectral and envelope characteristics in the intelligibility advantage of clear speech

An Experimental Acoustic Study of Dental and Interdental Nonsibilant Fricatives in the Speech of a Single Speaker * Mark J. Jones

THE ROLE OF VISUAL SPEECH CUES IN THE AUDITORY PERCEPTION OF SYNTHETIC STIMULI BY CHILDREN USING A COCHLEAR IMPLANT AND CHILDREN WITH NORMAL HEARING

AJA. Research Article. Effects of Low-Pass Filtering on the Perception of Word-Final Plurality Markers in Children and Adults With Normal Hearing

SPEECH PERCEPTION IN A 3-D WORLD

group by pitch: similar frequencies tend to be grouped together - attributed to a common source.

Language Speech. Speech is the preferred modality for language.

Speech Intelligibility Measurements in Auditorium

METHODS & DESIGNS Generation of speech continua through monaural fusion

Efforts and coordination in the production of bilabial consonants

INTRODUCTION J. Acoust. Soc. Am. 103 (2), February /98/103(2)/1080/5/$ Acoustical Society of America 1080

Cue-specific effects of categorization training on the relative weighting of acoustic cues to consonant voicing in English

Simulations of high-frequency vocoder on Mandarin speech recognition for acoustic hearing preserved cochlear implant

SUPPLEMENTARY INFORMATION. Table 1 Patient characteristics Preoperative. language testing

International Forensic Science & Forensic Medicine Conference Naif Arab University for Security Sciences Riyadh Saudi Arabia

Transcription:

ISCA Archive ACOUSTIC AND PERCEPTUAL PROPERTIES OF ENGLISH FRICATIVES Allard Jongman 1, Yue Wang 2, and Joan Sereno 1 1 Linguistics Department, University of Kansas, Lawrence, KS 66045 U.S.A. 2 Department of Linguistics, Cornell University, Ithaca, NY 14850 U.S.A. ABSTRACT This study presents a comparative analysis of acoustic and perceptual cues to place of articulation in English fricatives. Acoustic parameters investigated include spectral peak location, spectral moments, locus equations, F2 onset, overall noise amplitude, relative amplitude, and fricative noise duration. Perception data were collected for isolated fricatives and fricativevowel syllables. The present study investigates which acoustic parameters can account for the perceptual data. Relating the acoustic and perception data by means of regression analysis suggests that spectral peak location, noise amplitude, and relative amplitude are the primary cues to perception of place of articulation in English fricatives. 1. INTRODUCTION Much research has been devoted to the question of whether distinct spectral patterns that correspond to phonetic dimensions, such as place and manner of articulation, can be derived from the acoustic waveform. Such research has predominantly focused on the search for properties distinguishing place of articulation in (English) stop consonants. In contrast, fricatives have been studied in much less detail. Moreover, it is uncertain whether the classification metrics proposed for stop consonants can be successfully applied to fricatives. The current study extends the research on stop consonants to fricative consonants by providing a detailed look in terms of both production and perception. A brief summary of the acoustic and perceptual data will be provided. These data are then related to determine which acoustic cues are most important to fricative perception. 2. EXPERIMENT 1: ACOUSTIC ANALYSIS 2.1 Methods and Procedure Twenty (10 females and 10 males) native speakers of American English recorded the eight English fricatives /f, v,,, s, z,, / in CVC syllables in the carrier phrase "Say again". The fricatives were in initial position, followed by each of six vowels /i, e,ae,, ο, u/. The final consonant was always /p/. Each CVC token was repeated three times, yielding a total of 144 tokens per subject (8 fricatives x 6 vowels x 3 repetitions). All recordings were sampled at 22 khz (16 bit quantization, 11 khz low-pass filter) on a Sun SPARCstation 5. All measurements were made using Entropics Systems' Waves+/ESPS software. Fricative segmentation involved the simultaneous consultation of waveform and wide-band spectrogram. Spectral peak location of the fricatives was examined using a 40 ms full Hamming window placed in the middle of the frication noise. Spectral peak is defined here as the highest-amplitude peak of the FFT spectrum. Spectral moments were computed following the procedures described in [1] with a few modifications. FFTs were calculated using a 40 ms full Hamming window at four different locations in the fricative: onset, middle, and end, as well as centered over fricative offset. Each FFT was treated as a random probability distribution from which the first four moments were calculated. Locus equations were derived using the procedure described in [2] for fricatives. F2 was measured at vowel onset (F2 onset) and midway in the vowel. Specifically, F2 was estimated by means of FFT spectra, with a 23.3 ms full Hamming window. Normalized RMS amplitude in db was measured for the entire noise portion of each fricative token. In order to normalize for intensity differences among speakers, a difference of fricative amplitude minus vowel amplitude was calculated, where vowel amplitude was defined as RMS amplitude (in db) averaged over three consecutive pitch periods at the point of maximum vowel amplitude. Relative amplitude in db was measured as described in [3]. Briefly, for the vowel, a DFT was derived at vowel onset, using a 23.3 ms Hamming window. The amplitude (in db) of the component at F3

for /s, z,, / and at F5 for /f, v,, / was measured. For the fricative, a DFT was then derived at the center of the fricative, using a 23.3 ms Hamming window. The amplitude (in db) of the component in the same frequency region as that selected for the vowel was measured. Relative amplitude was then expressed as the difference between fricative amplitude and vowel amplitude. A four-way ANOVA (Place x Voicing x Vowel x Gender) revealed a main effect for Place [F(3, 2876) = 458.27, p <.0001; η 2 =.308]. Bonferroni post-hoc tests indicated that all four places of articulation were significantly different. Normalized duration in ms was the ratio of fricative duration over total word duration. This duration ratio was chosen over absolute noise duration in order to control for variations in speaking rate. 2.2 Results Only those acoustic parameters that could distinguish all four places of articulation are presented here. For a more complete and detailed discussion, see Jongman et al. (in press) [4]. Spectral peak location A four-way ANOVA (Place x Voicing x Vowel x Gender) revealed a main effect for Place of articulation [F(3, 2876) = 1083.72, p <.0001; η 2 =.512]. Averaged across speakers, voicing, and vowel context, spectral peak location decreases in frequency as place of articulation moves further back in the oral cavity. Bonferroni post-hoc tests indicated that all four places of articulation were significantly different from each other in terms of spectral peak location. Spectral moments In order to assess the importance of acoustic information at different positions in the speech signal, 4-way ANOVAs (Place x Voicing x Vowel x Gender) and subsequent Bonferroni post-hoc tests were conducted for each moment at each window location. Spectral moments distinguished at least three places of articulation at all window locations, and four places in the majority of cases. All but two confusions involved a lack of differentiation between /f, v/ and /, /. Normalized RMS amplitude Using normalized amplitude as the dependent variable, a four-way ANOVA (Place x Voicing x Vowel x Gender) revealed a main effect for Place [F(3, 2876) = 1489.51, p <.0001; η 2 =.591]. Bonferroni post-hoc tests indicated that all four places of articulation were significantly different from each other in terms of normalized amplitude. Relative amplitude

3. EXPERIMENT 2: PERCEPTION OF FRICATIVE NOISE 3.1 Methods and Procedure Twenty native speakers of English without known speech or hearing impairments served as listeners. Speech materials from ten (five females and five males) of the 20 speakers from Experiment 1 were included. Of the 6 vowel contexts, only the vowels /i,, u/ were used. Finally, only the first two repetitions of these tokens were included, yielding a total of 480 stimuli (8 fricatives x 3 vowels x 2 repetitions x 10 speakers). These CVC tokens were then edited to include only the frication noise portion, using the segmentation criteria described in Jongman et al. (in press) [4]. Listeners were tested in groups of two to four. Stimuli were presented to listeners in random order at 3-s intervals. Listeners responded by circling one of 9 alternatives f, v, th (//), dh (//), s, z, sh (//), zh (//), or other printed on answering sheets. 3.2 Results Table 1 presents the mean identification scores for each fricative. Response /f/ /v/ // // /s/ /z/ // // /f/ 70 1 22 4 1 0 0 0 /v/ 8 72 5 13 0 1 0 0 // 21 1 62 11 4 0 0 0 // 4 49 12 30 0 2 0 1 /s/ 0 0 1 0 90 3 2 0 /z/ 0 1 0 1 7 87 0 2 // 0 0 1 0 0 0 95 1 // 0 0 0 1 0 1 4 92 Table 1: Mean identification scores (%) for each fricative, given the fricative noise portion, averaged across 20 listeners. Bold numbers indicate correct response rates. Averaged across all fricatives, accuracy was 74.9%. A three-way ANOVA (Fricative x Vowel x Gender) revealed a main effect for Fricative [F(7, 47) = 68.08, p <.0001]. Bonferroni post-hoc tests indicated that perception of all non-sibilants was significantly poorer than that of sibilants. All sibilants were equally well perceived, but within the group of non-sibilants, perception of // was significantly less accurate. There were no other main effects or interactions. 4. EXPERIMENT 3: PERCEPTION OF FRICATIVE-INITIAL SYLLABLES 4.1 Methods and Procedure Twenty new listeners were recruited. Stimuli were those used in Experiment 2, except that the entire fricative-vowel -/p/ syllables were presented for identification of the initial fricative. Procedures were identical to those of Experiment 2. 4. 2 Results Table 2 presents the mean identification scores for each fricative. Response /f/ /v/ // // /s/ /z/ // // /f/ 88 1 10 0 0 0 0 0 /v/ 1 90 1 9 0 0 0 0 // 8 0 87 3 1 0 0 0 // 0 17 8 74 0 1 0 0 /s/ 0 0 2 1 94 3 0 0 /z/ 0 0 1 2 0 96 0 0 // 0 0 1 0 0 0 99 0 // 0 0 0 0 0 7 7 99 Table 2: Mean identification scores (%) for each fricative, given the entire fricative-initial syllable, averaged across 20 listeners. Bold numbers indicate correct response rates. Averaged across all fricatives, accuracy was 90.9%. A three-way ANOVA (Fricative x Vowel x Gender) revealed main effects for Fricative [F(7, 47) = 11.73, p <.0001] and Vowel [F(2, 47) = 6.03, p <.004]. Bonferroni post-hoc tests indicated that perception of // was significantly poorer than of any other fricative. In addition, /f, v,, s, z/ were all equally well perceived while perception of /, / was significantly better than that of /f, v, /. Post-hoc tests for the Vowel effects indicated that fricative perception was most accurate in the context of /u/ (94.1%), followed by // (91.7%), followed in turn by /i/ (86.9%). 4.3 Discussion A comparison of Experiments 2 and 3 reveals that fricative perception is better on the basis of fricative-vowel syllables than fricative noise alone. A two-way ANOVA (Fricative x Experiment) on the combined data from Experiments 2 and 3 revealed a main effect for Fricative [F(7, 15) = 64.47, p <.0001] and Experiment [F(1, 47) = 148.49, p <.0001]. A significant Fricative x Experiment interaction [F(7, 15) = 13.31, p <.0001]

and subsequent post-hoc tests indicated that perception of all fricatives but /s/ was significantly better on the basis of the fricative-vowel syllable. Also, given vocalic information improvement is much greater for the non-sibilant fricatives as compared to the sibilant fricatives. 5. RELATING PERCEPTION TO ACOUSTIC PARAMETERS The acoustic data summarized in Experiment 1 and in [4] were compared to the perceptual data from Experiments 2 and 3. Regression analyses were performed to determine the extent to which the acoustic parameters could account for the perception results. Two separate analyses were conducted one analysis compared the acoustic data (Exp. 1) to the perceptual data based on fricative noise alone (Exp. 2); the other analysis compared the acoustic data (Exp. 1) to the perceptual data of the fricative-initial syllable (Exp. 3). The following acoustic properties were entered into the regressions: spectral peak location; F2 frequency at vowel onset; normalized RMS amplitude, relative amplitude; normalized noise duration; and spectral moments (mean, variance, skewness, and kurtosis) calculated at four different locations in the fricative. In order to account for the results from Experiment 3 in which listeners heard fricative-initial syllables, all acoustic properties were entered into a stepwise linear analysis. Since Experiment 2 involved the frication noise only, those acoustic properties relating to other portions of the original fricative-vowel syllables were excluded from the analysis. Therefore, only spectral peak location, normalized RMS amplitude, normalized noise duration, and spectral moments computed at the onset, middle, and end of the fricative were entered into the analysis. In order to focus on the extent to which acoustic properties could account for correctly perceived fricatives, only those tokens with identification accuracies of 80% or more were included. For Experiment 2 involving only the fricative noise, 99 of the original 240 tokens were excluded from the regression analysis. Exclusion rates were 18 for /f/, 15 for /v/, 27 for //, 30 for //, 2 for /s/, 4 for /z/, 1 for //, and 2 for //. For Experiment 3 involving fricative-vowel syllables, 36 of the original 240 tokens were excluded from the regression analysis. 5.1 Regression Analysis: Fricative Noise Stepwise linear regression revealed the following acoustic properties as significant predictors of fricative perception: normalized RMS amplitude, normalized duration, spectral peak location, skewness at fricative midpoint, kurtosis at fricative onset, and kurtosis at fricative offset. These parameters together accounted for 32% of the perception data (adjusted r 2 =.32). Stepwise linear regression results for each fricative individually revealed significant predictors for only /z/ and //. Normalized RMS amplitude and skewness at fricative onset accounted significantly for perception of /z/ (adjusted r 2 =.65), while perception of // was well accounted for by the spectral mean at fricative onset and kurtosis at fricative midpoint (adjusted r 2 =.62). 5.2 Regression Analysis: Fricative-initial Syllables Stepwise linear regression revealed the following acoustic properties as significant predictors of fricative perception: spectral variance at fricative midpoint, spectral peak location, kurtosis at the window spanning the last 20 ms of frication and the first 20 ms of the vowel, normalized RMS amplitude, normalized duration, spectral mean at fricative midpoint, and relative amplitude. These parameters together accounted for 33% of the perception data (adjusted r 2 =.33). Normalized RMS amplitude was a significant predictor for // and /z/, accounting for 13 and 29% of the variance, respectively. Spectral moments were moderately successful: spectral mean across fricative offset and vowel onset accounted for 33% of the variance of the perception of /f/, while kurtosis at fricative midpoint accounted for 15% of the variance of the perception of //. Finally, spectral peak location accounted for 17% of the variance of perception of /s/. 6. CONCLUSIONS Results from acoustic analyses indicate that spectral peak location, spectral moments, normalized RMS amplitude, and relative amplitude all distinguish English fricatives in terms of place of articulation. Results from perception experiments suggest that the fricative-to-vowel transitions carry important cues to place of articulation for the non-sibilant but not the sibilant fricatives. Finally, results from regression analysis in which the perception data were regressed on the acoustic parameters again point to spectral peak location, normalized rms amplitude, and relative amplitude as important cues to perception of place of articulation. Future research will explore these suggestions in more detail by conducting perception experiments in which these acoustic parameters will be directly manipulated. 7. ACKNOWLEDGEMENTS This research was supported (in part) by research grant number 1 R29 DC 02537-01A1 from the National Institute on Deafness and

Other Communication Disorders, National Institutes of Health. Preliminary versions of these results were presented at the Seattle [5] and Berlin [6] meetings of the Acoustical Society of America. The authors thank Scott Gargash, Eric Evans, Ratree Wayland, and Serena Wong for assistance. 8. REFERENCES 1. Forrest, K., Weismer, G., Milenkovic, P., and Dougall, R.N. Statistical analysis of word-initial voiceless obstruents: Preliminary data, J. Acoust. Soc. Am. 84, 115-124, 1988. 2. Sussman, H.M, and Shore, J. Locus equations as phonetic descriptors of consonantal place of articulation, Perc.Psychophys. 58, 936-946, 1996. 3. Hedrick, M.S., and Ohde, R.N. Effect of relative amplitude of frication on perception of place of articulation, J. Acoust. Soc. Am. 94, 2005-2027, 1993. 4. Jongman, A., Wayland, R., and Wong, S. Acoustic characteristics of English fricatives, J. Acoust. Soc. Am., in press. 5. Jongman, A., Sereno, J., Wayland, R., and Wong, S. Acoustic properties of English fricatives, in P. Kuhl and L. Crum (eds.), Joint Proc. 16th Int. Con. on Acoust. and 135th Meet. Acoust. Soc. Am., 2935-2936, 1998. 6. Jongman, A., Spence, M., Wang, Y., Kim, B., and Schenck, D. Perceptual properties of English fricatives, J. Acoust. Soc. Am. 105, 1401(A), 1999.