ACOUSTIC CHARACTERISTICS OF ARABIC FRICATIVES


ACOUSTIC CHARACTERISTICS OF ARABIC FRICATIVES

By

MOHAMED ALI AL-KHAIRY

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2005

Copyright 2005 by Mohamed Ali Al-Khairy

To my father, who did not live to see the fruit of his work.

ACKNOWLEDGMENTS

After finishing writing this dissertation on a rainy summer night, I decided not to bother with a lengthy acknowledgment section. After all, I was the one who wrote it. Well, leaving ego and false pride aside, this work could not have been done without the help of many. First and foremost, thanks go to The Almighty GOD for His guidance and blessings, without which graduate school would have been a worse nightmare. My gratitude goes also to my wonderful supervisor and mentor, Dr. Ratree Wayland, whose dedication to her students, teaching, and research is beyond highest expectations. Without her help, guidelines, constant encouragement, and support, this work would not have been possible. Members of my supervisory committee (Dr. Gillian Lord and Dr. Caroline Wiltshire from Linguistics, and Dr. Rahul Shirvastav from Communication Sciences and Disorders) were of the utmost help in the process of finishing this work. My stay in Gainesville introduced me to many people. Most were nice and cheerful, and some one could definitely live without. I will skip the latter group to save space. Among the nice and wonderful people I got to know during this journey are the students, faculty, and staff of the Linguistics Department, who were of tremendous help both personally and academically. My special thanks and gratitude go also to Dr. Aida Bamia and Dr. Haig Der-Houssikian from the Department of African and Asian Languages and Literature. Their supervision, friendship, and encouragement went far beyond the responsibilities of mentors to those of parents. For that I will be eternally grateful. I also would like to thank my study partners, Yousef Al-Dlaigan, who was unjustly forced to change his career, and Abdulwaheed Al-Saadi, who was brave enough

to finish his Ph.D. I regret to say that I am still unclear about the process of gene transformation in strawberry and citrus. I hope, though, that you learned from me how to read a spectrogram. I tried my best. Now comes the fun part: thanking my friends in the phonetics lab. Listed in chronological order of their liberation from school are Rebecca Hill, Jodi Bray, Philip Monahan, Sang-Hee Yeon, HeeNam Park, Victor Prieto, and Manjula Shinge. Yet to feel the wonderful breeze outside the Turlington basement are my great friends Andrea Dallas, Bin Li, and Priyankoo Sarmah. I thank them for all the cheerful moments and laughs we shared at the University of Florida. Although life might take us on different routes, our friendship is eternal. Although they are in a different time zone, I thank my friends on the west coast and across the Atlantic for their great advice and emotional support, without which long nights would definitely have been longer. I will send them my phone bills later. I am sure that I left out some names; to those unintentionally missed I extend my apologies and sincere thanks. The acoustic analyses in this dissertation were carried out in a timely manner thanks to the wonderful free PRAAT program and the abundant help and suggestions from its authors and the PRAAT user community. Also, I was extremely fortunate to escape the nightmare of typesetting with the popular-but-not-really-friendly commercial software. I thank Ron Smith for making his ufthesis LaTeX class freely available. Across oceans and continents, the prayers and encouragement of my parents and siblings were a driving force and endless motivation to finish and join them back home. Although God had other plans for my father and older brother, I am sure they are proud of what their prayers from high above have accomplished. Finally, words fall short in describing my gratitude and thanks toward my wife, Nadaa, and kids, Faisal and Farah. They have suffered through this dissertation

almost as much as I have; maybe even more. Through the many nights I spent at the lab, they have shown endless patience, love, and understanding. I truly cannot imagine having gone through this process without such amazing love and support. Parts of this work were supported by a McLaughlin Dissertation Fellowship from the College of Liberal Arts and Sciences, University of Florida.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION

2 LITERATURE REVIEW
   2.1 Introduction
   2.2 Fricative Production
   2.3 Acoustic Cues to Fricative Place of Articulation
       Amplitude Cues
       Duration Cues
       Spectral Cues
       Formant Transition Cues
   Studies of Arabic Fricatives

3 METHODOLOGY
   Data Collection
       Participants
       Materials
       Recording
   Data Analysis
       Segmentation of Speech
       Acoustic Analyses
       Statistical Analyses

4 AMPLITUDE AND DURATION
   Amplitude Measurements
       Normalized Frication Noise RMS Amplitude
       Relative Amplitude of Frication Noise

   4.2 Temporal Measurements
       Absolute Duration of Frication Noise
       Normalized Duration of Frication Noise

5 SPECTRAL MEASUREMENTS
   Spectral Peak Location
   Spectral Moments
       Spectral Mean
       Spectral Variance
       Spectral Skewness
       Spectral Kurtosis

6 FORMANT TRANSITION
   Second Formant (F2) at Transition
   Locus Equation

7 STATISTICAL CLASSIFICATION OF FRICATIVES
   Discriminant Function Analysis
   Classification Accuracy of DFA
   Classification Power of Predictors
   Classification Results

8 GENERAL DISCUSSION
   Temporal Measurement
   Amplitude Measurement
   Spectral Measurement
   Transition Information
   Discriminant Analysis
   Conclusion

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

1-1 Arabic Fricatives
Relative Amplitude: Vowel Context
Mean Relative Amplitude
Spectral Peak Location
Spectral Moments
Spectral Skewness: Significant Contrasts for Voiced Fricatives
Spectral Skewness: Significant Contrasts for Voiceless Fricatives
Second Formant at Transition
Locus Equation: Slope and y-intercept
Prior Probabilities for Group Membership
Variance Accounted for by DFA Functions
Overall Voiceless Classification
Cross-Validated Classification Results
Overall Voiced Classification
Cross-Validated Voiced Classification
Overall Voiceless Classification
Cross-Validated Voiceless Classification

LIST OF FIGURES

3-1 Example of Segmentation
Segmentation of /ʕ/
Hamming vs. Kaiser Window Duration
Frication Noise RMS Amplitude
Frication Noise RMS Amplitude: Vowel Context
Frication Noise RMS Amplitude: Place and Voicing
Relative Amplitude
Relative Amplitude: Place and Voicing
Relative Amplitude: Place and Short Vowels
Relative Amplitude: Place and Long Vowels
Relative Amplitude: Voicing and Short Vowels
Relative Amplitude: Voicing and Long Vowels
Fricative Duration: Place and Voicing
Fricative Duration: Place and Voicing Interactions
Fricative Duration: Vowel Context
Normalized Frication Noise: Place and Voicing
Normalized Fricative Duration: Place and Voicing Interactions
Normalized Frication Noise: Vowel Context
Spectral Peak Location: Place and Voicing
Spectral Peak Location: Place × Voicing Interaction
Spectral Peak Location: Place × Vowels
Spectral Peak Location: Place × Short Vowel Interaction

5-5 Spectral Peak Location: Place × Long Vowel Interaction
Spectral Mean: Place and Voicing
Spectral Mean: Voice
Spectral Mean: Place × Voicing Interaction
Spectral Mean: Vowel
Spectral Variance: Place and Voicing
Spectral Variance: Place × Voicing Interaction
Spectral Variance: Vowel
Spectral Skewness: Place and Voicing
Spectral Skewness: Voice
Spectral Skewness: Place × Voicing Interaction
Spectral Skewness: Vowel
Spectral Kurtosis: Place and Voicing
Spectral Kurtosis: Voicing
Spectral Kurtosis: Place × Voice Interaction
Spectral Kurtosis: Vowel
Second Formant: Place × Voicing Interaction
Second Formant: Vowel Context
Locus Equation
Discrimination Plane
Discrimination Plane by Voicing

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ACOUSTIC CHARACTERISTICS OF ARABIC FRICATIVES

By

Mohamed Ali Al-Khairy

August 2005

Chair: Ratree Wayland
Major Department: Linguistics

The acoustic characteristics of fricatives were investigated with the aim of finding invariant cues that classify fricatives by their place of articulation. Such invariant cues are hard to recognize because of the long-noticed problem of variability in the acoustic signal: both intrinsic and extrinsic sources of variability in the speech signal lead to a defective match between a signal and its percept. Nevertheless, such variability can be circumvented by using appropriate analysis methods. The 13 fricatives of Modern Standard Arabic (/f, θ, ð, ðˤ, s, sˤ, z, ʃ, χ, ʁ, ħ, ʕ, h/) were elicited from 8 male adult speakers in 6 vowel contexts (/i, i:, a, a:, u, u:/). The acoustic cues investigated included amplitude measurements (normalized and relative frication noise amplitude), spectral measurements (spectral peak location and spectral moments), temporal measurements (absolute and normalized frication noise duration), and formant information at the fricative-vowel transition (F2 at vowel onset and locus equation). For the most part, fricatives in Arabic had patterns similar to those reported for similar fricatives in other languages (e.g., English, Spanish, Portuguese). A discriminant function analysis showed that among all the cues investigated, spectral

mean, skewness, second formant at vowel onset, normalized RMS amplitude, relative amplitude, and spectral peak location were the variables contributing the most to overall classification, with a success rate of 83.2%. When voicing was specified in the model, the correct classification rate increased to 92.9% for voiced and 93.5% for voiceless fricatives.

CHAPTER 1
INTRODUCTION

Since the early years of speech research, studies (using various models and methods) have focused on finding the properties that distinguish among naturally produced speech sounds. Many such studies investigated the properties of the acoustic signal through which sound is transmitted from speaker to hearer. However, the task is complicated by the long-noticed problem of variability in the acoustic signal, resulting in a defective match between a signal and its percept (Liberman, Cooper, Shankweiler, and Studdert-Kennedy 1967). The production mechanism of speech sounds, particularly fricatives, involves intrinsic sources of variability arising from changes in the shape of the vocal tract and the rate of air flow (Strevens 1960; Tjaden and Turner 1997). Variability in the speech signal also arises from extrinsic sources including speaker age (Pentz, Gilbert, and Zawadzki 1979), vocal tract size (Hughes and Halle 1956), speaking rate (Nittrouer 1995), and linguistic context (Tabain 2001). Variability in speech is also often a result of a combination of these factors. Notwithstanding the variability found in the speech signal, numerous studies (Stevens 1985; Behrens and Blumstein 1988a,b; Forrest, Weismer, Milenkovic, and Dougall 1988; Sussman, McCaffrey, and Matthews 1991; Hedrick and Ohde 1993; Jongman, Wayland, and Wong 2000; Abdelatty Ali, Van der Spiegel, and Mueller 2001; Nissen 2003) found invariant cues in the speech signal when the appropriate analyses are carried out. Along this line of research, our study investigated the defining properties of fricative sounds as produced in Modern Standard Arabic (MSA).

We used Arabic fricatives for three equally important reasons. First, the articulatory space of fricatives in Arabic spans most of the places of articulation in the vocal tract, starting at the lips and ending at the glottis. Second, unlike most of the languages used in acoustic studies of fricatives, Arabic has two unique features that serve a phonemic distinction: pharyngeal co-articulation and segment length. Specifically, a phonemic distinction exists in Arabic between the plain fricatives /ð/ and /s/ and their pharyngealized counterparts /ðˤ/ and /sˤ/. Furthermore, although governed by some phonological distribution rules, consonant and vowel length in Arabic are phonemic. Third, most studies on the acoustic characteristics of fricatives were conducted predominantly with reference to English fricatives. Given the phonetic status of Arabic and the gap in the literature due to the lack of Arabic-related research, our study is theoretically and empirically important. Our findings will contribute generally to the way fricative production is viewed and specifically to the way languages differ in that respect. Further, such findings will aid speech synthesis and parsing software for the less-understood, yet important, Arabic language. As mentioned, both consonant and vowel length are phonemic in Arabic. However, to compare and contrast the performance of the cues used in our study with those reported in the literature for other languages, we examined only vowel length variations. The inventory of fricatives in Arabic is shown in Table 1-1. Arabic has 11 fricatives, with only 4 pairs in voicing contrast. Also, the voiced dental and voiceless alveolar fricatives each have a pharyngealized counterpart. The voiced post-alveolar fricative /ʒ/ was excluded, since it was articulated in most of the elicited data as an affricate /dʒ/.
Studies of Standard Arabic and Arabic dialectology suggest that /ʒ/ is realized as /ʒ/, /dʒ/, /g/, or /j/ depending on the geographical region in which Arabic is spoken (Kaye 1972).

Table 1-1. Place of articulation of Arabic fricatives

           Labiodental  Dental  Alveolar  Post-alveolar  Uvular  Pharyngeal  Glottal
voiceless  f            θ       s         ʃ              χ       ħ           h
voiced                  ð       z                        ʁ       ʕ

Note: /ð/ and /s/ have pharyngealized counterparts /ðˤ/ and /sˤ/.

Both local (static) and global (dynamic) cues have been shown to participate in the identification of (English) fricatives. Specifically, three main acoustic features have been examined in research aimed at distinguishing fricatives: the spectral properties of the frication noise, the relation between the frequency characteristics of the frication noise and those of the vowel, and the duration of the frication noise. Our study aimed to describe the acoustic characteristics of Arabic fricatives using many of the acoustic measurements used in other related studies, with specific interest in finding cues that differentiate between plain and pharyngealized fricatives. Our study also aimed to determine whether phonemic differences in vowel length affect the acoustic cues measured. Our data were elicited from 8 male adult speakers (mean age = 20) who had no history of hearing or speaking impairments and who had limited experience with English as a second language. Cues investigated in our study were amplitude measurements (normalized and relative frication noise amplitude), spectral measurements (spectral peak location and spectral moments), temporal measurements (absolute and normalized frication noise duration), and formant information at the fricative-vowel transition (F2 at vowel onset and locus equation). Normalized amplitude is defined here as the ratio between the average RMS amplitude (in dB) of three consecutive pitch periods at the point of maximum vowel amplitude and the RMS amplitude of the entire frication noise. Relative amplitude, on the other hand, is defined as the amplitude of the frication noise relative to the vowel amplitude measured in certain frequency regions. Spectral peak location relates the fricative place of articulation to the

frequency location of the energy maximum in the frication noise. Spectral moments analysis is a statistical approach that treats FFT spectra as a random probability distribution from which the first four moments (mean, variance, skewness, and kurtosis) are calculated. Spectral mean refers to the average energy concentration and variance to its range. Skewness, on the other hand, is a measure of spectral tilt that indicates the frequency of greatest energy concentration. Kurtosis is an indicator of the peakedness of the distribution. Formant transitions were assessed using locus equations that relate the second formant frequency at vowel onset (F2onset) to that at vowel midpoint (F2vowel). Along with reporting how each of the acoustic measures mentioned above differentiates among places of fricative articulation, we used a statistical method (discriminant function analysis) to find the most parsimonious combination of acoustic cues that distinguishes among the different places of fricative articulation and the contribution of each selected cue to the overall classification of fricatives into their places of articulation.
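The spectral-moments computation described above can be illustrated with a short sketch: the FFT power spectrum is normalized to sum to one and treated as a probability distribution over frequency. This is not the dissertation's analysis script; the function name, the Hamming window, and the use of the power (rather than linear-magnitude) spectrum are assumptions for illustration.

```python
import numpy as np

def spectral_moments(signal, sample_rate):
    """First four spectral moments, treating the FFT power spectrum
    as a probability distribution over frequency."""
    windowed = signal * np.hamming(len(signal))   # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2    # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    p = power / power.sum()                       # normalize to a distribution
    mean = np.sum(freqs * p)                      # spectral mean (centre of gravity)
    variance = np.sum((freqs - mean) ** 2 * p)    # spread of energy around the mean
    sd = np.sqrt(variance)
    skewness = np.sum((freqs - mean) ** 3 * p) / sd ** 3        # spectral tilt
    kurtosis = np.sum((freqs - mean) ** 4 * p) / sd ** 4 - 3.0  # peakedness (excess)
    return mean, variance, skewness, kurtosis
```

In practice the moments would be computed over a windowed portion of the frication noise; the exact analysis settings (window type, FFT length) are not specified in this excerpt.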

CHAPTER 2
LITERATURE REVIEW

2.1 Introduction

In this chapter we review relevant literature on the acoustic characteristics that have been shown to be effective in differentiating among fricative places of articulation and voicing in the world's languages. Given that certain fricatives that exist in Standard Arabic (e.g., pharyngealized vs. non-pharyngealized) do not occur in other languages of the world, we also discuss whether these acoustic cues will be effective in differentiating acoustically among Standard Arabic fricatives.

2.2 Fricative Production

Fricative production is best described in terms of the source-filter theory of speech production (Fant 1960). According to that theory, speech can be modeled as the result of two independent components: a source signal (which could be the glottal source, or noise generated at a constriction in the vocal tract) and a filter (reflecting the resonances of the cavities of the vocal tract downstream from the glottis, or the constriction). The basic mechanism of fricative production is that turbulence forms in the air flow at a point in the oral cavity. To generate such turbulence, a steady air flow with velocity greater than a critical number (the Reynolds number, Re, a dimensionless quantity relating constriction size to the volume velocity needed to produce turbulence; for speech, Re > 1800; Kent and Read 2002) passes through a narrow constriction in the oral cavity and forms a jet that mixes with surrounding air in

the vicinity of the constriction to generate eddies. These eddies, which are random velocity fluctuations in the air flow, act as the source for frication noise (Stevens 1971). Depending on the nature of the constriction, frication noise can also be generated at either an obstacle or a wall (Shadle 1990). According to Shadle, an obstacle source refers to fricatives in which sound is generated primarily at a rigid body perpendicular to the air flow. An example is the production of the voiceless alveolar and voiceless post-alveolar fricatives (/s, ʃ/): the upper and lower teeth, respectively, act as the spoiler for the airflow. Such sources are characterized by a maximum source amplitude for a given velocity. A wall source, on the other hand, occurs when sound is generated primarily along a rigid body parallel to the air flow. Spectra of sounds generated by a wall source, like the voiced and voiceless velar fricatives (/x, ɣ/), are characterized by a flat broad peak with less amplitude than sounds of obstacle sources (Shadle 1990). Vibration of the vocal folds also adds to the sources responsible for voiced fricative production. Whatever the source, the resulting turbulence is then modified by the resonance characteristics of the vocal tract (filter). The spectrum of the output of such a filter reflects the transfer function of the vocal tract, which in turn depends on 1) the natural frequencies of the cavities anterior to the constriction (poles), 2) the radiation characteristics of the sound leaving the mouth, and 3) the resonant frequencies of the posterior cavity (zeros). For fricatives, the vocal tract is tightly constricted, and hence the coupling between the front and back cavities is small (Johnson 1997). Therefore, the transfer function of the vocal tract for fricatives depends largely on the resonances of the front cavity. The nth resonance can be calculated using Equation (2-1), where c is the speed of sound and l is the length of the vocal tract.
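Equation (2-1) is the familiar quarter-wave resonance formula for a tube closed at one end, and Equation (2-2) (introduced just below for the back cavity) is the corresponding half-wave formula. A small sketch evaluating both; the cavity lengths and the 35,000 cm/s speed-of-sound value are illustrative assumptions, not values from this study:

```python
SPEED_OF_SOUND = 35000.0  # cm/s; approximate speed of sound in warm, humid air

def front_cavity_resonance(n, length_cm, c=SPEED_OF_SOUND):
    # Quarter-wave resonance of a tube closed at one end, Eq. (2-1):
    # f_n = (2n - 1) * c / (4 * l)
    return (2 * n - 1) * c / (4 * length_cm)

def back_cavity_resonance(n, length_cm, c=SPEED_OF_SOUND):
    # Half-wave resonance, Eq. (2-2): f_n = n * c / (2 * l)
    return n * c / (2 * length_cm)

# Illustration: a 2.5 cm front cavity places its lowest resonance at 3500 Hz.
```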
In cases where strong coupling occurs between the front and back cavities, such as when the constriction is gradually tapered (Kent and Read 2002, p. 43), the resonances of the back cavity are calculated using Equation

(2-2). Resonances of the back and front cavities that share the same frequency and bandwidth cancel each other out.

    fn(front) = (2n - 1)c / 4l        (2-1)

    fn(back)  = nc / 2l               (2-2)

2.3 Acoustic Cues to Fricative Place of Articulation

Both local (static) and global (dynamic) cues have been shown to participate to different degrees in the identification of (English) fricatives. The three main acoustic cues that have been of most interest in the literature on fricatives are the amplitude and spectral properties of the frication noise, the relationship between the frequency characteristics of the frication noise and those of the vowel, and the role of frication noise duration in distinguishing fricative place and voicing.

2.3.1 Amplitude Cues

Frication amplitude

Most studies of frication noise amplitude have focused on (English) voiceless fricatives and found similar results: sibilants (/s, z, ʃ, ʒ/) have higher amplitude than nonsibilants (/f, v, θ, ð/), with no differences within each class. This difference in amplitude between sibilants and nonsibilants is predictable if one looks at the aerodynamics of producing these fricatives. For example, to examine fricative production mechanisms, Shadle (1985) used a mechanical model in which constriction area, length, and location could be varied, and the presence or absence of an obstacle could be manipulated. Based on spectra produced with this model, Shadle (1985) concluded that the lower teeth act as an obstacle some 3 cm downstream from the noise source of the sibilant constriction. Such a configuration results in an increase in the turbulence of the airflow, which in turn increases sibilant amplitude. Nonsibilant fricatives, on the other hand, have no such obstacle, resulting in very low energy levels. The difference between the sibilant

and nonsibilant fricatives with regard to frication amplitude was also found to have auditory salience. McCasland (1979) studied the role of amplitude as a perceptual cue to fricative place of articulation. He cross-spliced naturally spoken syllables of English /f, θ, s, ʃ/ and /i/ such that the fricative part of /si/ and /ʃi/ was cross-spliced onto the vocalic part of both /fi/ and /θi/. The overall amplitude of the spliced-in frication noise was matched to the intensity level of the original nonsibilant fricative by reducing /s, ʃ/ amplitude to that of /f/ and /θ/. The resulting fricative-vowel syllables sounded like /fi/ and /θi/ when the vocalic part of the utterance came from an original /fi/ or /θi/, respectively. These findings led McCasland to conclude that the low amplitude of nonsibilant fricatives is used as a perceptual cue to distinguish them from the sibilants /s, ʃ/. However, because of the cross-splicing method used, it is not clear whether the results can be attributed solely to the reduction of /s, ʃ/ amplitude. In fact, Behrens and Blumstein (1988a) pointed out that the results of McCasland's method are not conclusive, since the method involves mismatching information from the frication noise and the vocalic transition. Specifically, it is not clear whether listeners were using the reduced noise amplitude of the sibilants as a cue for nonsibilants, or were using transitional information in the original vocalic part of the nonsibilant to judge the token to be /f, θ/. Listeners might have been using either one of those cues, or both; there was no way of telling which using the cross-splicing methodology. One way to remedy the shortcomings of the cross-splicing method is to use synthetic speech.
Gurlekian (1981) used synthetic /sa, fa/ syllables in which the frequency and amplitude of the vowel were kept constant in order to test whether the distinction between sibilant and nonsibilant fricatives could be based solely on differences in their noise amplitude. For the fricatives, the center frequency of the noise was kept fixed at 4500 Hz, while its amplitude was varied relative to the fixed vowel amplitude. The center frequency used was similar to the

range at which /s/ was correctly identified 90% of the time by Argentine Spanish listeners (Manrique and Massone 1979), and within the range described for English /s/ (Heinz and Stevens 1961). An identification test with 6 Argentine Spanish and 6 English listeners showed that both groups assigned a /fa/ percept to the tokens with low noise amplitude and a /sa/ percept to those with high noise amplitude. Behrens and Blumstein (1988a) also investigated the role of fricative noise amplitude in distinguishing place of articulation among fricatives. Basically, Behrens and Blumstein altered the amplitude of the frication part of CV syllables, with the C being one of /f, θ, s, ʃ/, while preserving the vocalic part of the utterance. This matching was done by raising the noise amplitude of /f, θ/ to that of /s, ʃ/ and, conversely, lowering the noise amplitude of /s, ʃ/ to that of /f, θ/ without substituting or changing the vocalic part of the utterance. They found, contrary to previous studies, that the overall amplitude of the fricative noise relative to the amplitude of the following vowel does not constitute the primary cue for the sibilant/nonsibilant distinction. Therefore, Behrens and Blumstein called for an integration of the spectral properties and amplitude characteristics of fricatives in order to successfully discriminate among their places of articulation. Another way to capture classification information found in frication noise amplitude is to measure the Root-Mean-Square (RMS) amplitude of the fricative noise normalized relative to the vowel. Jongman et al. (2000) used this method in their large-scale study of English fricatives. Among the many measures used to characterize fricatives, Jongman et al. measured the difference between the average RMS amplitude (in dB) of three consecutive pitch periods at the point of maximum vowel amplitude and the RMS amplitude of the entire frication noise.
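The measure just described reduces to a dB difference between two RMS values. A minimal sketch, assuming the segmentation (the frication interval and the three pitch periods at the vowel's amplitude maximum) has already been done by hand; the function names are mine, not the study's:

```python
import numpy as np

def rms_db(samples):
    """RMS amplitude in dB re: digital full scale (1.0)."""
    x = np.asarray(samples, dtype=float)
    return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)))

def normalized_amplitude_db(vowel_three_periods, frication):
    """Vowel RMS (dB) over three pitch periods at the vowel amplitude
    maximum, minus the RMS (dB) of the entire frication noise."""
    return rms_db(vowel_three_periods) - rms_db(frication)
```

A vowel segment ten times the linear amplitude of the frication noise yields a normalized amplitude of 20 dB.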
Results were derived from 20 native speakers of American English (10 females and 10 males). The speakers produced all 8 English fricatives in the onset of CVC syllables, with the rhyme consisting of each of the six vowels /i, e, æ, ɑ, o, u/ followed by /p/. The authors

found that this normalized RMS amplitude can differentiate among all four places of fricative articulation in English, with voiced fricatives having a smaller amplitude than their voiceless counterparts. The integration of fricative and vowel amplitude as a means of normalization has also been used for automatic recognition of continuous speech. Abdelatty Ali et al. (2001) used the Maximum Normalized Spectral Slope (MNSS), which relates the spectral slope of the frication noise spectrum to the maximum total energy in the utterance, thus capturing the spectral shape of the fricative and its amplitude, in addition to the vowel amplitude features, in one quantity. It differs, however, from Jongman and colleagues' normalized amplitude in two ways: first, it uses peak amplitude instead of RMS amplitude for the vowel and the fricative; and second, it uses only the strongest peak of the fricative (as opposed to the whole frication noise) and normalizes that in relation to the strongest peak of the vowel (as opposed to the average of the three strongest pitch periods). For MNSS, a statistically determined threshold (0.01 for voiced and 0.02 for voiceless fricatives) is used to classify the fricative as nonsibilant if MNSS falls below the threshold, and as sibilant if it is above it. Using these criteria, Abdelatty Ali et al. obtained 94% recognition accuracy for sibilant vs. nonsibilant fricatives. No further information was given on using MNSS to classify fricatives within these classes.

Relative amplitude

Since amplitude cues from the frication noise and spectral cues of the vocalic part of a syllable depend on each other (Behrens and Blumstein 1988a; Jongman et al. 2000), changes in amplitude might carry more perceptual weight if the frequency range over which such changes occur is taken into consideration. Such integration was presented by Stevens and Blumstein (1981) as an invariant property of speech production.
They demonstrated theoretically that different amplitude changes that occur at the consonant-vowel boundary in certain frequency

ranges are related to articulatory mechanisms associated with certain places in the vocal tract. Therefore, listeners might be using these relational values as a cue to the place of consonant production. To test this claim, Stevens (1985) synthesized sibilant/nonsibilant and anterior/nonanterior continua such that the frication noise amplitude in certain frequency ranges was gradually changed from one stimulus to the next. Listeners' judgments abruptly shifted from /θ/ to /s/ when the amplitude of the frication noise in the fifth and sixth formant frequency regions (F5 and F6) was increased relative to the amplitude in the same frequency regions at vowel onset. On the other hand, listeners identified the consonant as /ʃ/ rather than /s/ when the frication noise amplitude in the F3 region, relative to the F3 amplitude of the vowel, rises at the transition, and as /s/ if it falls. These findings led Stevens to hypothesize that the vowel is used as an anchor against which the spectrum of the fricative noise is judged or evaluated (Stevens 1985, p. 249). Other researchers have tried to test the robustness of this feature in different contexts. Hedrick and Ohde (1993) looked into the effect of frication duration and vowel context on relative amplitude and whether such changes would affect perception of fricative place of articulation. This was done by varying the amplitude of the fricative relative to the vowel onset amplitude at F3 and F5 for the contrasts /s/-/ʃ/ and /s/-/θ/, respectively. Frication duration and vowel context also varied. Ten adult listeners with no history of speech or hearing disorders, who successfully perceived (with 70% accuracy) the end points of the /s/-/ʃ/ and /s/-/θ/ continua, were asked to identify each stimulus as one member of the contrastive pairs above. In the /s/-/ʃ/ contrast, listeners chose more /s/ responses when presented with lower relative amplitude and more /ʃ/ responses when presented with higher relative amplitude.
These findings held across the different vowel and duration conditions and were in agreement with those obtained by Stevens (1985).
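Relative amplitude in the Hedrick and Ohde sense compares frication energy with vowel-onset energy within one formant-frequency band (e.g., around F3 or F5). A rough sketch; the band edges, the Hamming window, and the dB convention are assumptions for illustration:

```python
import numpy as np

def band_level_db(signal, sample_rate, f_lo, f_hi):
    """Spectral energy level (dB) of a signal within [f_lo, f_hi] Hz."""
    power = np.abs(np.fft.rfft(signal * np.hamming(len(signal)))) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    return 10.0 * np.log10(np.sum(power[in_band]))

def relative_amplitude_db(frication, vowel_onset, sample_rate, f_lo, f_hi):
    """Frication level minus vowel-onset level in the same formant band."""
    return (band_level_db(frication, sample_rate, f_lo, f_hi)
            - band_level_db(vowel_onset, sample_rate, f_lo, f_hi))
```

A frication segment at one tenth the linear amplitude of the vowel onset, measured in the same band, gives a relative amplitude of -20 dB.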

Furthermore, the additional post-fricative vowel contexts in Hedrick and Ohde's study influenced only the magnitude of the relative amplitude effect for a given contrast. Hedrick and Ohde claim that relative amplitude is used as a primary invariant cue, since listeners used relative amplitude information more effectively than the context-dependent formant transitions. To further test this assumption, Hedrick and Ohde (1993) also varied the appropriate formant transitions of the contrasts presented above along a continuum while keeping the relative amplitude fixed across all stimuli. The hypothesis was that if relative amplitude was indeed a primary cue, then variation in formant transitions would not affect identification of members of the contrasting pair. Their findings indicate that for the /s/-/ʃ/ contrast, formant transitions did affect the identification of at least the end points of the continua. For the /s/-/θ/ contrast, formant transitions had a negligible effect on the identification of the two fricatives, even at boundary points. Taken together, these findings indicate that relative amplitude is part of a primary cue to fricative place of articulation. Its role becomes more salient when the contrast involves sibilant vs. nonsibilant fricatives. Additionally, Hedrick and Ohde's (1993) findings suggest that formant transitions do influence the perception of fricative place of articulation, at least among sibilants. However, a trading relationship seems to exist between the use of the two cues in the presence of factors obstructing the effective use of a given cue. Hedrick (1997) found that listeners with sensorineural hearing loss relied less on formant transition information than on relative amplitude in discriminating between English /s/ and /f/. Listeners with normal hearing, on the other hand, showed the opposite preference.
This was the case even when the formant transition information was presented at a level audible to listeners with sensorineural hearing loss. So far, relative amplitude has been shown only to differentiate between sibilants and nonsibilants as a class, with the exception of Jongman et al.'s (2000)

study, in which they found that relative amplitude, as defined by Hedrick and Ohde (1993), also differentiates among all four places of fricative articulation in English.

Duration Cues

Fricative duration measures were used in previous research mainly to differentiate between sibilants and nonsibilants, and to assess the voicing of fricatives. One such study was conducted by Behrens and Blumstein (1988b), who recorded three native speakers of English producing each of the four English voiceless fricatives /f, T, s, S/ followed by one of the five vowels /i, e, a, o, u/. They found that the sibilants /s, S/ were longer than the nonsibilants /f, T/, with an average difference of 33 ms. Also, they found no significant differences between the durations of members of the same class. The vowel effect was found to be minimal and present only among the nonsibilant fricatives. Similar results were obtained by Pirello, Blumstein, and Kurowski (1997). The researchers also found that alveolar fricatives were longer on average than labiodental fricatives in English. Jongman (1989) questioned the importance of frication noise duration as a cue for fricative identification. He found that listeners can identify fricatives based on a fraction of their frication noise duration. In a perception test, listeners needed as little as 50 ms of the initial frication noise of a naturally produced fricative-vowel syllable to successfully classify fricatives. Although cues like amplitude or spectral properties localized at the initial parts of the frication noise may have been used here, it is important to note that such results undermine the significance of an absolute duration value in classifying fricatives. Temporal features of speech can vary as a function of speaking rate. In fact, when frication noise duration was normalized by taking the ratio of fricative duration over word duration, Jongman et al.
(2000) found a significant difference among all places of fricative articulation with the exception of the labiodental and interdental contrast.
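The rate-normalized measure just described is a simple ratio of frication duration to carrier-word duration. The sketch below is illustrative only; the durations are invented values, not measurements from any of the studies cited.

```python
def normalized_frication_duration(frication_ms: float, word_ms: float) -> float:
    """Frication noise duration as a fraction of the carrier-word duration,
    i.e. the rate-normalized measure described above."""
    if word_ms <= 0:
        raise ValueError("word duration must be positive")
    return frication_ms / word_ms

# Invented example values: a sibilant tends to occupy a larger fraction of
# its carrier word than a nonsibilant, regardless of overall speaking rate.
s_ratio = normalized_frication_duration(140.0, 520.0)  # hypothetical /s/ word
f_ratio = normalized_frication_duration(100.0, 510.0)  # hypothetical /f/ word
```

Because the ratio divides out word duration, a uniform change in speaking rate leaves it unchanged, which is what makes it preferable to absolute duration.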

Frication noise duration has also been used to assess the voicing distinction between fricatives of the same place of articulation. Cole and Cooper (1975) examined the role of frication noise duration in the perception of voicing in fricatives. They found that decreasing the length of the frication noise of voiceless fricatives in syllable-initial position shifted their perception toward their voiced counterparts. They also noted that in syllable-final position, the duration of the frication noise relative to that of the preceding vowel becomes the cue for fricative voicing (voiced fricatives being shorter than voiceless ones). Similar findings were obtained by Manrique and Massone (1981) for the Spanish fricatives /B, f, D, s, S, Z, x, G/ in three conditions: isolated, in CV syllables, and in CVCV words. Noise duration was significantly shorter for voiced fricatives than for voiceless fricatives in all three conditions. However, of these fricatives, only /S, Z/ and /x, G/ are homorganic, while the other two pairs do not share the same place of articulation (Baum and Blumstein 1987). Therefore, the reported temporal differences in Manrique and Massone's study might have been due to factors other than fricative voicing since, as mentioned previously, durational differences existed between fricatives sharing the same voicing but belonging to different places of articulation (Behrens and Blumstein 1988b). Nevertheless, Baum and Blumstein's own experiments showed that syllable-initial voiceless English fricatives in citation forms are longer than their voiced counterparts. However, they noted considerable overlap in the duration distributions of voiced and voiceless fricatives at all places studied. Using connected speech, Crystal and House (1988) also found that, on average, voiceless fricatives in word-initial position are longer than voiced fricatives.
As in Baum and Blumstein's results, there was a considerable amount of overlap between the duration distributions of the voiced and voiceless fricatives in connected speech. Again, the use of duration per se as the sole cue for fricative voicing was questioned

by Jongman (1989), who found that identification of fricative voicing was accurate (83%) even when only 20 ms of frication noise was used. However, Jongman et al. (2000) used a relative measure of duration to quantify its use as a cue for fricative voicing. Normalized frication noise duration (defined as the ratio of fricative duration over that of the carrier word) was significantly longer for voiceless than for voiced fricatives. They also found that such differences are more apparent in nonsibilant than in sibilant fricatives.

Spectral Cues

In addition to amplitude and duration, spectral properties of the frication noise have been investigated to find cues that identify fricative place of articulation. Among the spectral properties previously studied are spectral peak location and spectral moments measurements.

Spectral peak location

One of the early attempts to relate fricative place of articulation to the frequency location of the energy maximum in the frication noise was the study by Hughes and Halle (1956). In this study, gated 50 ms windows of the frication noise were used to produce spectra of the English fricatives /f, v, s, z, S, Z/. An investigation of the fricative spectra revealed that for some speakers a strong energy component was located in the frequency region below 700 Hz in the spectra of voiced fricatives. Such an energy concentration was absent in the same region for voiceless fricatives. However, these findings were not consistent across speakers. Based on this inconsistency, in addition to the similarities found between the spectra of homorganic voiced and voiceless fricatives above 1 kHz, Hughes and Halle ruled out the use of spectral prominence as a basis for the voicing distinction among fricatives. On the other hand, the place distinction was found to be related, to a certain extent, to the location of the most prominent spectral peak. Hughes and Halle found that /f, v/ had a relatively flat spectrum below 10 kHz, whereas

spectral prominence was observed for /S, Z/ in the region of 2-4 kHz, and for /s, z/ in the region above 4 kHz. Also, they found that the exact location of the peak for each fricative was lower for males and higher for females. Based on these observations, Hughes and Halle concluded that the size and shape of the resonance chamber in front of the fricative's point of constriction determine the place of the energy maximum in frication noise spectra. Specifically, they reported that the length of the vocal tract from the point of constriction to the lips was inversely related to the frequency of the peak in the spectrum. Thus, the spectral peak increases as the point of articulation becomes closer to the lips. Such observations are consistent with predictions made by the source-filter theory of speech production presented in section 2.2. Strevens (1960) also looked into the use of spectral prominence to differentiate between fricatives by examining the front (/F, f, T/), mid (/s, S, ç/) and back (/x, X, h/) voiceless fricatives as produced by subjects with professional training in phonetics. Based on average line spectra, Strevens found that the front fricatives were characterized by unpatterned, low-intensity, smooth spectra; the mid fricatives by high intensity with significant spectral peaks around 3.5 kHz; and the back fricatives by medium intensity and a marked formant-like structure with peaks around 1.5 kHz. The results reported above for front and mid fricatives were also shown to be perceptually valid (Heinz and Stevens 1961). Using a synthesized continuum of white noise with spectral peaks in ranges representative of those found in /S, ç, s, f, T/, Heinz and Stevens found that participants consistently shifted their identification of the fricative from /S/ to /ç/ to /s/ to /f, T/ as the peak of the resonance frequency increased, with no distinction made between /f/ and /T/.
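A spectral peak measurement of the kind used in these studies can be approximated with a windowed FFT: take the magnitude spectrum of a gated frication segment and report the frequency of the largest component. The sketch below is schematic (its window and sampling rate are not the settings of any cited study) and is demonstrated on synthetic noise with a dominant 5 kHz component, roughly an /s/-like region.

```python
import numpy as np

def spectral_peak_hz(segment: np.ndarray, fs: float) -> float:
    """Frequency (Hz) of the largest magnitude component in the spectrum
    of a Hamming-windowed frication segment."""
    windowed = segment * np.hamming(len(segment))
    mag = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    return float(freqs[np.argmax(mag)])

# Synthetic check: weak white noise plus a strong 5 kHz component.
fs = 22050
t = np.arange(int(0.05 * fs)) / fs          # a 50 ms gate, as in Hughes and Halle
rng = np.random.default_rng(0)
sig = 0.05 * rng.standard_normal(t.size) + np.sin(2 * np.pi * 5000 * t)
peak = spectral_peak_hz(sig, fs)            # close to 5000 Hz
```

With a 50 ms window the frequency resolution is about 20 Hz, so the estimated peak falls within one bin of the true component.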

Similar properties were also found for fricatives in Spanish. In their study of Spanish fricatives, Manrique and Massone (1981) found that /s/, /f/ and /T/ have spectral peak values comparable to those reported for the English fricatives by Hughes and Halle (1956). Furthermore, they reported that spectral energy in /x/ is concentrated in a low, narrow frequency band continuous with the F2 of the following vowel, and that the spectral energy of /ç/ is concentrated in a low band continuous with the F3 of the following vowel. Manrique and Massone (1981) also examined the identification of a subset of Spanish fricatives to see whether changes in spectral peak location would change the way fricatives are perceived by Spanish speakers. They synthesized nine cascade stimuli from the middle 500 ms of each of the deliberately lengthened fricatives /f, s, S, x/ using a set of low- and high-pass filters so that only certain spectral zones were present in each stimulus. The unfiltered fricatives had recognition scores ranging from 95% for /f/ and /s/ to 100% for /S/ and /x/. For the filtered fricatives, they found that spectral peak location carries the perceptual load for the identification of /s/, /S/, and /x/. However, the diffuse spectrum of /f/ was believed to be the characterizing factor of its identifiability. Other studies of English fricatives confirmed that spectral peak location can separate sibilants from nonsibilants as a class, and distinguish only between sibilants. For example, Behrens and Blumstein (1988b) found that for English voiceless fricatives, major spectral peaks in ranges within khz were apparent for /s/ and within khz for /S/. On the other hand, /f/ and /T/ appeared flat, with a diffuse spread of energy from khz and a good deal of variability in their spectral shape. The same pattern was also observed across age groups. Pentz et al.
(1979), for example, compared the spectral properties of the English fricatives /f, v, s, z, S, Z/ produced by preadolescent children to those reported for adults. As reported for adults elsewhere, they found the same pattern of energy localization and constriction point. However, the values obtained from the children in their study

were higher than those obtained for male and female adult speakers in the studies mentioned above. This difference was attributed in large part to differences in vocal tract length. Adult male speakers have the longest vocal tracts and the lowest vocal tract resonances, while children have the shortest vocal tracts and the highest vocal tract resonances; adult female speakers fall between the two groups. In another study, Nissen (2003) investigated, among other metrics, the spectral peak location of voiceless English obstruents as produced by male and female speakers of four different age groups. For the fricatives in the study, he found that the spectral peak decreased as a function of increased speaker age (Nissen 2003, p. 139). Besides being age- and gender-dependent, spectral peak location has also been found to be vowel-dependent (Mann and Repp 1980; Soli 1981) and highly variable for speakers with neuromotor dysfunction (Chen and Stevens 2001) due to their lack of control over articulatory muscles. However, in contrast to all the studies mentioned above, Jongman et al. (2000) found that across all (male and female) speakers and vowel contexts, all four places of fricative articulation in English were significantly different from each other in terms of spectral peak location. Further, they found spectral peak location to reliably differentiate between /T/ and /D/ and between /f/ and /v/. The researchers justified the larger analysis window they adopted in their study, as compared to other studies, as a way to obtain better resolution in the frequency domain at the expense of resolution in the temporal domain. They argue that such a compromise is advantageous due to the stationary nature of frication noise. In summary, the spectral peak location of fricatives increases as the constriction becomes closer to the open end of the vocal tract. Also, the spectral peak for back fricatives shows a formant-like structure similar to that of the following vowel.
Both of these generalizations can be accounted for by the source-filter theory of speech production. Fricatives are characterized by turbulent airflow through a

narrow constriction in the oral cavity, with the portion of the vocal tract in front of the constriction effectively becoming the resonating chamber. For long and narrow constrictions, like those of fricatives, the acoustic theory of speech production predicts that the only resonance components present in the spectrum are those related to the area in front of the constriction, due to the lack of acoustic coupling from the cavity behind the constriction (Heinz and Stevens 1961). The size of the resonating cavity, therefore, can be inversely correlated with the frequency of the most prominent peak in the spectrum (Hughes and Halle 1956). As a result of this correlation, fricatives produced at or behind the alveolar region are characterized by a well-defined spectrum with peaks around khz for /S, Z/ and at khz for /s, z/. However, due to the very small area in front of the constriction, fricatives produced at the labial or labiodental area are characterized by a flat spectrum and a diffuse spread of energy between 1.5 and 8.5 kHz. Since nonsibilant production creates a cavity in close proximity to the open end of the vocal tract, different degrees of lip rounding (Shadle, Mair, and Carter 1996) and the additional turbulence produced by the air stream hitting the teeth (Strevens 1960; Behrens and Blumstein 1988a) introduce a great amount of variability in the location of the energy concentration. Sibilants, on the other hand, usually have a clearly defined spectral peak location. However, for speakers with limited precision in the placement of the constriction (Chen and Stevens 2001), such variability also exists for sibilants.

Spectral moments

Spectral moments analysis is another metric that has been used for fricative identification. Unlike spectral peak location analysis, this statistical approach captures both local (mean frequency and variance) and global (skewness and kurtosis) aspects of fricative spectra.
Spectral mean refers to the average energy concentration and variance to its range. Skewness, on the other hand, is a measure

of spectral tilt that indicates where most of the energy is concentrated. Skewness with a positive value indicates a negative spectral tilt with energy concentrated at the lower frequencies, while negative skewness indicates a positive tilt with energy concentrated at the higher frequencies (Jongman et al. 2000). Kurtosis is an indicator of the distribution's peakedness. One of the early applications of spectral moments to the classification of speech sounds was the study by Forrest et al. (1988) on English obstruents. For the fricatives in that study, Forrest et al. generated a series of Fast Fourier Transforms (FFT) using a 20 ms analysis window with a step size of 10 ms, starting at the obstruent onset and continuing through three pitch periods into the vowel. The FFT-generated spectra were then treated as random probability distributions from which the first four moments (mean, variance, skewness, and kurtosis) were calculated. The spectral moments obtained from both linear and Bark scales were entered into a discriminant function analysis in an attempt to classify the voiceless fricatives according to their place of articulation. Classification scores, on both scales, were good for the sibilants /s/ and /S/, at 85% and 95% respectively. The nonsibilants, on the other hand, were not as accurately classified using any moment on either of the two scales (58% for /T/ and 75% for /f/). Subsequent implementations of spectral moments analysis tried to extend or replicate Forrest et al.'s approach with some modifications. The study by Tomiak (1990), for example, used a different analysis window (100 ms) at different locations within the English voiceless frication noise. As in previous research, spectral moments were successful in classifying the sibilants and the /h/ data. In the case of nonsibilants, it was found that the most useful spectral information is contained in the transition portion of the frication.
Additionally, in contrast to Forrest et al., Tomiak found an advantage for the linearly derived moment profiles over the Bark-scaled ones.
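The core of Forrest et al.'s procedure, treating a power spectrum as a probability distribution and computing its first four moments, can be sketched as follows. The window length, sampling rate, and linear frequency scale here are illustrative choices, not the 20 ms linear/Bark setup of the original study; the test signal is synthetic.

```python
import numpy as np

def spectral_moments(segment: np.ndarray, fs: float):
    """Mean, variance, skewness, and excess kurtosis of an FFT power
    spectrum treated as a probability distribution over frequency
    (after the approach of Forrest et al. 1988)."""
    windowed = segment * np.hamming(len(segment))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    p = power / power.sum()                         # normalize to a distribution
    mean = float(np.sum(p * freqs))                 # 1st moment: centre of gravity
    var = float(np.sum(p * (freqs - mean) ** 2))    # 2nd moment: spread
    sd = np.sqrt(var)
    skew = float(np.sum(p * ((freqs - mean) / sd) ** 3))        # 3rd: tilt
    kurt = float(np.sum(p * ((freqs - mean) / sd) ** 4) - 3.0)  # 4th: peakedness
    return mean, var, skew, kurt

# Synthetic check: equal-amplitude components at 2 and 6 kHz give a spectral
# mean near 4 kHz, near-zero skewness, and strongly negative excess kurtosis
# (the energy sits in two masses far from the mean, not in a central peak).
fs = 22050
t = np.arange(2048) / fs
sig = np.sin(2 * np.pi * 2000 * t) + np.sin(2 * np.pi * 6000 * t)
mean, var, skew, kurt = spectral_moments(sig, fs)
```

Positive skewness here corresponds to energy concentrated below the mean (a negative spectral tilt), matching the interpretation given in the text.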

Spectral moments were also used by Shadle et al. (1996) to classify voiced and voiceless English fricatives. The study involved spectral moments measured from discrete Fourier transform (DFT) analyses performed at different locations within the frication noise and over different frequency ranges. They found that spectral moments provided some information about fricative production but did not discriminate reliably between the different places of articulation. Furthermore, their results indicated that spectral moments are sensitive to the frequency range of the analysis. However, the moments were not sensitive to the analysis position within the fricative. Similar results were also obtained for children (Nittrouer, Studdert-Kennedy, and McGowan 1989; Nittrouer 1995). The use of spectral moments as a tool to distinguish between /s/ and /S/ was also extended to atypical speech and found to be reliable. Tjaden and Turner (1997), for example, compared spectral moments obtained from speakers with amyotrophic lateral sclerosis (ALS) and healthy controls matched for age and gender and found that the first moment was significantly lower for the ALS group. Tjaden and Turner suggested that the low mean values found among ALS speakers can be attributed to difficulties they face in making the appropriate degree of constriction required to produce frication, or to a weaker subglottal sound source due to the weak respiratory muscles that are common among ALS speakers. The studies mentioned so far demonstrate that spectral moments can distinguish sibilants from nonsibilants as a class, but can reliably distinguish only among sibilants. However, contrary to the studies mentioned above, Jongman et al. (2000) found that spectral moments were successful in capturing the differences between all four places of fricative articulation in English. Jongman et al.'s
study, however, differs from other studies in that it calculated moments from a 40 ms FFT analysis window placed at four different locations in the frication noise (onset, mid, end, and transition into the vowel) and used a

larger and more representative number of speakers and tokens (2880 tokens from 20 speakers) as compared to the smaller populations in other studies. Across moments and window locations, variance and skewness at onset and transition were found to be the most robust classifiers of all four places. Also, on average, variance was shown to effectively distinguish between voiced and voiceless fricatives, with the former having greater variance.

Formant Transition Cues

Second formant at transition

Early research on formant transition focused on the perceptual usefulness of such information in classifying speech sounds. For example, Harris (1958) recorded the English fricatives /f, v, T, D, s, z, S, Z/ followed by each of the vowels /i, e, o, u/. Then she spliced and recombined the vocalic and frication portions of all CV combinations. Listeners correctly identified sibilant fricatives regardless of the source of the cross-spliced vocalic part. Frication noise alone was sufficient for correct identification of sibilant fricatives. Among nonsibilant fricatives, on the other hand, correct identification as /f, v/ occurred only when the vocalic part was matching (i.e., coming from an /f, v/ syllable), and as /T, D/ with mismatching vocalic parts. Based on these identification patterns, Harris suggested that the perception of fricatives occurs in two consecutive stages. In the first stage, cues from the frication noise alone determine whether the fricative is a sibilant or a nonsibilant. If the class is determined to be sibilant, then cues from the frication noise alone differentiate among the sibilant fricatives. However, if the class is determined to be nonsibilant at the first stage, then formant transition information is used for the within-class classification.
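Harris's two-stage account amounts to a simple decision procedure, which the sketch below renders schematically. The cue dictionaries and labels are hypothetical stand-ins; the study itself involved human listeners, not a classifier.

```python
def harris_two_stage(frication_cues: dict, transition_cues: dict) -> str:
    """Schematic rendering of Harris's (1958) two-stage account: frication
    noise alone decides sibilant vs. nonsibilant and identifies sibilants;
    formant transitions are consulted only for the nonsibilant within-class
    choice. Cue dictionaries here are illustrative, not from the study."""
    if frication_cues["class"] == "sibilant":
        # Stage 2a: frication noise alone separates the sibilants.
        return frication_cues["best_match"]
    # Stage 2b: nonsibilants require the vocalic transition information.
    return transition_cues["best_match"]

# For a sibilant, the (hypothetical) transition cue is never consulted;
# for a nonsibilant, it decides the outcome.
sib = harris_two_stage({"class": "sibilant", "best_match": "s"},
                       {"best_match": "f"})
nonsib = harris_two_stage({"class": "nonsibilant", "best_match": "s"},
                          {"best_match": "f"})
```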
As was the case with the cross-splicing methods previously mentioned (section ), this method does not eliminate the possibility that dynamic coarticulatory information is colored into the precut vowel and/or fricative. It is not clear, therefore, that the results

obtained can be attributed solely to the mismatching vocalic part of the cross-spliced signal. To overcome this problem, Heinz and Stevens (1961) synthesized stimuli consisting of white noise with varying frequency peaks, similar to the peaks found in English fricatives, followed by four synthetic formant transition values. Listeners were instructed to label these stimuli as one of the four voiceless English fricatives /f, T, s, S/. Based on the identification scores, the researchers concluded that /f/ is distinguished from /T/ on the basis of the F2 transition in the following vowel. There was no apparent effect of formant transition on the distinction between /s/ and /S/. These findings support those of Harris (1958) while using more controlled stimuli. The role of formant transition, however, was not found to be as crucial in other studies. LaRiviere, Winitz, and Herriman (1975) used the frication noise in its entirety in a perceptual test and obtained high recognition scores for /s, S/, lower scores for /f/, and poor scores for /T/. More importantly, when vocalic information was included for the /f, T/ tokens, no significant increase in their recognition was obtained. Other studies (Manrique and Massone 1981; Jongman 1989) found similar results using different methods. The perceptual experiments mentioned thus far used a forced-choice technique that might have biased participants' responses. For that reason, Manrique and Massone (1981) used a tape-splicing paradigm to study the effect of formant transition on the perception of Spanish fricatives by Spanish listeners. They constructed their stimuli by splicing CV syllables into their respective frication and vowel parts. Listeners were asked to choose the fricative when presented with the frication noise alone, and to freely guess the sound that preceded the vowel when presented with the vocalic part.
In the latter case, most tokens were judged (85% of the responses) to have been preceded by a stop sharing the same place of articulation as the spliced fricative. Spanish fricatives with no stops sharing

the same place of articulation were perceived as /t/, with the exception of /f/, which was perceived as /p/ 50% of the time. The same listeners were able to identify the fricative accurately from the frication part alone in all cases except for /x/ and /G/. However, another study found that formant transition was not crucial for correct identification of fricatives (Jongman 1989). Based only on the frication noise part of fricative-vowel syllables, Jongman (1989) obtained high (92%) correct fricative identification in a perceptual experiment on English fricatives. More importantly, there was no significant increase in identification accuracy when the entire fricative-vowel syllable was presented. As with the results obtained from synthetic speech, measures of formant transition from naturally produced fricatives are also conflicting. Wilde and Huang (1991), for example, measured F2 at vowel onset for the fricatives of only one male speaker and found that the F2 value did not differentiate systematically between /f/ and /T/. However, in another study, Wilde (1993) found that transitional information, as measured by the F2 value at the fricative-vowel boundary, can be used to identify fricative place of articulation. The measurements she obtained from two speakers showed that as the place of constriction moves back in the vocal tract, the value of F2 systematically increases and its range becomes smaller.

Locus equations

Locus equations provide a method to quantify the role of formant transition in the identification of fricative place of articulation by relating second formant frequency at vowel onset (F2 onset) to that at vowel midpoint (F2 vowel). Locus equations are straight-line regression fits to data points formed by plotting onsets of F2 transitions along the y axis and their corresponding vowel nuclei F2 along the x axis in order to obtain the values of the slope and y-intercept.
This metric has been used primarily to classify English stops (Lindblom 1963; Sussman et al. 1991). It was only recently that this measure was applied to fricatives. Fowler

(1994) investigated the use of locus equations as cues to place of articulation across different manners of articulation, including the fricatives /v, D, z, Z/ as spoken by five male and five female speakers of English. In this study, Fowler found that the locus equations (in terms of slope and y-intercept) of a homorganic stop and fricative were significantly different, while those of a stop and a fricative with different places of articulation were not significantly different. Nevertheless, locus equations were able to differentiate between members that share the same manner of articulation. The slopes for the fricatives /v, D, z, Z/, for example, were significantly different (slopes of 0.73, 0.50, 0.42, and 0.34 respectively). In another study, Sussman (1994) investigated the use of locus equations to classify consonants across manners of articulation (approximants, fricatives, and nasals). In contrast to Fowler (1994), he found that fricatives were not distinguishable based on the slopes of their locus equations. Only /v/ had a distinctive slope. The results of other studies of English fricatives were similar to those of Sussman (1994). For example, in their large-scale study of English fricatives, Jongman et al. (2000) calculated the slope and y-intercept for all English fricatives in six vowel environments. Specifically, Jongman and colleagues measured F2 onset and F2 vowel from a 23.3 ms full Hamming window placed at the onset and midpoint of the vowel respectively. This was the same method used by the previously mentioned studies. Similar to Sussman (1994), Jongman et al. (2000) found that only the slope value for /f, v/ was significantly different and that the y-intercepts were distinct only for /f, v/ and /S, Z/. Locus equations are of particular interest here since they have been shown to work across languages (Sussman, Hoemeke, and Ahmed 1993), gender (Sussman et al. 1991), speaking style (Krull 1989), and speaking rate (Sussman, Fruchter, Hilbert, and Sirosh 1998).
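Computationally, a locus equation is just a least-squares line fitted over (F2 vowel, F2 onset) pairs for one consonant across vowel contexts. The sketch below uses invented formant values purely for illustration; the slope and intercept it yields are not from any cited study.

```python
import numpy as np

def locus_equation(f2_onset: np.ndarray, f2_vowel: np.ndarray):
    """Least-squares fit of F2_onset = slope * F2_vowel + intercept over
    tokens of one fricative produced in several vowel contexts."""
    slope, intercept = np.polyfit(f2_vowel, f2_onset, 1)
    return float(slope), float(intercept)

# Invented token values for four vowel contexts (Hz).
f2_vowel = np.array([2300.0, 1900.0, 1200.0, 900.0])   # vowel midpoints
f2_onset = np.array([2000.0, 1800.0, 1450.0, 1300.0])  # vowel onsets
slope, intercept = locus_equation(f2_onset, f2_vowel)
```

A slope near 1 would indicate that F2 onset tracks the vowel closely (heavy coarticulation), while a flat slope with a low intercept, as reported below for pharyngealized fricatives, indicates resistance to the following vowel's influence.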

Studies of Arabic Fricatives

The use of acoustic cues to distinguish between the different fricatives of Arabic has been underinvestigated in the literature. Furthermore, the very few studies dealing with the acoustic characteristics of Arabic fricatives (see below) have been predominantly concerned with a single acoustic feature rather than with the way multiple cues can be integrated in order to distinguish among fricative places of articulation. While some of the cues mentioned above seem to distinguish between English fricatives with relatively good accuracy, the same cues, when used to classify Arabic fricatives, need to take into account acoustic characteristics particular to Arabic. For example, unlike English, Arabic utilizes durational differences in both vowels and consonants for phonemic distinctions. It is of interest, therefore, to see how such a durational property would affect the voicing and place classification of Arabic fricatives. Another interesting feature of Arabic is the existence of co-articulated (pharyngealized) fricatives that are phonemically distinct from their plain counterparts. Due to their double articulation mechanism, it is expected in our study that pharyngealized fricatives will have two patterns of peaks emerging at the middle and near the end of frication. Therefore, it seems necessary to use a second analysis window at the end of the frication noise such that its right shoulder is aligned with the end of the frication noise. Additionally, the two window locations are suggested because studies of spectral peak location have demonstrated that high-frequency peaks are more likely to emerge at the middle and end of the frication noise (Behrens and Blumstein 1988b). Also, the frequency of the most prominent peak for the pharyngealized fricatives is expected to be lower than that of their plain counterparts because of the acoustic coupling resulting from co-articulation.
Spectral moments seem to be another promising technique for classifying Arabic fricatives if the proper size and location of the analysis windows are used. In fact, in a study of fricatives in Cairo Arabic, Norlin (1983) found that /s,

s Q, z, z Q / are characterized by a sharp peak at higher frequencies, and that the peaks of /s Q, z Q / are broader than those of /s, z/. Norlin used Center of Gravity (COG) and dispersion as ways of quantifying the location of the peak and the spread of energy around it, respectively. Therefore, it seems that a combination of spectral mean and variance, along with skewness measures, would differentiate between pharyngealized and plain fricatives. The use of formant transition information has been investigated in the literature in relation to the fricatives articulated at the back of the oral cavity. For example, El-Halees (1985) found that the F1 value at the transition differentiates between uvular and pharyngeal fricatives, with the former being lower. Also, he found that listeners can differentiate between the two classes based on this single feature alone. The perceptual salience of F1 onset was also demonstrated by Alwan (1989), who used synthetic speech to test the discrimination between the voiced pharyngeal fricative /Q/ and the voiced uvular fricative /X/. She found that the higher F1 onset for the pharyngeal was essential to make the distinction, while F2 onset was not. The relation between back articulation and high F1 has also been attested for vowels following such sounds. Zawaydeh (1997) found that F1 at the middle of the vowel was raised when the vowel was preceded by one of the gutturals /s Q, è/ or the glottal /h/ as compared to non-gutturals. In addition to the first and second formants at transition, locus equations have also been used as a classification metric for Arabic. The first attempt was part of a cross-linguistic study of locus equations as a cue for stop place of articulation. Sussman et al. (1993) recorded the voiced stops /b, d, d Q, g/ as produced by three speakers of the Cairene dialect of Arabic. They found that both slope and y-intercept were significantly different for almost all comparisons, except for the slopes of /d/ and /d Q / and the y-intercepts of /b/ and /g/.
The second study was conducted by Yeou (1997) who elicited both stops and fricatives from nine

Moroccan subjects. Yeou found that y-intercept and slope distinguished between most fricative comparisons. However, neither slope nor y-intercept distinguished /S/ from /è/ or /f/ from /X/. More importantly, locus equation slopes were able to group the pharyngealized fricatives (/D Q, s Q /) together as a distinct group, differing from their non-pharyngealized counterparts and the other fricatives in having distinctly low y-intercepts and flat slopes. Yeou argued that unlike their plain counterparts, pharyngealized fricatives resist the articulatory effects of the following vowel due to their double articulation. Instead, they induce their own coarticulatory effect on the following vowel by raising its F1 and lowering its F2. This change in F2, as compared to plain fricatives, causes the slope to be flatter and the intercept to be lower. To summarize, several acoustic cues related to the spectral, temporal and amplitude information found in the speech signal have been used in different languages to classify fricatives by place of articulation. Such cues, alone and collectively, served to distinguish between different places/classes of fricatives in English. However, the use of these cues to classify Arabic fricatives has not received much attention. In our study we attempt to examine how each of the spectral, temporal and amplitude characteristics mentioned in Section 2.3 would serve, alone and collectively, to distinguish between the places of articulation of Arabic fricatives. Additionally, of particular importance to our study is to see whether the acoustic cues found to be effective in fricative classification in other languages will be affected by the vowel length differences present in Arabic, and whether such cues would distinguish between plain and pharyngealized fricatives. In the following chapter, we will discuss how such cues are investigated and the modifications implemented in the measurement techniques, if any.

CHAPTER 3
METHODOLOGY

Several spectral, amplitude, and temporal measurements have been used in previous research to describe the acoustic cues that characterize fricatives in different languages. The current study investigated Arabic fricatives to identify such acoustic cues. This chapter describes the way in which the speech samples were elicited, recorded, and analyzed. For most of the acoustic analyses, this research followed the procedures commonly used to study fricatives in English, as illustrated in Jongman et al. (2000). Certain modifications were applied to further investigate characteristics particular to Arabic. All coding and data analysis was carried out using the PRAAT software (Boersma and Weenink 2004) and a set of scripts developed at the phonetics lab of the University of Florida by the author.

3.1 Data Collection

Participants

A group of eight adult male speakers of Modern Standard Arabic (MSA) were recruited for our study from the general undergraduate student population of King Saud University 1. The mean age of the participants was 20 years. None had any history of hearing or speech impairment, and all had very limited experience with English as a second language. Participants were given class credit by their instructors for participating in the study.

1 King Saud University, Riyadh, Saudi Arabia

Materials

A gap exists in Arabic between MSA and its vernacular varieties. Arabic is a traditional example of diglossia, in which two varieties of the language are used to fulfill different communicative functions (Ferguson 1959). Although the participants were all fluent speakers of MSA, additional care was taken in eliciting the speech material to ensure that the participants stayed within the target MSA register. Fricatives were therefore elicited using screen-prompted speech in conjunction with prerecorded audio prompts. A trained phonetician, who is also a fluent speaker of MSA, produced CVC syllables in which the initial consonant was an MSA fricative /f, T, D, D Q, s, s Q, z, S, X, K, è, Q, h/ followed by each of the six vowels /i, i:, a, a:, u, u:/. The final consonant was always /t/. Each resulting word was repeated three times, yielding a total of 234 audio prompts (13 fricatives × 6 vowels × 3 repetitions). The recorded prompts were then edited to be of equal length (approximately 1 second) by adding silence to the end if needed. The written prompts were constructed using fully vowelled Arabic orthography on a white background. The participants were instructed to repeat the word presented in the carrier phrase qul marratajn ('say twice'), with the audio prompt functioning only as a reference. The prompts were presented randomly in blocks of 39 words, with breaks between blocks. Before the actual recording of any participant, a practice session with 10 words presented in two blocks was conducted to familiarize the participants with the task.

Recording

The recording was carried out using the facilities of the Computer & Electronics Research Institute at KACST 2. Two adjacent sound-attenuated booths with a monitoring window between them hosted the data collection process.

2 King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia.
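The stimulus inventory and blocking described in the Materials section can be sketched as follows. The consonant symbols here are ASCII stand-ins for the transcription used in this chapter (e.g., DQ and sQ for the pharyngealized pair, H for the voiceless pharyngeal), and the randomization is assumed to be a simple shuffle; the dissertation does not specify the exact randomization procedure.

```python
# 13 fricatives x 6 vowels x 3 repetitions = 234 CVC items ending in /t/,
# shuffled and divided into blocks of 39 with breaks between blocks.
import random

fricatives = ["f", "T", "D", "DQ", "s", "sQ", "z", "S", "X", "K", "H", "Q", "h"]
vowels = ["i", "i:", "a", "a:", "u", "u:"]

prompts = [c + v + "t" for c in fricatives for v in vowels for _ in range(3)]
assert len(prompts) == 234  # 13 * 6 * 3

random.shuffle(prompts)
blocks = [prompts[i:i + 39] for i in range(0, len(prompts), 39)]
assert len(blocks) == 6 and all(len(b) == 39 for b in blocks)
```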

In one booth, a PC running Microsoft PowerPoint was used to present the synchronized audio-written production prompts via an LCD screen affixed to the outside of the monitoring window of the other booth. The text was shown on the LCD screen while the synchronized audio prompt was fed through headphones (Sennheiser Noisegard mobile HDC 451). A Kay Elemetrics CSL (Computer Speech Lab) model 4300B connected to another PC was used for in-line recording of the participants' utterances. It should be pointed out that anti-aliasing is carried out automatically during data capture by the CSL external module. All recordings were made at a kHz sampling rate and 16-bit quantization. The participant's production of the word in the carrier phrase was captured using a low-impedance, unidirectional, head-worn dynamic microphone (SHURE SM10) positioned about 20 mm to the left of the participant's mouth in order to prevent direct airflow turbulence from impinging on the microphone. Each word lasted 4 seconds on the screen before the following word was shown. If a participant did not produce the word in the allocated time or a mispronunciation occurred, the recording was stopped by the author and that particular word was presented again. Each block was saved to a separate sound file for easy manipulation. The resulting sound files were then transferred into PRAAT for segmentation and further analyses.

3.2 Data Analysis

Segmentation of Speech

Both a wide-band spectrogram and a waveform display were used to segment the recorded material into the monosyllabic words containing the test fricatives. For each token, four points were identified on the waveforms: the beginning of frication, the offset of the fricative/beginning of the vowel, the end of the vowel, and the end of the word. For all these points, the nearest zero-crossing

was always used. Fricative onset was taken to be the point in time at which high-frequency energy appeared on the spectrogram and/or a significant increase in zero-crossing rate occurred. The offset of a voiceless fricative was taken to be the point of minimum intensity preceding the periodicity of the vowel. For the voiced fricatives, the offset was taken to be the zero-crossing of the pulse preceding the earliest pitch period exhibiting a change in the waveform from that seen throughout the initial frication (Jongman et al. 2000). The vowel offset was taken to be the end of periodicity, while the end of the segmented token was taken to be the onset of the stop burst release. Figure 3-1 shows an example of these points. The time indices of the segmentation points were written to a PRAAT TextGrid file. Such files make it easier to handle the signal independently of the segmentation data and labels.

Figure 3-1. Example of segmentation, with the fricative onset, fricative offset, vowel offset, and stop release marked.
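The snap-to-nearest-zero-crossing convention applied to all four segmentation points can be sketched as below. This is a minimal illustration of the idea, not the PRAAT-based procedure actually used in the study.

```python
# Snap a hand-placed boundary to the nearest zero-crossing of the waveform.
# `samples` is a mono waveform array; `idx` is the provisional boundary
# position in samples.
import numpy as np

def nearest_zero_crossing(samples, idx):
    """Return the sample index of the sign change closest to idx."""
    signs = np.sign(samples)
    # indices i where the signal changes sign between sample i and i+1
    crossings = np.where(np.diff(signs) != 0)[0]
    if crossings.size == 0:
        return idx  # no crossing found; keep the hand-placed point
    return int(crossings[np.argmin(np.abs(crossings - idx))])

wave = np.array([0.3, 0.1, -0.2, -0.4, 0.1, 0.5, 0.2, -0.1])
print(nearest_zero_crossing(wave, 6))  # crossing nearest to sample 6
```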

The only exception to the above-mentioned general rules was the voiced pharyngeal fricative /Q/, for which it was difficult to visually localize the fricative-vowel boundary. The pharyngeal fricative /Q/ is known to have a formant-like structure continuous with that of the following vowel, with the lowest frequency of the fricative matching that of the second formant of the following vowel (Johnson 1997). Therefore, the frication offset for /Q/ was taken to be the point at which an upward intensity shift occurred with reference to the intensity at the fricative onset. This point marks the shift from the low intensity found in the frication noise to the higher intensity of the vocalic part. Figure 3-2 shows an example of the segmentation of /Q/. Due to the absence of voicing during frication, this modification of the segmentation criteria was not necessary for either /è/ or /h/.

Figure 3-2. Segmentation of /Q/, with the fricative onset, fricative offset, vowel offset, and stop release marked. The dotted line shows the intensity level.
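The intensity-shift criterion for the /Q/ offset can be sketched as follows. The frame length and the 6 dB rise threshold are assumed values chosen for illustration; the chapter specifies only that an upward intensity shift relative to the fricative-onset intensity marks the boundary.

```python
# Find the frication offset of /Q/ as the first analysis frame whose
# intensity rises a fixed amount above the intensity at fricative onset.
import numpy as np

def intensity_db(frame):
    """Short-time intensity of a frame, in dB relative to full scale."""
    rms = np.sqrt(np.mean(frame ** 2)) + 1e-12  # epsilon avoids log(0)
    return 20 * np.log10(rms)

def frication_offset(samples, onset_idx, frame_len=256, rise_db=6.0):
    """Return the sample index of the first frame rise_db above onset level."""
    ref = intensity_db(samples[onset_idx:onset_idx + frame_len])
    for start in range(onset_idx, len(samples) - frame_len, frame_len):
        if intensity_db(samples[start:start + frame_len]) > ref + rise_db:
            return start  # start of the higher-intensity vocalic part
    return None  # no qualifying rise found
```

A quiet noise-like stretch followed by a louder periodic stretch (mimicking frication then voicing) would be resolved at the amplitude jump.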


Outline.! Neural representation of speech sounds.  Basic intro  Sounds and categories  How do we perceive sounds?  Is speech sounds special? Outline! Neural representation of speech sounds " Basic intro " Sounds and categories " How do we perceive sounds? " Is speech sounds special? ! What is a phoneme?! It s the basic linguistic unit of speech!

More information

Technical Discussion HUSHCORE Acoustical Products & Systems

Technical Discussion HUSHCORE Acoustical Products & Systems What Is Noise? Noise is unwanted sound which may be hazardous to health, interfere with speech and verbal communications or is otherwise disturbing, irritating or annoying. What Is Sound? Sound is defined

More information

Interjudge Reliability in the Measurement of Pitch Matching. A Senior Honors Thesis

Interjudge Reliability in the Measurement of Pitch Matching. A Senior Honors Thesis Interjudge Reliability in the Measurement of Pitch Matching A Senior Honors Thesis Presented in partial fulfillment of the requirements for graduation with distinction in Speech and Hearing Science in

More information

Hearing Lectures. Acoustics of Speech and Hearing. Auditory Lighthouse. Facts about Timbre. Analysis of Complex Sounds

Hearing Lectures. Acoustics of Speech and Hearing. Auditory Lighthouse. Facts about Timbre. Analysis of Complex Sounds Hearing Lectures Acoustics of Speech and Hearing Week 2-10 Hearing 3: Auditory Filtering 1. Loudness of sinusoids mainly (see Web tutorial for more) 2. Pitch of sinusoids mainly (see Web tutorial for more)

More information

Issues faced by people with a Sensorineural Hearing Loss

Issues faced by people with a Sensorineural Hearing Loss Issues faced by people with a Sensorineural Hearing Loss Issues faced by people with a Sensorineural Hearing Loss 1. Decreased Audibility 2. Decreased Dynamic Range 3. Decreased Frequency Resolution 4.

More information

USING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES

USING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES USING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES Varinthira Duangudom and David V Anderson School of Electrical and Computer Engineering, Georgia Institute of Technology Atlanta, GA 30332

More information

Anumber of studies have shown that the perception of speech develops. by Normal-Hearing and Hearing- Impaired Children and Adults

Anumber of studies have shown that the perception of speech develops. by Normal-Hearing and Hearing- Impaired Children and Adults Perception of Voiceless Fricatives by Normal-Hearing and Hearing- Impaired Children and Adults Andrea L. Pittman Patricia G. Stelmachowicz Boys Town National Research Hospital Omaha, NE This study examined

More information

LINGUISTICS 130 LECTURE #4 ARTICULATORS IN THE ORAL CAVITY

LINGUISTICS 130 LECTURE #4 ARTICULATORS IN THE ORAL CAVITY LINGUISTICS 130 LECTURE #4 ARTICULATORS IN THE ORAL CAVITY LIPS (Latin labia ) labial sounds bilabial labiodental e.g. bee, my e.g. fly, veal TEETH (Latin dentes) dental sounds e.g. think, they ALVEOLAR

More information

EEL 6586, Project - Hearing Aids algorithms

EEL 6586, Project - Hearing Aids algorithms EEL 6586, Project - Hearing Aids algorithms 1 Yan Yang, Jiang Lu, and Ming Xue I. PROBLEM STATEMENT We studied hearing loss algorithms in this project. As the conductive hearing loss is due to sound conducting

More information

Bark and Hz scaled F2 Locus equations: Sex differences and individual differences

Bark and Hz scaled F2 Locus equations: Sex differences and individual differences Bark and Hz scaled F Locus equations: Sex differences and individual differences Frank Herrmann a, Stuart P. Cunningham b & Sandra P. Whiteside c a Department of English, University of Chester, UK; b,c

More information

Speech, Hearing and Language: work in progress. Volume 13

Speech, Hearing and Language: work in progress. Volume 13 Speech, Hearing and Language: work in progress Volume 13 Individual differences in phonetic perception by adult cochlear implant users: effects of sensitivity to /d/-/t/ on word recognition Paul IVERSON

More information

Advanced Audio Interface for Phonetic Speech. Recognition in a High Noise Environment

Advanced Audio Interface for Phonetic Speech. Recognition in a High Noise Environment DISTRIBUTION STATEMENT A Approved for Public Release Distribution Unlimited Advanced Audio Interface for Phonetic Speech Recognition in a High Noise Environment SBIR 99.1 TOPIC AF99-1Q3 PHASE I SUMMARY

More information

Perceptual Effects of Nasal Cue Modification

Perceptual Effects of Nasal Cue Modification Send Orders for Reprints to reprints@benthamscience.ae The Open Electrical & Electronic Engineering Journal, 2015, 9, 399-407 399 Perceptual Effects of Nasal Cue Modification Open Access Fan Bai 1,2,*

More information

Effects of noise and filtering on the intelligibility of speech produced during simultaneous communication

Effects of noise and filtering on the intelligibility of speech produced during simultaneous communication Journal of Communication Disorders 37 (2004) 505 515 Effects of noise and filtering on the intelligibility of speech produced during simultaneous communication Douglas J. MacKenzie a,*, Nicholas Schiavetti

More information

Evaluating the Clinical Effectiveness of EPG. in the Assessment and Diagnosis of Children with Intractable Speech Disorders

Evaluating the Clinical Effectiveness of EPG. in the Assessment and Diagnosis of Children with Intractable Speech Disorders Evaluating the Clinical Effectiveness of EPG in the Assessment and Diagnosis of Children with Intractable Speech Disorders Sara E. Wood*, James M. Scobbie * Forth Valley Primary Care NHS Trust, Scotland,

More information

Frequency refers to how often something happens. Period refers to the time it takes something to happen.

Frequency refers to how often something happens. Period refers to the time it takes something to happen. Lecture 2 Properties of Waves Frequency and period are distinctly different, yet related, quantities. Frequency refers to how often something happens. Period refers to the time it takes something to happen.

More information

Speech and Intelligibility Characteristics in Fragile X and Down Syndromes

Speech and Intelligibility Characteristics in Fragile X and Down Syndromes Speech and Intelligibility Characteristics in Fragile X and Down Syndromes David J. Zajac, Ph.D. 1 Gary E. Martin, Ph.D. 1 Elena Lamarche, B.A. 1 Molly Losh, Ph.D. 2 1 Frank Porter Graham Child Development

More information

A Senior Honors Thesis. Brandie Andrews

A Senior Honors Thesis. Brandie Andrews Auditory and Visual Information Facilitating Speech Integration A Senior Honors Thesis Presented in Partial Fulfillment of the Requirements for graduation with distinction in Speech and Hearing Science

More information

Speech Adaptation to Electropalatography in Children's Productions of /s/ and /ʃ/

Speech Adaptation to Electropalatography in Children's Productions of /s/ and /ʃ/ Brigham Young University BYU ScholarsArchive All Theses and Dissertations 2014-06-02 Speech Adaptation to Electropalatography in Children's Productions of /s/ and /ʃ/ Marissa Celaya Brigham Young University

More information

The perception of high frequency sibilants in Hungarian male speech

The perception of high frequency sibilants in Hungarian male speech The perception of high frequency sibilants in Hungarian male speech Authors address: Author 1: Péter Rácz New Zealand Institute of Language Brain and Behaviour University of Canterbury, Private Bag 4800,

More information

PERCEPTUAL MEASUREMENT OF BREATHY VOICE QUALITY

PERCEPTUAL MEASUREMENT OF BREATHY VOICE QUALITY PERCEPTUAL MEASUREMENT OF BREATHY VOICE QUALITY By SONA PATEL A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER

More information

K. G. Munhall Department of Psychology, Department of Otolaryngology, Queen s University, Kingston, Ontario K7L 3N6, Canada

K. G. Munhall Department of Psychology, Department of Otolaryngology, Queen s University, Kingston, Ontario K7L 3N6, Canada Learning to produce speech with an altered vocal tract: The role of auditory feedback Jeffery A. Jones a) ATR International Human Information Science Laboratories, Communication Dynamics Project, 2-2-2

More information

Language Speech. Speech is the preferred modality for language.

Language Speech. Speech is the preferred modality for language. Language Speech Speech is the preferred modality for language. Outer ear Collects sound waves. The configuration of the outer ear serves to amplify sound, particularly at 2000-5000 Hz, a frequency range

More information

Age and hearing loss and the use of acoustic cues in fricative categorization

Age and hearing loss and the use of acoustic cues in fricative categorization Age and hearing loss and the use of acoustic cues in fricative categorization Odette Scharenborg a),b) Centre for Language Studies, Radboud University Nijmegen, Erasmusplein 1, 6525 HT, Nijmegen, The Netherlands

More information

Acoustic analysis of occlusive weakening in Parkinsonian French speech

Acoustic analysis of occlusive weakening in Parkinsonian French speech Acoustic analysis of occlusive weakening in Parkinsonian French speech Danielle Duez To cite this version: Danielle Duez. Acoustic analysis of occlusive weakening in Parkinsonian French speech. International

More information

Linguistic Phonetics. Basic Audition. Diagram of the inner ear removed due to copyright restrictions.

Linguistic Phonetics. Basic Audition. Diagram of the inner ear removed due to copyright restrictions. 24.963 Linguistic Phonetics Basic Audition Diagram of the inner ear removed due to copyright restrictions. 1 Reading: Keating 1985 24.963 also read Flemming 2001 Assignment 1 - basic acoustics. Due 9/22.

More information

Spectrograms (revisited)

Spectrograms (revisited) Spectrograms (revisited) We begin the lecture by reviewing the units of spectrograms, which I had only glossed over when I covered spectrograms at the end of lecture 19. We then relate the blocks of a

More information

PHONETIC AND AUDITORY TRADING RELATIONS BETWEEN ACOUSTIC CUES IN SPEECH PERCEPTION: PRELIMINARY RESULTS. Repp. Bruno H.

PHONETIC AND AUDITORY TRADING RELATIONS BETWEEN ACOUSTIC CUES IN SPEECH PERCEPTION: PRELIMINARY RESULTS. Repp. Bruno H. PHONETIC AND AUDITORY TRADING RELATIONS BETWEEN ACOUSTIC CUES IN SPEECH PERCEPTION: PRELIMINARY RESULTS Bruno H. Repp Abstract. When two different acoustic cues contribute to the perception of a phonetic

More information

Acoustics of Speech and Environmental Sounds

Acoustics of Speech and Environmental Sounds International Speech Communication Association (ISCA) Tutorial and Research Workshop on Experimental Linguistics 28-3 August 26, Athens, Greece Acoustics of Speech and Environmental Sounds Susana M. Capitão

More information