Shaheen N. Awan 1, Nancy Pearl Solomon 2, Leah B. Helou 3, & Alexander Stojadinovic 2 1 Bloomsburg University of Pennsylvania; 2 Walter Reed National Military Medical Center; 3 University of Pittsburgh

Dr. S. N. Awan is a consultant to KayPentax (Montvale, NJ) for the development of commercial computer software including cepstral analysis of continuous speech algorithms. KayPentax licenses the algorithms that form the basis of the Analysis of Dysphonia in Speech & Voice (ADSV) program from Dr. Awan. The views expressed in this presentation are those of the authors and do not reflect official policy of the United States Army, Department of Defense, or US Government.

Time-based perturbation measures have at least two key limitations:
1. Difficulty analyzing more severely dysphonic vowel samples.
2. Lack of validity of traditional perturbation measures in the analysis of continuous speech.

In contrast to traditional perturbation analyses, spectral-based acoustic measures have shown the ability to characterize the voice signal by extracting characteristics such as the fundamental frequency (F0) and the relative amplitude of harmonics vs. noise (de Krom, 1993) without the need to identify cycle boundaries. Spectral-based methods analyze frames of data rather than cycles. Spectral/cepstral measures are therefore able to provide valid and reliable correlates of vocal quality in continuous speech contexts.
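To make the frame-based approach concrete, the minimal sketch below (not the ADSV implementation; the frame and hop durations are illustrative assumptions) splits a signal into fixed-length, overlapping analysis frames, each of which can then be analyzed without locating any cycle boundaries.

```python
import numpy as np

def split_into_frames(signal, fs, frame_ms=40.0, hop_ms=10.0):
    """Minimal sketch: cut a signal into fixed-length, overlapping frames.
    The 40 ms frame and 10 ms hop are illustrative assumptions; spectral or
    cepstral measures would then be computed per frame, with no
    cycle-boundary (pitch-period) detection required."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.stack([signal[s:s + frame_len] for s in starts])
```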

The cepstrum (a Fourier transform of the log power spectrum of the voice signal) can graphically display the extent to which the dominant rahmonic (an anagram of "harmonic"; a cepstral peak often associated with the vocal fundamental frequency) stands out from the background noise level. Measures of the relative amplitude of the cepstral peak in relation to extraneous cepstral components have been reported to provide an effective method for quantifying the severity of the dysphonic voice (Awan, Roy, & Dromey, 2009; Awan & Roy, 2005; Awan & Roy, 2006; Heman-Ackah, Michael, & Goding, 2002; Wolfe & Martin, 1997; Hillenbrand, Cleveland, & Erickson, 1994).
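As a rough illustration of the computation just described (a minimal sketch, not the ADSV algorithm), the real cepstrum of a windowed voice frame can be obtained as an inverse FFT of the log power spectrum, and the dominant rahmonic located within a plausible quefrency range for the fundamental period. The function name, window choice, and F0 search range below are assumptions for illustration.

```python
import numpy as np

def cepstral_peak(frame, fs, f0_min=60.0, f0_max=300.0):
    """Minimal sketch: real cepstrum of one voice frame and its dominant
    rahmonic.  The Hamming window and the F0 search range (60-300 Hz)
    are illustrative assumptions."""
    windowed = frame * np.hamming(len(frame))
    log_power = np.log(np.abs(np.fft.rfft(windowed)) ** 2 + 1e-12)
    cepstrum = np.fft.irfft(log_power)

    # Search only the quefrency range corresponding to plausible
    # fundamental periods (1/f0_max .. 1/f0_min seconds).
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    peak_idx = lo + int(np.argmax(cepstrum[lo:hi]))

    peak_quefrency = peak_idx / fs        # seconds
    estimated_f0 = 1.0 / peak_quefrency   # Hz
    return cepstrum[peak_idx], peak_quefrency, estimated_f0
```

For a voice like the normal male vowel in the example below, a peak at a quefrency of about 8.63 ms corresponds to the reported fundamental frequency of roughly 116 Hz (1 / 0.00863 s ≈ 116 Hz).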

Examples of the log spectrum of a voiced signal and subsequent cepstrum (from Papamichalis, 1987)

Typical cepstrum for a normal male sustained vowel production. The cepstral peak is circled; it corresponds to the fundamental frequency (115.91 Hz) at a quefrency (the x-axis value, in units of time) of approximately 8.63 ms.

Analysis of Dysphonia in Speech and Voice (ADSV) from KayPentax is the first commercial program of its kind, allowing voice quality assessment of sustained vowel and continuous speech samples in normal and mild-to-severely dysphonic voices. The program provides several key spectral and cepstral measures of the voice sample along with a graphic display of how these values change over time. It also incorporates the Cepstral/Spectral Index of Dysphonia (CSID), a multifactorial estimate of vocal severity that correlates with the VAS (%) severity scale used in the CAPE-V.

ADSV main screen: spectral/cepstral analyses of a normal female voice sample ("We were away a year ago"). Analysis windows include: (A) sound spectrogram; (B) sound wave; (C) low/high spectral ratio (L/H ratio) over time; (D) cepstral peak prominence (CPP) over time; (E) focused spectral analysis per data frame; (F) focused cepstral analysis per data frame.

In 2010, Awan, Roy, Jetté, Meltzner, & Hillman reported that an algorithm incorporating measures from cepstral and spectral analyses was able to produce estimates of dysphonia severity that strongly correlated with auditory-perceptual judgments of dysphonia severity. Using measures of the cepstral peak prominence (CPP), a ratio of low vs. high frequency spectral energy (L/H ratio), and the respective standard deviations of these measures, Awan et al. (2010) reported:
R = 0.81 between acoustic and auditory-perceptual estimates of dysphonia severity in CAPE-V sentences;
R = 0.96 between acoustic and auditory-perceptual estimates of dysphonia severity in sustained vowel productions.
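Awan et al. (2010) derived their severity estimate via multiple regression on CPP, the L/H ratio, and their standard deviations; the published coefficients differ by sample type and are not reproduced here. The sketch below only illustrates the general form of such a multifactorial estimate, with placeholder coefficients supplied by the caller.

```python
def multifactor_severity_estimate(cpp, cpp_sd, lh_ratio, lh_ratio_sd,
                                  coefficients):
    """General form of a multifactor severity estimate on a 0-100 VAS-like
    scale: a weighted linear combination of cepstral/spectral measures.
    `coefficients` = (intercept, w_cpp, w_cpp_sd, w_lh, w_lh_sd); the actual
    regression weights used in the published CSID are not reproduced here."""
    b0, w_cpp, w_cpp_sd, w_lh, w_lh_sd = coefficients
    estimate = (b0 + w_cpp * cpp + w_cpp_sd * cpp_sd
                + w_lh * lh_ratio + w_lh_sd * lh_ratio_sd)
    return min(max(estimate, 0.0), 100.0)  # clamp to the rating scale
```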

Estimated vs. listener ratings of dysphonia severity (100-pt VAS) by group (normal, mild, moderate, severe) for all CAPE-V sentences combined (n = 128; 32 subjects per group) and for the sustained vowel (n = 32; 8 subjects per group).

Corresponding estimated vs. listener ratings for the individual sentences "How hard did he hit him?", "We were away a year ago.", "We eat eggs every Easter.", and "Peter will keep at the peak." (n = 32; 8 subjects per group for each sentence).

Because any research finding (strong or weak) may simply be a reflection of the particular sample of subjects being studied, it is essential that the results of any single study be replicated with alternative samples. In this way, the external validity (i.e., the ability to reproduce results with alternative subjects and in settings outside of the original study) of a particular research finding can be established. The goal of the present study was to assess the external validity of the acoustic algorithm and analysis methods reported in Awan et al. (2010) with a completely new and independent set of normal and disordered CAPE-V samples and associated listener judgments.

Samples were obtained from previously recorded voices of patients scheduled for partial or total thyroidectomy. Perceptual and acoustic analyses were conducted for subjects pre- and post-thyroidectomy. CAPE-V sentences and sustained vowel samples were elicited from each subject at comfortable pitch and loudness levels.

Three experienced SLPs rated the CAPE-V samples:
- Custom automated rating program
- Blinded, randomized order, blocked by subject
- Presented over headphones in a sound-treated booth
- A single rating for vowel and sentences combined
- Separate sessions for male and female samples
- Accompanied by an anchor for moderate severity
- Median ratings (% of a 100-mm line, labeled for severity) for Severity, Roughness, Breathiness, and Strain

- 2-s center of each vowel /ɑ/, trimmed for onset and offset.
- CAPE-V sentences, targeting: soft glottal attacks and voiceless-to-voiced transitions ("How hard did he hit him?"); the presence of possible voiced stoppages or spasms and the ability to maintain consistent voicing ("We were away a year ago"); the presence of hard glottal attacks ("We eat eggs every Easter"); and the ability to transition easily between voiceless stop-plosive production and vowel production ("Peter will keep at the peak").
- Measures of the cepstral peak prominence (CPP) and the ratio of low vs. high frequency spectral energy (L/H ratio), as well as the standard deviations of these measures, were obtained (see the sketch below).
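For reference, the L/H ratio for a single frame can be sketched as the ratio, in dB, of spectral energy below vs. above a cutoff frequency; the 4 kHz cutoff and window choice below are assumptions for illustration, and per-frame values would then be averaged (and their standard deviation taken) across the sample.

```python
import numpy as np

def lh_spectral_ratio(frame, fs, cutoff_hz=4000.0):
    """Sketch of a low/high spectral ratio for one frame: energy below
    vs. above `cutoff_hz`, in dB.  The 4 kHz cutoff and Hamming window
    are illustrative assumptions."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    low = spectrum[freqs < cutoff_hz].sum()
    high = spectrum[freqs >= cutoff_hz].sum()
    return 10.0 * np.log10((low + 1e-12) / (high + 1e-12))
```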

In addition to the aforementioned acoustic measures, the ADSV program produces an estimate of dysphonia severity called the CSID (Cepstral/Spectral Index of Dysphonia). Separate CSID values are provided for sustained vowels vs. individual speech samples. For the purposes of this study, the CSID values for the vowel and each analyzed sentence were averaged to provide a single acoustic estimate of dysphonia.

Samples of 40 voices (20 normal [mean CAPE-V severity = 8.52%, SD = 4.90] and 20 dysphonic [mean CAPE-V severity = 30.84%, SD = 13.69]) were selected from the initial corpus of data:
- These voices reflected a relatively wide range of perceived dysphonia severity (CAPE-V range = 1 to 65).
- This allowed a focus on the ability to discriminate normal/typical voices from those judged to have mild-to-moderate degrees of dysphonia severity.
- Equal distribution of males and females.

Statistical evaluation of the CSID values (i.e., acoustically estimated dysphonia severity) vs. auditory-perceptual dysphonia severities (i.e., CAPE-V ratings) revealed the following:
- No significant difference between CSID and auditory-perceptual dysphonia severity in normal subjects.
- No significant difference between CSID and auditory-perceptual dysphonia severity in disordered subjects.
- Significant differences in dysphonia severity between normal and disordered subjects, whether estimated via acoustic analysis (CSID; t(38) = -5.01, p < .001) or via auditory-perceptual judgment (t(38) = -6.87, p < .001).

Subjects | Mean Cepstral/Spectral Index of Dysphonia (CSID) | Mean Auditory-Perceptual Rating (CAPE-V)
20 normal voices | 5.72 (SD = 9.97) | 8.52 (SD = 4.90)
20 disordered voices | 28.50 (SD = 17.71) | 30.84 (SD = 13.69)

Across all 40 subjects, a strong and significant correlation between CSID values and CAPE-V auditory-perceptual ratings of dysphonia severity was observed (r = 0.85; r² = 0.73; p < .001).
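The group comparisons and the validity correlation above follow standard procedures (independent-samples t-tests and a Pearson correlation). A minimal sketch using SciPy is shown below; the arrays are random placeholders standing in for the per-subject CSID values and CAPE-V ratings, so the printed numbers will not match the study's results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder data only: per-subject mean CSID and median CAPE-V rating
# for 20 normal and 20 disordered voices (the real values are summarized
# in the tables above).
csid_normal      = rng.normal(6, 10, 20)
csid_disordered  = rng.normal(28, 18, 20)
capev_normal     = rng.normal(9, 5, 20)
capev_disordered = rng.normal(31, 14, 20)

# Group differences in severity, estimated acoustically and perceptually.
t_acoustic, p_acoustic = stats.ttest_ind(csid_normal, csid_disordered)
t_perceptual, p_perceptual = stats.ttest_ind(capev_normal, capev_disordered)

# Concurrent validity: CSID vs. listener ratings across all 40 subjects.
csid_all = np.concatenate([csid_normal, csid_disordered])
capev_all = np.concatenate([capev_normal, capev_disordered])
r, p = stats.pearsonr(csid_all, capev_all)

print(f"t(acoustic) = {t_acoustic:.2f} (p = {p_acoustic:.3g}), "
      f"t(perceptual) = {t_perceptual:.2f} (p = {p_perceptual:.3g})")
print(f"r = {r:.2f}, r^2 = {r**2:.2f}, p = {p:.3g}")
```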

Correlations between CSID values and CAPE-V severity ratings:
CSID: vowel /ɑ/ | r = 0.67
CSID: easy onset sentence | r = 0.79
CSID: all voiced sentence | r = 0.82
CSID: glottal sentence | r = 0.78
CSID: plosive sentence | r = 0.78
All Pearson's r correlations significant at p < .01.

The results of this study indicate that the acoustic algorithm reported by Awan et al. (2010) and incorporated in the CSID is externally valid and an effective correlate of perceived CAPE-V dysphonia severity. The results are somewhat stronger than those observed in Awan et al. (2010) for the multifactor CSID measure; sample characteristics differed in that the voices in the current study tended to fall in the normal-to-moderate range of severity. As in Awan et al. (2010), the all-voiced sentence provided the best individual correlate of dysphonia severity.

In this study, spectral/cepstral measures from sentences correlated best with perceived dysphonia severity, in contrast to Awan et al.'s (2010) sample, in which measures from vowels provided the strongest correlate. Dysphonia may be more prominent in vowels than in sentences, or vice versa, in different cases; therefore, both sample types are necessary.

The CSID provides an objective, multivariate measure of dysphonia severity that is effective in both sustained vowel and continuous speech contexts. The objective, automatic estimation of dysphonia severity in continuous speech and vowel samples is a potentially valuable and easily communicated method of categorizing voice and voice change without requiring multiple trained judges.

Awan, S. N., & Roy, N. (2005). Acoustic prediction of voice type in women with functional dysphonia. Journal of Voice, 19(2), 268-282.
Awan, S. N., & Roy, N. (2006). Toward the development of an objective index of dysphonia severity: A four-factor acoustic model. Clinical Linguistics & Phonetics, 20(1), 35-49.
Awan, S. N., Roy, N., & Dromey, C. (2009). Estimating dysphonia severity in continuous speech: Application of a multi-parameter spectral/cepstral model. Clinical Linguistics & Phonetics, 23(11), 825-841.
Awan, S. N., Roy, N., Jetté, M. E., Meltzner, G. S., & Hillman, R. E. (2010). Quantifying dysphonia severity using a spectral/cepstral-based acoustic index: Comparisons with auditory-perceptual judgments from the CAPE-V. Clinical Linguistics & Phonetics, 24(9), 742-758.
de Krom, G. (1993). A cepstrum-based technique for determining a harmonics-to-noise ratio in speech signals. Journal of Speech and Hearing Research, 36(2), 254-266.
Heman-Ackah, Y. D., Michael, D. D., & Goding, G. S., Jr. (2002). The relationship between cepstral peak prominence and selected parameters of dysphonia. Journal of Voice, 16(1), 20-27.
Hillenbrand, J., Cleveland, R. A., & Erickson, R. L. (1994). Acoustic correlates of breathy vocal quality. Journal of Speech and Hearing Research, 37(4), 769-778.
Papamichalis, P. E. (1987). Practical Approaches to Speech Coding. Englewood Cliffs, NJ: Prentice-Hall.
Wolfe, V., & Martin, D. (1997). Acoustic correlates of dysphonia: Type and severity. Journal of Communication Disorders, 30(5), 403-415.