HCS 7367 Speech Perception

Dr. Peter Assmann, Fall 2012

Long-term spectrum of speech
[Figure: long-term spectra of connected speech and of vowels for male and female talkers, compared with the absolute threshold]

Audibility of speech
[Figure: sound pressure level (dB) of conversational speech and the absolute threshold of normal listeners, 62.5 Hz to 32 kHz]

Types of hearing loss
Conductive loss
Sensorineural loss
Audibility vs. distortion
Effect of noise

Pure-tone audiogram
[Figure: normal pure-tone audiograms for the left and right ears; hearing loss in dB (ANSI-1996), -10 to 130 dB, 250 Hz to 4 kHz]

Conductive loss
[Figure: audiograms showing bone-conduction and air-conduction thresholds for a conductive loss]
Sources: www.brainconnection.com; www.bcm.tmc.edu/oto/studs/aud.html
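The audiograms distinguish types of loss by comparing air-conduction and bone-conduction thresholds. The sketch below shows that comparison under stated assumptions. The 25 dB HL normal limit and the 15 dB air-bone-gap criterion are hypothetical values chosen for illustration, not taken from the slides. Averaging thresholds across the tested frequencies is also a simplification. Clinical criteria vary.

```python
from typing import Sequence

# Hypothetical criteria for this sketch (not from the slides):
NORMAL_LIMIT_DB_HL = 25.0   # mean air-conduction threshold at or below this is treated as normal
AIR_BONE_GAP_DB = 15.0      # mean air-bone gap at or above this indicates a conductive component


def classify_loss(air_db_hl: Sequence[float], bone_db_hl: Sequence[float]) -> str:
    """Classify one ear from air- and bone-conduction thresholds (dB HL),
    one value per audiometric frequency (e.g., 250 Hz to 4 kHz)."""
    if len(air_db_hl) != len(bone_db_hl) or len(air_db_hl) == 0:
        raise ValueError("need matching, non-empty threshold lists")
    mean_air = sum(air_db_hl) / len(air_db_hl)
    mean_bone = sum(bone_db_hl) / len(bone_db_hl)
    gap = mean_air - mean_bone  # air-bone gap: loss added by the outer/middle ear
    if mean_air <= NORMAL_LIMIT_DB_HL:
        return "normal"
    if gap >= AIR_BONE_GAP_DB:
        # Conductive component present; bone conduction shows whether the cochlea is also affected.
        return "conductive" if mean_bone <= NORMAL_LIMIT_DB_HL else "mixed"
    return "sensorineural"


if __name__ == "__main__":
    print(classify_loss([40, 40, 45, 40, 35], [5, 5, 10, 10, 5]))
```

Near-normal bone conduction combined with a large air-bone gap points to the outer or middle ear. Elevated bone conduction with little gap points to the cochlea.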

Sensorineural loss
[Figure: audiograms for the left and right ears showing bone-conduction and air-conduction thresholds for a sensorineural loss; hearing loss in dB (ANSI-1996), 250 Hz to 4 kHz]

Mixed loss
[Figure: audiogram of a mixed loss]

Speech audiometry
Test materials: nonsense syllables, real words, or words in sentences.
Measures: the threshold for recognizing 50% of the test items, and the percentage of items correctly reported.
Speech tests provide a valid measure of hearing handicap. Poor speech scores may indicate a hearing loss of retrocochlear origin.
Source: www.bcm.tmc.edu/oto/studs/aud.html

Sensorineural hearing loss
Listeners with cochlear hearing loss have difficulty recognizing speech when background noise is present. Contributing factors:
Reduced audibility
Supra-threshold distortions: impaired frequency selectivity, loudness recruitment

Speech recognition in noise
Speech reception threshold, SRT (Plomp & Mimpen, 1969): the speech-to-noise ratio required to achieve a specific level of intelligibility, typically 50%. The SRT depends on:
the speech materials
the type of masker (e.g., speech-shaped noise vs. a single competing talker)
the spatial separation of target and masker

SRT deficit of hearing-impaired listeners relative to normal listeners:

Masker type           Listening situation                         Deficit in SRT
Speech-shaped noise   Speech + masker in front, unaided           2.5-7.0 dB
Speech-shaped noise   Speech + masker in front, aided             2.5-6.0 dB
Single talker         Speech + masker in front, unaided           6.0-12.0 dB
Single talker         Speech + masker in front, aided             4.0-10.0 dB
Single talker         Speech and masker spatially separated       12.0-19.0 dB

Source: Moore, B.C.J. (2003), Speech Communication.

Articulation Index
How much does audibility contribute to the difficulty of understanding speech in noise? The Articulation Index (AI) estimates the contribution of audibility (and other factors) to speech intelligibility.
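The SRT is the SNR that gives 50% intelligibility. A 1-down 1-up adaptive staircase converges on that 50% point. The sketch below is one minimal version. The starting SNR (+10 dB), the 2 dB step, and the choice to average the last eight reversals are assumptions for illustration. They are not presented as the Plomp & Mimpen procedure. The `respond` callback stands in for a real listener.

```python
from typing import Callable, List, Optional


def adaptive_srt(respond: Callable[[float], bool],
                 start_snr_db: float = 10.0,
                 step_db: float = 2.0,
                 n_trials: int = 30,
                 n_last_reversals: int = 8) -> float:
    """Estimate the SNR giving 50% correct with a 1-down 1-up staircase.

    respond(snr_db) returns True if the (real or simulated) listener
    reports the test item correctly at that SNR.
    """
    snr = start_snr_db
    previous: Optional[bool] = None
    reversals: List[float] = []
    for _ in range(n_trials):
        correct = respond(snr)
        if previous is not None and correct != previous:
            reversals.append(snr)  # the track changes direction at this SNR
        previous = correct
        # Make the next trial harder after a correct response, easier after an error.
        snr = snr - step_db if correct else snr + step_db
    if len(reversals) < n_last_reversals:
        raise RuntimeError("too few reversals; increase n_trials")
    last = reversals[-n_last_reversals:]
    return sum(last) / len(last)


if __name__ == "__main__":
    # Hypothetical listener who is always correct at SNRs of -6 dB or better.
    print(adaptive_srt(lambda snr: snr >= -6.0))
```

With this deterministic listener, the track settles into oscillation between -6 and -8 dB, so the estimate is -7 dB. That is midway between the last correct and first incorrect SNR. With a real listener, responses are probabilistic, and the reversals scatter around the 50% point.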

Articulation Index
1. Divides the speech and masker spectra into a small number of frequency bands.
2. Estimates the audibility of speech in each band, weighted by its relative importance for intelligibility.
3. Derives overall intelligibility by summing the contributions of the bands.

Most studies show that, for hearing-impaired listeners, speech intelligibility is worse than the AI predicts, especially for moderate or severe hearing loss.

Conclusion: factors other than audibility must be responsible for the difficulties that hearing-impaired listeners experience in understanding speech in noise. What else? Frequency selectivity and temporal resolution.

Frequency selectivity
Frequency selectivity is the ability to resolve the spectral components of complex sounds. Reduced frequency selectivity may lead to difficulty in understanding speech in noise.

Auditory filters
Fletcher (1940) suggested that the peripheral auditory system could be modeled as a bank of linear bandpass filters with continuously overlapping center frequencies. Each point along the basilar membrane corresponds to a filter with a different center frequency, and center frequencies increase roughly logarithmically from the apex to the base.
[Figure: bank of overlapping auditory filters, gain vs. frequency]
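The three AI steps can be sketched as a band-importance-weighted sum of audibility. This sketch uses one common simplification, which is an assumption here. Each band's SNR is clipped to a 30 dB range (-15 to +15 dB) and mapped linearly onto 0-1 audibility. The band levels and importance weights are supplied by the caller. No standard's band table is reproduced.

```python
from typing import Sequence


def articulation_index(speech_db: Sequence[float],
                       masker_db: Sequence[float],
                       importance: Sequence[float],
                       dynamic_range_db: float = 30.0) -> float:
    """Importance-weighted audibility, from 0 (inaudible) to 1 (fully audible)."""
    if not (len(speech_db) == len(masker_db) == len(importance)):
        raise ValueError("need one speech level, masker level and weight per band")
    half = dynamic_range_db / 2.0
    weighted = 0.0
    for s, m, w in zip(speech_db, masker_db, importance):
        snr = s - m                                        # step 1: per-band SNR in dB
        clipped = min(max(snr, -half), half)               # limit to the assumed +/-15 dB range
        audibility = (clipped + half) / dynamic_range_db   # step 2: map linearly onto 0..1
        weighted += w * audibility                         # weight by band importance
    return weighted / sum(importance)                      # step 3: sum (normalised weights)


if __name__ == "__main__":
    # Two hypothetical bands: the important band is fully audible, the other is masked.
    print(articulation_index([70, 50], [55, 65], [3, 1]))
```

The model uses only audibility, so a hearing loss enters only as reduced band SNRs. That is why such a model cannot account for supra-threshold distortions.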

Auditory filters
About half of the length of the human basilar membrane is devoted to the lowest frequencies (the F1 range of speech), and the majority of auditory nerve fibers respond best to low-to-mid frequencies.

Critical bandwidth
Fletcher (1940) band-widening experiment: the threshold for detecting a pure tone in the presence of a bandpass noise masker centered on the tone (at constant spectrum level) increases as the noise bandwidth increases, until the width of the band exceeds the critical bandwidth of the auditory filter. Beyond that point the threshold no longer increases.
[Figure: tone detection threshold vs. noise masker bandwidth]

Sources of evidence for the critical bandwidth:
Band-widening experiments (Fletcher, 1940)
Loudness summation (Zwicker et al., 1957)
Two-tone masking (Zwicker, 1954)
Discrimination of partials within complex tones (Plomp and Mimpen, 1968): only the lowest 5-8 partials can be reliably discriminated.

Fletcher (1940) made the simplifying assumption that the auditory filter could be modeled as a rectangle, with a flat top and vertical slopes.
[Figure: rectangular auditory filter of width CB, gain vs. frequency]

Power spectrum model of masking
Fletcher suggested that only a narrow band of frequencies in the region of the tone contributes to masking. He called this the critical bandwidth (CB). However, threshold changes gradually, not abruptly, as the noise bandwidth increases. This suggests auditory filters with sloping rather than rectangular skirts (Patterson, 1976).
[Figure: auditory filter with sloping skirts compared with the rectangular CB, gain vs. frequency]
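The band-widening result follows directly from a rectangular filter. The sketch below assumes the rectangular-filter version of the power spectrum model. Noise inside the filter adds to the masking, and noise outside it does not. Threshold is assumed to sit a fixed K dB above the noise power passed. The CB of 160 Hz, the spectrum level of 40 dB and K = -4 dB in the usage loop are hypothetical values.

```python
import math


def bandwidening_threshold_db(noise_bw_hz: float, spectrum_level_db: float,
                              cb_hz: float, k_db: float) -> float:
    """Predicted tone threshold (dB) for a noise band centred on the tone,
    assuming a rectangular auditory filter of width cb_hz."""
    passed_bw = min(noise_bw_hz, cb_hz)        # only noise inside the filter can mask the tone
    noise_in_filter_db = spectrum_level_db + 10.0 * math.log10(passed_bw)
    return noise_in_filter_db + k_db           # threshold is K dB above the noise passed


if __name__ == "__main__":
    for bw in (25, 50, 100, 200, 400, 800):    # spectrum level held constant, as in the experiment
        print(bw, round(bandwidening_threshold_db(bw, 40.0, 160.0, -4.0), 1))
```

Below the CB, each doubling of bandwidth raises the threshold by about 3 dB. Above it, the curve is flat. A filter with sloping skirts would round off this sharp knee, which is the gradual change that the slides attribute to Patterson (1976).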

Power spectrum model of masking
Detection of a probe tone in the presence of a noise masker depends on the relative power of the probe and of the noise passed by the auditory filter centered on the tone (Patterson, 1976).

Noise power is often specified as the power in a band of frequencies 1 Hz wide. This is called the noise power density, designated $N_0$. The total power in a band of noise is $W N_0$, where $W$ is the noise bandwidth in Hz.
[Figure: tone within a noise masker of bandwidth W]

When the noise just masks the tone, the ratio of the power in the tone, $P$, to the power in the noise is a constant, $K$:
$P / (W N_0) = K$, and therefore $W = P / (K N_0)$.

Assumptions:
Only frequencies within the passband of the auditory filter contribute to masking.
Detection is based on a single auditory filter, centered on the frequency of the tone.
Listeners ignore short-term fluctuations in the noise and do not rely on phase differences between signal and noise.

Notched-noise method
[Figure: tone in a spectral notch between low-pass and high-pass noise bands, with the auditory filter centered on the tone (Patterson, 1976)]

Off-frequency listening
Tone detection can be improved by shifting the filter center frequency away from the tone to maximize the signal-to-noise ratio.
[Figure: shifted auditory filter, tone and high-pass noise]
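The relation $W = P / (K N_0)$ is easiest to apply in decibels, where division becomes subtraction. The helper below solves for $W$ from a measured tone threshold, a noise spectrum level and an assumed $K$. The function name and the example numbers are illustrative. Setting $K = 1$ (0 dB) gives the ratio of threshold tone power to noise power density, often called the critical ratio.

```python
def bandwidth_from_threshold_hz(tone_threshold_db: float,
                                spectrum_level_db: float,
                                k_db: float = 0.0) -> float:
    """Solve P / (W * N0) = K for W, with P, N0 and K given in dB.

    In dB: 10*log10(W) = P_dB - N0_dB - K_dB.
    """
    return 10.0 ** ((tone_threshold_db - spectrum_level_db - k_db) / 10.0)


if __name__ == "__main__":
    # Hypothetical measurement: tone just masked at 58 dB by noise with a 40 dB spectrum level.
    print(round(bandwidth_from_threshold_hz(58.0, 40.0), 1))           # K assumed = 1 (0 dB)
    print(round(bandwidth_from_threshold_hz(58.0, 40.0, k_db=-4.0), 1))
```

The estimated bandwidth depends directly on the assumed $K$. A smaller $K$ (a lower detection criterion) yields a wider filter for the same threshold.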

Notched-noise method
Patterson (1976) estimated auditory filter shapes from the function relating tone threshold to notch width. The derived filters have a rounded top and steep skirts, with bandwidths of 10-15% of the filter center frequency.
[Figure: derived auditory filter shape, relative amplitude (dB) vs. frequency]

Simulation of reduced frequency selectivity
[Figure: derived auditory filter shapes, normal vs. impaired (3 × normal bandwidth), relative amplitude (dB) vs. frequency]

Auditory filter shapes as a function of frequency
[Figure: frequency response of a gammatone filter bank, filter gain (dB) vs. frequency; example filter with Fc = 1094 Hz, ERB = 143 Hz]

Auditory filter shapes as a function of level
[Figure: filter output level (dB) vs. frequency (Hz) at several levels]

Equivalent rectangular bandwidth
The equivalent rectangular bandwidth (ERB) of a filter is the bandwidth of a rectangular filter that has the same peak transmission as that filter and passes the same total power when the input is white noise. The ERB of the estimated auditory filter is about 10-15% of the filter center frequency.
[Figure: ERB (Hz) as a function of center frequency (Hz), logarithmic axes]
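The ERB definition can be checked numerically for a rounded-top, steep-skirted filter shape. The sketch assumes the rounded-exponential roex(p) power weighting often fitted to notched-noise data, $W(g) = (1 + pg)e^{-pg}$ with $g = |f - f_c| / f_c$. Its area divided by its peak (1) is the ERB, which works out analytically to $4 f_c / p$. The value p = 30 is a hypothetical choice. It gives an ERB of about 13% of the center frequency, inside the 10-15% range quoted above.

```python
import math


def roex_weight(f_hz: float, fc_hz: float, p: float) -> float:
    """Rounded-exponential roex(p) power weighting, peak value 1 at f = fc."""
    g = abs(f_hz - fc_hz) / fc_hz          # normalised deviation from the centre frequency
    return (1.0 + p * g) * math.exp(-p * g)


def numeric_erb_hz(fc_hz: float, p: float, n_steps: int = 20000) -> float:
    """ERB = area under the power weighting divided by its peak (which is 1),
    integrated from 0 Hz to 2*fc with the trapezoidal rule."""
    lo, hi = 0.0, 2.0 * fc_hz
    df = (hi - lo) / n_steps
    area = 0.5 * (roex_weight(lo, fc_hz, p) + roex_weight(hi, fc_hz, p))
    for i in range(1, n_steps):
        area += roex_weight(lo + i * df, fc_hz, p)
    return area * df


if __name__ == "__main__":
    fc, p = 1000.0, 30.0
    print(round(numeric_erb_hz(fc, p), 2), round(4.0 * fc / p, 2))  # numeric vs. analytic
```

A rectangle of height 1 and width equal to the printed ERB would pass the same white-noise power as the rounded filter. That is exactly the definition above.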

Cochlear frequency-place map
Greenwood (1961) developed a function relating the characteristic frequency (CF) at each place on the cochlea to the distance (x) of that place from the apex.

ERB scale
One ERB unit corresponds to a distance of about 0.89 mm along the basilar membrane.
[Figure: human data]

ERB-rate scale
The ERB-rate scale is a warped frequency scale that models changes in the ERB of the auditory filter as a function of frequency.
[Figure: ERB-rate (ERBs) as a function of frequency (Hz)]

Excitation patterns
Auditory excitation patterns show the composite output of a bank of simulated auditory filters as a function of filter center frequency.
[Figure: filter output vs. filter center frequency]
Excitation patterns provide a good model of auditory frequency selectivity and masking: frequency components that are resolved by the auditory system produce distinct peaks in the excitation pattern.
[Figure: model stages: outer and middle ears → cochlear filtering (filters on an ERB-rate frequency axis) → energy detectors → CNS]
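The place map and the ERB scales are short closed-form functions. The sketch uses the commonly quoted human Greenwood constants (A = 165.4, a = 2.1, k = 0.88, with x as a proportion of cochlear length from the apex). These constants are an assumption here; the slides do not give them. It also uses the Glasberg & Moore (1990) formulas ERB = 24.7(4.37F + 1) and ERB-rate = 21.4 log10(4.37F + 1), with F in kHz.

```python
import math


def greenwood_cf_hz(x: float) -> float:
    """Characteristic frequency at proportional distance x from the apex (0 = apex, 1 = base),
    using commonly quoted human constants A = 165.4, a = 2.1, k = 0.88."""
    return 165.4 * (10.0 ** (2.1 * x) - 0.88)


def erb_hz(f_hz: float) -> float:
    """ERB of the auditory filter centred at f_hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def erb_rate(f_hz: float) -> float:
    """Number of ERBs below f_hz (ERB-rate scale)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        cf = greenwood_cf_hz(x)
        print(f"x={x:.2f}  CF={cf:8.1f} Hz  ERB={erb_hz(cf):7.1f} Hz  ERB-rate={erb_rate(cf):5.1f}")
```

With these constants, `greenwood_cf_hz(0.5)` comes out near 1.7 kHz. Under these assumptions, the apical half of the cochlea therefore serves frequencies below about 1.7 kHz. `erb_hz(1000)` is about 133 Hz, roughly 13% of the center frequency.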

Excitation patterns
[Figure: excitation patterns (amplitude in dB vs. frequency, 0.2-4 kHz) of a 500 Hz pure tone and of a complex tone with equal-amplitude harmonics]
[Figure: excitation pattern of the vowel /æ/ (F0 = 200 Hz; F2 = 1450 Hz, F3 = 2450 Hz)]

Auditory filterbank spectrogram
[Figure: auditory filterbank spectrogram, frequency (kHz) vs. time (ms)]

Simulation studies
Simulating reduced frequency selectivity by spectrally smearing the short-term speech spectrum lowers intelligibility for listeners with normal hearing, particularly in noise (ter Keurs et al., 1993; Baer & Moore, 1994).
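An excitation pattern can be sketched by passing each component's power through a bank of filters and reading the summed output at each center frequency. The sketch makes three assumptions. The filters are roex(p) power weightings with p set so that the ERB matches the Glasberg & Moore (1990) formula. The input is a 200 Hz harmonic complex with 20 equal-amplitude harmonics. Reduced frequency selectivity is modeled by multiplying every ERB by a broadening factor. The function names are hypothetical.

```python
import math
from typing import Sequence


def erb_hz(f_hz: float) -> float:
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def excitation_db(cf_hz: float, components_hz: Sequence[float], broadening: float = 1.0) -> float:
    """Output (dB) of one roex(p) filter at cf_hz for equal-amplitude, unit-power components."""
    p = 4.0 * cf_hz / (broadening * erb_hz(cf_hz))    # roex ERB = 4*cf/p, scaled by broadening
    power = 0.0
    for f in components_hz:
        g = abs(f - cf_hz) / cf_hz
        power += (1.0 + p * g) * math.exp(-p * g)     # power passed by the filter
    return 10.0 * math.log10(power)


def peak_to_valley_db(harmonic: int, f0_hz: float = 200.0, n_harmonics: int = 20,
                      broadening: float = 1.0) -> float:
    """Excitation at a harmonic minus excitation halfway to the next harmonic."""
    comps = [k * f0_hz for k in range(1, n_harmonics + 1)]
    peak = excitation_db(harmonic * f0_hz, comps, broadening)
    valley = excitation_db((harmonic + 0.5) * f0_hz, comps, broadening)
    return peak - valley


if __name__ == "__main__":
    for n in (2, 5, 10, 15):
        print(n, round(peak_to_valley_db(n), 1), round(peak_to_valley_db(n, broadening=3.0), 1))
```

With normal filters, low harmonics such as the 2nd give a peak-to-valley difference of about 11 dB. By the 15th harmonic the difference is only a fraction of a dB, consistent with only the lowest harmonics being resolved. Tripling the filter width nearly flattens even the low harmonics, which is the smearing that the simulation studies impose on speech.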

Effects of reduced frequency selectivity on the vowel /æ/
[Figure: excitation patterns of /æ/ (F0 = 200 Hz; F2 = 1450 Hz, F3 = 2450 Hz) computed with auditory filters 1 ×, 2 × and 3 × the normal bandwidth]

Distortion of spectral shape
Broader auditory filters produce a smeared excitation pattern: peaks are less prominent, and peak-to-valley ratios are smaller. Adding noise fills the valleys between the spectral peaks and further reduces the distinctiveness of the spectral profile.

Distortion of temporal structure
Broader auditory filters alter the temporal fine structure of the filter outputs:
Increased contribution of adjacent components
Increase in within-channel modulation
Diminished differences between adjacent channels
[Figure: effects of reduced frequency selectivity on temporal structure; filter outputs vs. time at center frequencies up to about 4.5 kHz, normal (× 1) vs. broadened (× 3) filters]

Loudness recruitment
In an ear with cochlear hearing loss, loudness grows faster than normal as a sound is increased in level above absolute threshold. At levels above 90-100 dB SPL, loudness returns to normal: the sound appears equally loud to hearing-impaired and normal listeners.
Loudness recruitment is associated with a reduced dynamic range (the range between absolute threshold and the highest comfortable level).
Recruitment may reduce the ability to listen in the dips of a fluctuating masker, such as a competing voice.
Recruitment distorts the loudness relationships among the components of speech sounds.
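Recruitment can be illustrated with a toy mapping from the level presented to the impaired ear to the level that sounds equally loud to a normal ear. This is a deliberately simplified, hypothetical model. Loudness level is assumed to grow linearly in dB between the impaired threshold and a "full recruitment" point of 100 dB SPL, chosen from the 90-100 dB SPL range above. The normal threshold is taken as 0 dB SPL. Real loudness growth functions are curved and frequency dependent.

```python
from typing import Optional


def equal_loudness_level_db(level_db_spl: float,
                            impaired_threshold_db_spl: float,
                            normal_threshold_db_spl: float = 0.0,
                            recruitment_point_db_spl: float = 100.0) -> Optional[float]:
    """Level at which a normal ear would judge the sound equally loud (toy linear-in-dB model)."""
    if impaired_threshold_db_spl >= recruitment_point_db_spl:
        raise ValueError("threshold must lie below the recruitment point")
    if level_db_spl <= impaired_threshold_db_spl:
        return None                                   # inaudible to the impaired ear
    if level_db_spl >= recruitment_point_db_spl:
        return level_db_spl                           # loudness has caught up with normal
    # Compressed dynamic range -> slope > 1: loudness grows faster than normal.
    slope = ((recruitment_point_db_spl - normal_threshold_db_spl)
             / (recruitment_point_db_spl - impaired_threshold_db_spl))
    return normal_threshold_db_spl + slope * (level_db_spl - impaired_threshold_db_spl)


if __name__ == "__main__":
    for level in (50, 60, 75, 90, 100):
        print(level, equal_loudness_level_db(level, impaired_threshold_db_spl=50.0))
```

With a 50 dB SPL threshold, the usable range shrinks from 100 dB to 50 dB. A 10 dB level change therefore produces a loudness change equivalent to 20 dB in a normal ear. This is one reason that level differences between speech components are distorted.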

Glimpsing speech in noise
A glimpsing model of speech perception in noise: speech is a highly modulated signal in time and frequency, and regions of high energy are typically sparsely distributed.
Cooke, M. (2006). A glimpsing model of speech perception in noise. Journal of the Acoustical Society of America, 119(3), 1562-1573.
[Figure: spectrogram, frequency (kHz) vs. time (ms)]

The information conveyed by the spectrotemporal energy distribution of clean speech is redundant. This redundancy allows speech to be identified from relatively sparse evidence.
[Figure: spectrogram, frequency (kHz) vs. time (ms)]

Can listeners take advantage of glimpses, i.e., direct attention to the spectrotemporal regions where the speech + noise mixture is dominated by the target speech?
An ASR system was trained to recognize consonants in noise.
The maskers differed in glimpse size.
The ASR model was developed to exploit the non-uniform distribution of SNR across time-frequency regions.
Conclusion: both the model and listeners benefit from glimpsing.

Speech + noise mixtures
Some regions are dominated by the target voice; the local SNR varies across time and frequency.
Where the target voice dominates, the problem of source segregation is solved, because the signal there is effectively clean speech.
Clean speech is highly redundant: it remains intelligible after 50% or more of its energy is removed by gating and/or filtering.

STEP model
Spectro-temporal excitation pattern (STEP), based on the auditory excitation pattern (Moore, 2003):
A spectrogram-like representation
Reflects non-uniform frequency selectivity in different frequency bands
Incorporates a sliding time window reflecting temporal analysis by the auditory system
Relative audibility at different frequencies
Loudness model
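Given separate target and masker energies, the glimpses are simply the time-frequency cells where the local SNR exceeds a criterion. The sketch assumes the powers are already on a common time-frequency grid, with rows as time frames and columns as frequency channels. In the model this grid would be a STEP-like auditory representation. The default threshold of 3 dB matches the criterion quoted in these slides. The function names are illustrative.

```python
import math
from typing import List

Grid = List[List[float]]


def glimpse_mask(target_power: Grid, masker_power: Grid,
                 threshold_db: float = 3.0) -> List[List[bool]]:
    """Mark cells where the local SNR exceeds threshold_db (all powers must be > 0)."""
    mask = []
    for t_row, m_row in zip(target_power, masker_power):
        mask.append([10.0 * math.log10(t / m) > threshold_db
                     for t, m in zip(t_row, m_row)])
    return mask


def glimpse_proportion(mask: List[List[bool]]) -> float:
    """Fraction of time-frequency cells that count as glimpses."""
    cells = [cell for row in mask for cell in row]
    return sum(cells) / len(cells)


if __name__ == "__main__":
    target = [[10.0, 1.0], [4.0, 1.0]]
    masker = [[1.0, 1.0], [1.0, 10.0]]
    m = glimpse_mask(target, masker)
    print(m, glimpse_proportion(m))
```

The sketch needs the target and masker separately. This is exactly the "local SNR must be known in advance" limitation raised in the conclusions.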

Missing-data ASR
HMM-based speech recognizer. Missing-data models:
Glimpses only: ignore the missing information (in the masked regions).
Glimpses plus background: try to fill in the missing information (based on the masked regions).

Sparseness and redundancy
Glimpses are spectrotemporal regions where the target signal exceeds the masker by about 3 dB.
[Figure: target speech mixed with a single-talker masker, an eight-talker masker and speech-shaped noise, with the resulting glimpses]

Syllable identification accuracy as a function of the number of competing voices. The level of the target speech (monosyllabic nonsense words) was held constant at 95 dB (after Miller, 1947).

Results
FIG. 4. The correlation between intelligibility and the proportion of the target speech in which the local SNR exceeds 3 dB. Each point represents a noise condition, and proportions are means across all tokens in the test set. The best linear fit is also shown. The correlation between listeners' scores and these putative glimpses is 0.955.
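The "glimpses only" strategy can be illustrated with a classifier that scores each frame using only its reliable channels. This sketch has several assumptions. Each class is a single diagonal-covariance Gaussian rather than a full HMM. The reliability mask is given. For a diagonal Gaussian, marginalizing out the unreliable channels reduces to dropping their terms from the log-likelihood. The class models and frame values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], List[float]]  # per-channel means and variances


def log_gauss(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def glimpses_only_loglik(frame: Sequence[float], reliable: Sequence[bool], model: Model) -> float:
    means, variances = model
    # Diagonal covariance: integrating out masked channels just removes their terms.
    return sum(log_gauss(x, m, v)
               for x, r, m, v in zip(frame, reliable, means, variances) if r)


def classify(frame: Sequence[float], reliable: Sequence[bool], models: Dict[str, Model]) -> str:
    return max(models, key=lambda name: glimpses_only_loglik(frame, reliable, models[name]))


if __name__ == "__main__":
    models = {"a": ([10.0, 10.0, 0.0, 0.0], [1.0] * 4),
              "b": ([0.0, 0.0, 10.0, 10.0], [1.0] * 4)}
    frame = [10.0, 10.0, 12.0, 12.0]          # class "a" with noise filling the upper channels
    print(classify(frame, [True] * 4, models))                  # all data: noise misleads
    print(classify(frame, [True, True, False, False], models))  # glimpses only
```

Using all channels, the noise-dominated upper channels pull the decision toward the wrong class. Restricting the score to the glimpsed channels recovers the target class.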

Conclusions
The best model:
Uses the information in glimpses and counterevidence in the masked regions
Constrains glimpses to a minimum area
Treats all regions with local SNR > -5 dB as potential glimpses
A higher glimpse threshold (e.g., local SNR > 0 dB) produces fewer glimpses, but these provide less distorted information than those obtained with a lower threshold (e.g., -5 dB).
Limitation: the local SNR must be known in advance. Is there a way to estimate the local SNR directly from the mixture?
Tracking problem: how can glimpses be integrated over time?

Brungart et al. (2001)
[Figure: 2-talker correct responses (%) as a function of target-to-masker ratio (-12 to +12 dB); masker conditions include same talker, different talker of the same sex, and modulated noise]
[Figure: 2-talker correct responses (%) vs. target-to-masker ratio (dB)]
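"Counterevidence in the masked regions" can be sketched in the spirit of the bounded marginalization used in missing-data ASR. The sketch assumes channel values are energies in a log-like domain where the target cannot exceed the observed mixture. In a masked channel, the class is therefore scored by the probability that its target energy lies at or below the observed value. Each class is again a single diagonal Gaussian (a hypothetical simplification), and the lower integration bound is omitted for brevity.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], List[float]]  # per-channel means and variances


def log_gauss(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_norm_cdf(z: float) -> float:
    # Phi(z) = 0.5 * erfc(-z / sqrt(2)); the floor avoids log(0) far in the tail.
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))


def bounded_loglik(frame: Sequence[float], reliable: Sequence[bool], model: Model) -> float:
    means, variances = model
    total = 0.0
    for x, r, m, v in zip(frame, reliable, means, variances):
        if r:
            total += log_gauss(x, m, v)                    # glimpse: use the observed value
        else:
            # Masked: target energy is at most the mixture energy x, so score P(target <= x).
            # A class predicting much more energy than was observed is penalised.
            total += log_norm_cdf((x - m) / math.sqrt(v))
    return total


def classify(frame: Sequence[float], reliable: Sequence[bool], models: Dict[str, Model]) -> str:
    return max(models, key=lambda name: bounded_loglik(frame, reliable, models[name]))


if __name__ == "__main__":
    models = {"a": ([10.0, 0.0], [1.0, 1.0]), "b": ([10.0, 20.0], [1.0, 1.0])}
    print(classify([10.0, 5.0], [True, False], models))
```

Here both classes match the single glimpse equally well, so "glimpses only" cannot decide between them. The masked channel still rules out class "b", because "b" predicts 20 units where the mixture itself contains only 5.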