Auditory principles in speech processing do computers need silicon ears?

Similar documents
HCS 7367 Speech Perception

Binaural Hearing. Why two ears? Definitions

HCS 7367 Speech Perception

AUDL GS08/GAV1 Signals, systems, acoustics and the ear. Pitch & Binaural listening

Psychoacoustical Models WS 2016/17

Hearing4all with two ears: Benefit of binaural signal processing for users of hearing aids and cochlear implants

USING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES

The role of periodicity in the perception of masked speech with simulated and real cochlear implants

Binaural processing of complex stimuli

Hearing Lectures. Acoustics of Speech and Hearing. Auditory Lighthouse. Facts about Timbre. Analysis of Complex Sounds

Sound localization psychophysics

Acoustics, signals & systems for audiology. Psychoacoustics of hearing impairment

CHAPTER 1 INTRODUCTION

Automatic Live Monitoring of Communication Quality for Normal-Hearing and Hearing-Impaired Listeners

! Can hear whistle? ! Where are we on course map? ! What we did in lab last week. ! Psychoacoustics

Robust Neural Encoding of Speech in Human Auditory Cortex

Predicting the benefit of binaural cue preservation in bilateral directional processing schemes for listeners with impaired hearing

Using Source Models in Speech Separation

Role of F0 differences in source segregation

ClaroTM Digital Perception ProcessingTM

Computational Perception /785. Auditory Scene Analysis

Effects of Cochlear Hearing Loss on the Benefits of Ideal Binary Masking

9/29/14. Amanda M. Lauer, Dept. of Otolaryngology- HNS. From Signal Detection Theory and Psychophysics, Green & Swets (1966)

BINAURAL DICHOTIC PRESENTATION FOR MODERATE BILATERAL SENSORINEURAL HEARING-IMPAIRED

Combination of binaural and harmonic. masking release effects in the detection of a. single component in complex tones

Influence of acoustic complexity on spatial release from masking and lateralization

What you re in for. Who are cochlear implants for? The bottom line. Speech processing schemes for

Sound, Mixtures, and Learning

The development of a modified spectral ripple test

Infant Hearing Development: Translating Research Findings into Clinical Practice. Auditory Development. Overview

J Jeffress model, 3, 66ff

Hearing II Perceptual Aspects

Signals, systems, acoustics and the ear. Week 5. The peripheral auditory system: The ear as a signal processor

Issues faced by people with a Sensorineural Hearing Loss

Publication VI. c 2007 Audio Engineering Society. Reprinted with permission.

A. SEK, E. SKRODZKA, E. OZIMEK and A. WICHER

Modulation and Top-Down Processing in Audition

Characterizing individual hearing loss using narrow-band loudness compensation

Individual factors in speech recognition with binaural multi-microphone noise reduction: Measurement and prediction

Recognition & Organization of Speech & Audio

Chapter 40 Effects of Peripheral Tuning on the Auditory Nerve s Representation of Speech Envelope and Temporal Fine Structure Cues

Time Varying Comb Filters to Reduce Spectral and Temporal Masking in Sensorineural Hearing Impairment

Challenges in microphone array processing for hearing aids. Volkmar Hamacher Siemens Audiological Engineering Group Erlangen, Germany

Linguistic Phonetics. Basic Audition. Diagram of the inner ear removed due to copyright restrictions.

Frequency refers to how often something happens. Period refers to the time it takes something to happen.

Binaural hearing and future hearing-aids technology

Lateralized speech perception in normal-hearing and hearing-impaired listeners and its relationship to temporal processing

Acoustics Research Institute

BASIC NOTIONS OF HEARING AND

Extending a computational model of auditory processing towards speech intelligibility prediction

Speech intelligibility in simulated acoustic conditions for normal hearing and hearing-impaired listeners

Modeling individual loudness perception in cochlear implant recipients with normal contralateral hearing

21/01/2013. Binaural Phenomena. Aim. To understand binaural hearing Objectives. Understand the cues used to determine the location of a sound source

Robustness, Separation & Pitch

Proceedings of Meetings on Acoustics

3-D Sound and Spatial Audio. What do these terms mean?

Representation of sound in the auditory nerve

An Auditory System Modeling in Sound Source Localization

Charactering the individual ear by the Auditory Profile

Auditory Scene Analysis: phenomena, theories and computational models

What Is the Difference between db HL and db SPL?

Modeling Sluggishness in Binaural Unmasking of Speech for Maskers With Time-Varying Interaural Phase Differences

2/16/2012. Fitting Current Amplification Technology on Infants and Children. Preselection Issues & Procedures

UvA-DARE (Digital Academic Repository) Perceptual evaluation of noise reduction in hearing aids Brons, I. Link to publication

Who are cochlear implants for?

Linguistic Phonetics Fall 2005

Loudness Processing of Time-Varying Sounds: Recent advances in psychophysics and challenges for future research

Perceptual Evaluation of Signal Enhancement Strategies

Hearing. Juan P Bello

Adaptation of Classification Model for Improving Speech Intelligibility in Noise

Research Article The Acoustic and Peceptual Effects of Series and Parallel Processing

EEL 6586, Project - Hearing Aids algorithms

Audiogram+: The ReSound Proprietary Fitting Algorithm

Literature Overview - Digital Hearing Aids and Group Delay - HADF, June 2017, P. Derleth

HearIntelligence by HANSATON. Intelligent hearing means natural hearing.

New measurement methods for signal adaptive hearing aids

Possibilities for the evaluation of the benefit of binaural algorithms using a binaural audio link

J. Acoust. Soc. Am. 116 (2), August /2004/116(2)/1057/9/$ Acoustical Society of America

Spectro-temporal response fields in the inferior colliculus of awake monkey

Auditory Perception: Sense of Sound /785 Spring 2017

FAST AMPLITUDE COMPRESSION IN HEARING AIDS IMPROVES AUDIBILITY BUT DEGRADES SPEECH INFORMATION TRANSMISSION

Essential feature. Who are cochlear implants for? People with little or no hearing. substitute for faulty or missing inner hair

Release from informational masking in a monaural competingspeech task with vocoded copies of the maskers presented contralaterally

Systems Neuroscience Oct. 16, Auditory system. http:

Effects of Age and Hearing Loss on the Processing of Auditory Temporal Fine Structure

Computational Auditory Scene Analysis: An overview and some observations. CASA survey. Other approaches

Sonic Spotlight. SmartCompress. Advancing compression technology into the future

Topics in Linguistic Theory: Laboratory Phonology Spring 2007

Auditory nerve. Amanda M. Lauer, Ph.D. Dept. of Otolaryngology-HNS

Group Delay or Processing Delay

Auditory System & Hearing

LIST OF FIGURES. Figure No. Title Page No. Fig. l. l Fig. l.2

Acoustic Sensing With Artificial Intelligence

Binaural Hearing. Steve Colburn Boston University

White Paper: micon Directivity and Directional Speech Enhancement

FREQUENCY COMPRESSION AND FREQUENCY SHIFTING FOR THE HEARING IMPAIRED

Slow compression for people with severe to profound hearing loss

Combination of Bone-Conducted Speech with Air-Conducted Speech Changing Cut-Off Frequency

Masker-signal relationships and sound level

Transcription:

* with contributions by V. Hohmann, M. Kleinschmidt, T. Brand, J. Nix, R. Beutelmann, and more members of our medical physics group Prof. Dr. rer.. nat. Dr. med. Birger Kollmeier* Auditory principles in speech processing do computers need silicon ears?

A quick test of your Cocktailparty-processor... Can you understand this sentence? (Three repetitions, increasing signal-to-noise ratio) Rachel has seven pretty spoons Computer, poor conditions -10 db -5dB 0dB 5 db 10 db SNR Humans, optimum conditions Humans, poor conditions Humans, hearing impaired Computer, optimum condition

Outline! Auditory principles already in silico! Additional properties not yet exploited! Auditory models! Modulation processing! Binaural information processing!...why it matters not only for hearing aids

Auditory properties used in speech processing systems! Logarithmic/compressive intensity units (db, loudness, cepstrum)! Quasi-logarithmic Mel/bark-scale as frequency scale! Linear predictive coding with quantization noise masked by speech (Schroeder & Atal)! Masking model for HiFi-coding (MPEG, Brandenburg)! Speech coding & Audio quality objective assessment (Beerends & Stemerdink)! RASTA techniques for ASR (Hermansky& Morgan)

! Classical approach: Loudness model (Fletcher, Stevens, Zwicker & Fastl, Moore)! Forward & backward masking produce temporal sluggishness Spectral Masking models Level-dependent slopes dent slopes Envelope/ compression Excitation Pattern Pattern = Maximum Across Across channels

Application of auditory models! Assessment of signal quality (Cellular phone networks,...)! Signal coding (MP3, MiniDisc,..) Coded Signal Original- Signal Signal Model Coder Model Quality Measure Decoder! Speech & pattern recognition! Hearing aids Signal Model Recognizer Signal Modell (NH) Modell (HI) Modifications

What do we miss when applying classical models?! Temporal processing! Modulation processing! Spectro-temporal modulation processing! Binaural/spatial processing! Cognitive effects (interpolation, suppression)

Speech perception without a spectrum? Flat spectrum (phase only) speech using the Oldenburg sentence test Ear performs temporal analysis

Temporal processing used for robust speech recognition! Hermansky & Sharma: TRAPS! Perception model Signal Model Recognizer Tchorz & Kollmeier, JASA 106,1999 Auditory front end for speech recognizer : Robustness against noise

What do we miss when applying classical models?! Temporal processing! Modulation processing! Spectro-temporal modulation processing! Binaural/spatial processing! Cognitive effects (interpolation, suppression) Sinusoidally Amplitude modulated noise Spoken sentence

Asymmetry in masking: Which is the better masker? Tone 2 khz Narrow-band noise 2 khz, 256 Hz bandwith Count the audible steps! Tone in steps masked by continuous noise Noise in steps masked by continuous tone f f Ear performs detailed envelope analysis (modulation spectrum)

What do we miss when applying classical models?! Temporal processing! Modulation processing! Spectro-temporal modulation processing! Binaural/spatial processing! Cognitive effects Gabor spectro-temporal feature time frequency

Continuity illusion: Can we trust our ears? Music + pauses + noise (+6dB) Do you hear ongoing music? Music & pauses (500/125 ms) top-down processing by our brain

Visual analogy Bregman s Bs

What do we miss when applying classical models?! Temporal processing! Modulation processing! Spectro-temporal modulation processing! Binaural/spatial processing! Cognitive effects (interpolation, suppression) How can we quantify these effects and put them to work in speech processing?

Model framework 11 Sound input 11 22 Envelope/ Compression 44 22 Internal Noise Noise Envelope/ Compression Internal representation Modulation- Binaural Noise Noise Reduction Modulation- 33 Central Central optimal pattern pattern recognizer/ processor Model of the effective processing in the auditory system & Impairments 1 Dau, Kollmeier& Kohlrausch, 1997, Zerbs, 2000. Kollmeier, 2000, Derleth et al., 2001 44

Approach: Analysis by model & simulation system Human listener Acoustical scene Filter bank Envelope/ Envelope/ Compression Compression Internal Internal Noise Noise Filter bank Envelope/ Envelope/ Compression Compression ModulationModulation Binaural Binaural Noise Noise Reduction Reduction Central Central optimal optimal pattern pattern recognizer/ recognizer/ processor processor ModulationModulation Test result Predicted Test result Auditory model Hearing Aid/ computer System performance

Working hypothesis Auditory System exploits all available acoustical cues, employing! Lossy Front End (quantified by auditory model with information compression): acoustical input internal representation! Perfect Back End: central pattern recognition (limited only by internal noise ) Technical copy of auditory front end yields near optimum performance of technical system

Outline! Auditory principles already in silico! Additional properties not yet exploited! Auditory models! Modulation processing! Binaural information processing!...why it matters not only for hearing aids

Model framework: Modulation frequency analysis Envelope/ Compression Modulation- Modulation- Internal Noise Noise Binaural Noise Noise Reduction Central Central optimal pattern pattern recognizer/ processor Envelope/ Compression Model of the effective processing in the auditory system Dau, Kollmeier& Kohlrausch, 1997, Zerbs, 2000. Kollmeier, 2000, Derleth et al., 2001

Modulation frequency selectivity! Modulation-map in the auditory system (Langer & Schreiner)! Psychoacoustics: Modulationfilterbank (Dissertation Dau, Dau et al., JASA 1997, Dissertations Verhey, Derleth, Ewert)! Signal processing, noise reduction (Kollmeier & Koch, JASA 1994)! Advantage: Separation of different auditory objects covering the same frequency region

Examples of Amplitude Modulation Spectrogram (AMS) (voiced) speech speech simulation noise Speech shows joint distribution in frequency/modulation frequency domain

SNR Estimation from Modulation spectrograms Signal Model Recognizer Analysis frame AMS Pattern Measured S/N in Analysis frame During training only Neural Network Output Activities = Estimate of SNR! Speech-to-Noise Ratio estimate either broadband or multiple narrowband

Importance of two-dimensional features in modulation spectrogram! Estimation error based on Full 2-dim distribution modulation spectrum bark spectrum combination of both + 5.2 db 6.6 db 7.6 db 5.8 db Joint distribution of modulations and spectrum required!

Monaural Noise reduction using Amplitude Modulation cues Suppression of a fluctuating background noise using AMS Industrial noise with speech unprocessed processed Improves speech recognizer in noise (Tchorz & Kollmeier, 2002) Modulation frequency analysis is important for speech perception & promising for speech processing

Spectrotemporal features from neurophysiology Receptive fields of cortical neurons DeCharms et al. (1998) Indications for spectro-temporal feature extraction Depireux et al. (2000) Model of modulation perception Chi et al. (1999)

Gabor Tandem recognizer More details: M. Kleinschmidt: 'Localized Spectro-temporal Features for ASR' Eurospeech Session SThBb - Time is of the Essence, Thursday 10am, Room 2

Aurora: relative reduction in word error rate (WER) 60 58,3 relative reduction in WER 58 56 54 52 50 48 46 44 51,7 50,4 46,7 53,9 55,8 42 40 ew ETSI standard Kleinschmidt & Gelbart ICSLP 2002 ICSI/OGI ICSI/OGI + Gabor Tidigits speechdat-car

Outline! Auditory principles already in silico! Additional properties not yet exploited! Auditory models! Modulation processing! Binaural information processing!...why it matters not only for hearing aids

Model framework: Binaural noise reduction Envelope/ Compression Modulation- Modulation- Internal Noise Noise Binaural Noise Noise Reduction Central Central optimal pattern pattern recognizer/ processor Envelope/ Compression Model of the effective processing in the auditory system

Speech Reception Threshold for different spatial arrangements Binaural advantage Peissig & Kollmeier, 1997 Target speech from front Second, fixed jammer Azimuth of jammer Jammer from different azimuth! One continuous jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously Binaural hearing operates like 2-sensor adaptive beamformer

v.hövel-model ( 84) model Binaural Input F1 F1 F2 F2 F3 F3 F4 F4 w ECcircuit ECcircuit ECcircuit τ ECcircuit S/N estimate S/N estimate S/N estimate S/N estimate Articula tion Index Output: Intelligibility estimate............ Fn Fn ECcircuit S/N estimate Noise reduction Zur Bedeutung der Übertragungseigenschaften des Außenohres sowie des binauralen Hörsystems bei gestörter Sprachübertragung, Dissertation, RWTH Aachen

v.hövel-model ( 84) model: modifications Binaural Input F1 F1 F2 F2 F3 F3 F4 F4...... Fn Fn Gammatone w τ... ECcircuit ECcircuit ECcircuit ECcircuit ECcircuit S/N estimate S/N estimate S/N estimate S/N estimate... S/N estimate Numerical optimization of delay& gains Articula tion Index Output: Intelligibility estimate Speech Intelligibility Index (SII)

Speech Reception Threshold for different spatial arrangements Target speech from front Jammer from different azimuth Predictions by binaural model (R. Beutelmann, T.Brand) Peissig & Kollmeier, 1997 Second, fixed jammer Azimuth of jammer

Corresponding binaural Beamformer hearing aid Cocktail-Party processor Control algorithm Adaptation Activation

Performance of two-input Cocktail party processors Blind Sound Source Separation (Anemüller&Kollmeier, 2002)!! Mixture of two sources in a room Separation of first source second source Binaural situation-adaptive directional filter (Wittkop, 2000)!! One speaker in stationary noise One speaker from the front!! + 3 interfering speakers + Algorithmus Localization model-driven beamformer (Nix & Hohmann, 2002)!!! 2 Sources, unprocessed 2 Sources, processed, first direction // second direction 3 Sources, unprocessed//processed No convincing separation of more than two sources

Particle filter to estimate best beamformer online Particle Initialization Transition Matrix State Prediction Get Expectation Values Codebook Resample Particles HRTF Database Calculate Hypothetical Spectra Compare to Observation Binaural Speech Signals Convolution with HRTF/AOI Feature Extraction Details: Johannes Nix, M. Kleinschmidt, V. Hohmann: 'CASA by Using Statistics of High-Dimensional Speech Dynamics and Sound Source Direction Eurospeech 2003 Session PTuDe - Speech Enhancement II, Tuesday 4pm, Main Hall, Level-1

Results: On-line Recovery of Spectral Envelopes original voices left/right ear signal estimated voices (left side) (see: Nix, Kleinschmidt, Hohmann, Tuesday 4pm) Separation from multiple sources with much computation & cognitive complexity!

Conclusions Test result Predicted Test result System performance! Auditory principles for speech processing look promising! Interaction experiment-modelapplication! Amplitude Modulation Spectrogram!! optimally switched two-sensor beamformer to mimic binaural system! Top-down vs. bottom-up processing yet to be explored

Hearing aid or personal communication device? K-WON JABRA HörTech Prototype In-the-ear hearing aid Technology for hearing aids and mobile phones converge Knowledge from hearing aid design is required for modern speech communication systems! Behind-the-ear hearing aid

...thank you! A binaural hearing aid to sit in The ultimate way of achieving a good performance in cocktail parties! Auditory throne in front of the new House of hearing (Oldenburg)