Speech recognition in noisy environments: A survey

T-61.182 Robustness in Language and Speech Processing
Speech recognition in noisy environments: A survey
Yifan Gong
Presented by Tapani Raiko, Feb 20, 2003

About the Paper
Article published in Speech Communication in 1995 (31 pages).
A survey of 250 publications divided into three categories:
- Noise resistance
- Speech enhancement
- Model compensation for noise
Focus: mismatch between training and operating environments.

Introduction (1/2)
Speech recognition in controlled situations has reached very high levels of performance, but performance degrades in noisy situations:
- from 100% to 30% accuracy in a car (90 km/h)
- from 99% to 50% in a cafeteria
Two phenomena:
- Contaminated speech signal (typically additive noise, sometimes also convolutional)
- Articulation variability (the Lombard effect)

Introduction (2/2)
A system trained at a given SNR performs worse in other SNR environments. What to do?
- Search for noise-resistant features and robust distance measures (1. Noise resistance)
- Reduce the mismatch:
  - Remove noise from the signal (2. Speech enhancement)
  - Transform the speech models to accommodate noise (3. Model compensation for noise)

1. Noise Resistance
[Diagram: data recorded under the training condition and under the operating condition, at different SNRs, feed the same model and recogniser]
Search for noise-resistant features and robust distance measures.

2. Speech Enhancement
[Diagram: as above, but the operating-condition data passes through a data transform before the model and recogniser]
Remove noise from the signal.

3. Model Compensation for Noise
[Diagram: as above, but the model trained on the training-condition data passes through a model transform before recognition]
Transform the speech models to accommodate noise.

1. Noise Resistance: Introduction
- The parameters of a recogniser are very sensitive to disturbances
- Focus on the effect of noise on the parameters
- Use derived feature parameters or similarity measures that are (hopefully) invariant to those effects
- Weak or no assumptions about the noise: both a strength and a weakness

1. Noise Resistance: Examples (1/2)
- Normalised cepstral vectors (see the sketch below)
  - The cepstrum is the inverse Fourier transform of the log spectrum
  - White-noise corruption reduces the norm of the cepstral vectors
  - The angle between cepstral vectors is less affected
- Spectral weighting methods (WLR, RPS, SWL)
  - Emphasise spectral peaks more than valleys
  - De-emphasise low-quefrency terms of the cepstrum
- Multi-layer perceptron as a phoneme classifier
  - Generalises better than e.g. k-nearest neighbour
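A minimal sketch of the cepstral-angle idea, assuming plain FFT-based real cepstra and NumPy; the function names and the toy signal are illustrative, not taken from the survey. Additive white noise tends to shrink the cepstral norm, while an angle-based (cosine) distance partly cancels any common scaling.

```python
import numpy as np

def real_cepstrum(frame, n_coef=13):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-12   # avoid log(0)
    cep = np.fft.irfft(np.log(spectrum))
    return cep[:n_coef]

def cosine_distance(c1, c2):
    """Angle-based distance: insensitive to a common scaling of the vectors."""
    return 1.0 - np.dot(c1, c2) / (np.linalg.norm(c1) * np.linalg.norm(c2))

# Toy comparison: a clean frame vs. the same frame with additive white noise.
rng = np.random.default_rng(0)
t = np.arange(400) / 8000.0
clean = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

c_clean, c_noisy = real_cepstrum(clean), real_cepstrum(noisy)
print("norm ratio (noisy/clean):", np.linalg.norm(c_noisy) / np.linalg.norm(c_clean))
print("cosine distance:", cosine_distance(c_clean, c_noisy))
```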

1. Noise Resistance: Examples (2/2)
- Computational models of the auditory system for speech
  - Computationally expensive
  - Wavelet transform followed by a compressive nonlinearity
  - Frequency-dependent lateral inhibition function
- Slow-variation removal (many noises vary slowly compared to speech)
  - CMN: remove the mean from the cepstral vectors (see the sketch below)
  - Use time derivatives of the cepstra
  - RASTA, RASTA-PLP, J-RASTA, J-RASTA-PLP (the J variants also handle convolutional noise)
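CMN itself is one line; the sketch below (NumPy, illustrative function names) also includes simple delta features as an example of using time derivatives of the cepstra. Both suppress slowly varying components, and a stationary convolutional channel appears as an additive constant in the cepstral domain, which the mean subtraction absorbs.

```python
import numpy as np

def cepstral_mean_normalisation(cepstra):
    """CMN: subtract the utterance-level mean from each cepstral coefficient.

    cepstra: array of shape (n_frames, n_coef).
    A time-invariant channel multiplies the spectrum, i.e. adds a constant
    in the log/cepstral domain, so the per-utterance mean absorbs it.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def delta_features(cepstra, width=2):
    """Simple time derivatives (delta cepstra), another slow-variation remover."""
    padded = np.pad(cepstra, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (padded[width + k:len(cepstra) + width + k]
                   - padded[width - k:len(cepstra) + width - k])
              for k in range(1, width + 1))
    return num / (2 * sum(k * k for k in range(1, width + 1)))
```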

2. Speech Enhancement: Introduction
- A preprocessing step
- Developed for speech-quality improvement; the criteria are usually not related to recognition accuracy
- Uses a priori information about the speech and the noise
- An enhanced SNR does not always improve recognition performance (the case with basic Wiener or Kalman filtering)

2. Speech Enhancement: Examples (1/2)
- Parameter mapping
  - Data: speech with and without noise
  - Train a neural network to map noisy vectors to clean vectors
  - Highly dependent on the training data
- Spectral subtraction (see the sketch below)
  - Estimate the noise spectrum during non-speech periods
  - Subtract the noise from the power spectrum
  - A special case of a Wiener filter
  - Noise masking is related (ignore everything under a power threshold)
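A sketch of basic power spectral subtraction, assuming the leading frames of the utterance contain no speech (a common but not always valid stand-in for a proper speech/non-speech detector); the frame length, hop, and spectral floor are illustrative parameters, not values from the survey.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=256, hop=128, noise_frames=6, floor=0.01):
    """Basic power spectral subtraction with a spectral floor.

    The noise power spectrum is estimated from the first `noise_frames`
    frames (assumed speech-free) and subtracted from each frame's power
    spectrum; the noisy phase is reused for resynthesis.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    power = np.abs(spectra) ** 2 + 1e-12

    noise_power = power[:noise_frames].mean(axis=0)     # noise estimate
    clean_power = np.maximum(power - noise_power, floor * power)
    gain = np.sqrt(clean_power / power)                 # magnitude-domain gain
    enhanced = gain * spectra                           # keep the noisy phase

    out = np.zeros(len(noisy))                          # overlap-add resynthesis
    for i, frame in enumerate(np.fft.irfft(enhanced, n=frame_len, axis=1)):
        out[i * hop:i * hop + frame_len] += frame
    return out
```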

2. Speech Enhancement: Examples (2/2)
- Comb filtering (see the sketch below)
  - Estimate the pitch period of the speech
  - Keep only the corresponding frequency and its multiples
- Bayesian estimation
  - A generative model for the latent clean speech plus noise
  - Use the posterior estimate of the speech as the enhanced signal
  - Equivalent to template-based estimation (which is formulated without probabilities)
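A toy time-domain sketch of the comb-filtering idea: an autocorrelation pitch estimate followed by averaging of period-delayed copies, so harmonics of the fundamental add coherently while uncorrelated noise is attenuated. Real systems handle voicing decisions and period tracking more carefully; the parameters and function names here are illustrative.

```python
import numpy as np

def estimate_period(frame, fs, f_min=60.0, f_max=400.0):
    """Crude pitch-period estimate (in samples) from the autocorrelation peak."""
    frame = np.asarray(frame, dtype=float)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / f_max), int(fs / f_min)
    return lo + int(np.argmax(ac[lo:hi]))

def comb_filter(frame, period, taps=3):
    """Average the frame with copies delayed by multiples of the pitch period."""
    frame = np.asarray(frame, dtype=float)
    out = np.zeros_like(frame)
    for k in range(taps):
        delayed = np.roll(frame, k * period)
        delayed[:k * period] = 0.0            # zero the wrapped-around samples
        out += delayed
    return out / taps
```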

3. Model Compensation for Noise: Introduction
- Accept the presence of noise
- Hidden Markov models (HMMs) as the framework
- Model parameters are optimised during operation
- Problematic at very low SNRs (at least as of 1994)

3. Model Compensation for Noise: Examples (1/2)
- Decomposition of HMMs (PMC, STM)
  - An N×M-state HMM combining N speech states with M noise states
  - The parts are trained separately
  - Assumes Gaussian distributions, although at low SNR some of them are actually bimodal
- State-dependent Wiener filtering (see the sketch below)
  - The Wiener filter uses the ratio of the clean-speech power spectrum to the noisy-speech power spectrum
  - The power spectrum of speech is very non-stationary
  - Idea: the HMM automatically divides speech into quasi-stationary segments!
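The Wiener gain itself is simple to write down; this sketch assumes per-state estimates of the clean-speech and noise power spectra are already available, which is exactly what the HMM's quasi-stationary segmentation is meant to provide. The function names are illustrative.

```python
import numpy as np

def wiener_gain(speech_power, noise_power):
    """Frequency-domain Wiener gain H(f) = S(f) / (S(f) + N(f)).

    speech_power: estimated clean-speech power spectrum for the current
                  HMM state (speech is quasi-stationary within a state).
    noise_power:  estimated noise power spectrum.
    """
    return speech_power / (speech_power + noise_power + 1e-12)

def state_dependent_filter(noisy_spectrum, state_speech_power, noise_power):
    """Apply the state-conditioned Wiener gain to one noisy frame's spectrum."""
    return wiener_gain(state_speech_power, noise_power) * noisy_spectrum
```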

3. Model Compensation for Noise: Examples (2/2)
- Duration models
  - The duration structure of speech is less affected by noise
- Adaptation of HMMs (see the sketch below)
  - Train an HMM on a large amount of clean speech
  - Use just a small amount of noisy speech to find a mapping of the HMM parameters to the noisy environment
- Discriminative HMMs
  - Use maximum classification accuracy instead of maximum likelihood
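A sketch of one possible parameter mapping: an MLLR-style global affine transform of the Gaussian means, estimated by least squares from a small aligned adaptation set. The survey does not prescribe this particular form, so treat it as an illustration of the idea rather than the method itself.

```python
import numpy as np

def estimate_mean_transform(clean_means, adaptation_frames, frame_to_state):
    """Estimate one affine transform mu' = A @ mu + b shared by all Gaussian means.

    clean_means:        (n_states, dim) means trained on clean speech.
    adaptation_frames:  (n_frames, dim) small amount of noisy speech.
    frame_to_state:     state index aligned to each adaptation frame.
    Solves a least-squares problem pairing each noisy frame with the clean
    mean of its aligned state (an MLLR-style global transform).
    """
    X = np.hstack([clean_means[frame_to_state],
                   np.ones((len(adaptation_frames), 1))])   # rows are [mu, 1]
    W, *_ = np.linalg.lstsq(X, adaptation_frames, rcond=None)
    A, b = W[:-1].T, W[-1]
    return A, b

def adapt_means(clean_means, A, b):
    """Map every clean-speech mean into the noisy environment."""
    return clean_means @ A.T + b
```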

Training Data Contamination
- Does not fit any of the three categories
- E.g. mix a cafeteria recording into the clean speech data (see the sketch below)
- Used as a benchmark
- Sensitive to the noise level and type
- Cannot cope with the Lombard effect
- At equivalent SNRs, Gaussian noise is the worst case, giving a lower bound on performance
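Contaminating clean training data is straightforward: scale the recorded noise so that the mixture reaches a target SNR and add it. A sketch assuming NumPy, with an illustrative function name; as noted above, this cannot reproduce the Lombard effect, since that is a change in articulation rather than in the signal.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the given SNR in dB."""
    clean = np.asarray(clean, dtype=float)
    noise = np.resize(np.asarray(noise, dtype=float), clean.shape)  # loop/crop noise
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```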

Conclusion (1/2)
- Focus: mismatch between training and operating conditions
- Are the properties of the noise known? Is computing power cheap?
  - No: use 1. (feature/similarity-based)
  - Yes: use 2. or 3. (transformation-based)
- Different techniques may be combined
- Non-stationary noise is a hot topic (as of 1994)

Conclusion (2/2): Key Issues
- Accurate speech and noise models (state decomposition)
- Incorporating a dynamical model (HMM)
- Incorporating frequency correlations (LPC, SOM, ...)
- Weighting portions of speech based on their SNR
- Class-dependent processing (class ∈ {word, phoneme, sound class, HMM state, VQ codebook vector})
- Optimisation criteria (discriminative training)
- The human auditory system as inspiration