Adaptation of Classification Model for Improving Speech Intelligibility in Noise

Similar documents
Effects of Cochlear Hearing Loss on the Benefits of Ideal Binary Masking

Using Source Models in Speech Separation

GENERALIZATION OF SUPERVISED LEARNING FOR BINARY MASK ESTIMATION

Voice Detection using Speech Energy Maximization and Silence Feature Normalization

Performance Comparison of Speech Enhancement Algorithms Using Different Parameters

Acoustic Signal Processing Based on Deep Neural Networks

Using Speech Models for Separation

Fig. 1 High level block diagram of the binary mask algorithm.[1]

CHAPTER 1 INTRODUCTION

Robust Neural Encoding of Speech in Human Auditory Cortex

Speech Enhancement Using Deep Neural Network

Speech Enhancement Based on Deep Neural Networks

SUPPRESSION OF MUSICAL NOISE IN ENHANCED SPEECH USING PRE-IMAGE ITERATIONS. Christina Leitner and Franz Pernkopf

International Journal of Scientific & Engineering Research, Volume 5, Issue 3, March ISSN

Speech Enhancement, Human Auditory system, Digital hearing aid, Noise maskers, Tone maskers, Signal to Noise ratio, Mean Square Error

IMPROVING CHANNEL SELECTION OF SOUND CODING ALGORITHMS IN COCHLEAR IMPLANTS. Hussnain Ali, Feng Hong, John H. L. Hansen, and Emily Tobey

Single-Channel Sound Source Localization Based on Discrimination of Acoustic Transfer Functions

An active unpleasantness control system for indoor noise based on auditory masking

Gender Based Emotion Recognition using Speech Signals: A Review

COMPARISON BETWEEN GMM-SVM SEQUENCE KERNEL AND GMM: APPLICATION TO SPEECH EMOTION RECOGNITION

Research Article Automatic Speaker Recognition for Mobile Forensic Applications

Using the Soundtrack to Classify Videos

Modulation and Top-Down Processing in Audition

Hybrid Masking Algorithm for Universal Hearing Aid System

SPEECH TO TEXT CONVERTER USING GAUSSIAN MIXTURE MODEL(GMM)

HybridMaskingAlgorithmforUniversalHearingAidSystem. Hybrid Masking Algorithm for Universal Hearing Aid System

Automatic Live Monitoring of Communication Quality for Normal-Hearing and Hearing-Impaired Listeners

Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phase Components

Computational Perception /785. Auditory Scene Analysis

Real-Time Implementation of Voice Activity Detector on ARM Embedded Processor of Smartphones

Using visual speech information and perceptually motivated loss functions for binary mask estimation

Investigating the effects of noise-estimation errors in simulated cochlear implant speech intelligibility

Masking release and the contribution of obstruent consonants on speech recognition in noise by cochlear implant users

Impact of Sound Insulation in a Combine Cabin

Investigating the effects of noise-estimation errors in simulated cochlear implant speech intelligibility

The role of periodicity in the perception of masked speech with simulated and real cochlear implants

An algorithm to increase speech intelligibility for hearing-impaired listeners in novel segments of the same noise type

Combination of Bone-Conducted Speech with Air-Conducted Speech Changing Cut-Off Frequency

A NOVEL METHOD FOR OBTAINING A BETTER QUALITY SPEECH SIGNAL FOR COCHLEAR IMPLANTS USING KALMAN WITH DRNL AND SSB TECHNIQUE

Analysis of Speech Recognition Techniques for use in a Non-Speech Sound Recognition System

Noise-Robust Speech Recognition in a Car Environment Based on the Acoustic Features of Car Interior Noise

HCS 7367 Speech Perception

Noise-Robust Speech Recognition Technologies in Mobile Environments

Performance of Gaussian Mixture Models as a Classifier for Pathological Voice

Modern Speech Enhancement Techniques in Text-Independent Speaker Verification

USING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES

Acoustic feedback suppression in binaural hearing aids

Acoustic Sensing With Artificial Intelligence

Codebook driven short-term predictor parameter estimation for speech enhancement

Echo Canceller with Noise Reduction Provides Comfortable Hands-free Telecommunication in Noisy Environments

Challenges in microphone array processing for hearing aids. Volkmar Hamacher Siemens Audiological Engineering Group Erlangen, Germany

ADVANCES in NATURAL and APPLIED SCIENCES

Frequency Tracking: LMS and RLS Applied to Speech Formant Estimation

UvA-DARE (Digital Academic Repository) Perceptual evaluation of noise reduction in hearing aids Brons, I. Link to publication

Voice Pitch Control Using a Two-Dimensional Tactile Display

Speech enhancement based on harmonic estimation combined with MMSE to improve speech intelligibility for cochlear implant recipients

Individualizing Noise Reduction in Hearing Aids

Near-End Perception Enhancement using Dominant Frequency Extraction

Ambiguity in the recognition of phonetic vowels when using a bone conduction microphone

Benefit of Advanced Directional Microphones for Bilateral Cochlear Implant Users

Some Studies on of Raaga Emotions of Singers Using Gaussian Mixture Model

Advanced Audio Interface for Phonetic Speech. Recognition in a High Noise Environment

HISTOGRAM EQUALIZATION BASED FRONT-END PROCESSING FOR NOISY SPEECH RECOGNITION

EECS 433 Statistical Pattern Recognition

LabROSA Research Overview

SRIRAM GANAPATHY. Indian Institute of Science, Phone: +91-(80) Bangalore, India, Fax: +91-(80)

Perceptual Effects of Nasal Cue Modification

Effects of speaker's and listener's environments on speech intelligibili annoyance. Author(s)Kubo, Rieko; Morikawa, Daisuke; Akag

Recognition & Organization of Speech and Audio

ADVANCES in NATURAL and APPLIED SCIENCES

Speech recognition in noisy environments: A survey

The 29th Fuzzy System Symposium (Osaka, September 9-, 3) Color Feature Maps (BY, RG) Color Saliency Map Input Image (I) Linear Filtering and Gaussian

We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists. International authors and editors

NONLINEAR REGRESSION I

PCA Enhanced Kalman Filter for ECG Denoising

The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation

An Auditory System Modeling in Sound Source Localization

Recognition & Organization of Speech & Audio

Sonic Spotlight. SmartCompress. Advancing compression technology into the future

Recognition & Organization of Speech and Audio

Sensory Cue Integration

Complete Cochlear Coverage WITH MED-EL S DEEP INSERTION ELECTRODE

Roger TM. Learning without limits. SoundField for education

Robust Speech Detection for Noisy Environments

The Unifi-EV Protocol for Evalita 2009

Topics in Amplification DYNAMIC NOISE MANAGEMENT TM A WINNING TEAM

LATERAL INHIBITION MECHANISM IN COMPUTATIONAL AUDITORY MODEL AND IT'S APPLICATION IN ROBUST SPEECH RECOGNITION

HCS 7367 Speech Perception

EEL 6586, Project - Hearing Aids algorithms

A Single-Channel Noise Reduction Algorithm for Cochlear-Implant Users

Comparative Analysis of Vocal Characteristics in Speakers with Depression and High-Risk Suicide

Sound, Mixtures, and Learning

Binaural Hearing. Why two ears? Definitions

Speech intelligibility in background noise with ideal binary time-frequency masking

Yonghee Oh, Ph.D. Department of Speech, Language, and Hearing Sciences University of Florida 1225 Center Drive, Room 2128 Phone: (352)

MULTI-MODAL FETAL ECG EXTRACTION USING MULTI-KERNEL GAUSSIAN PROCESSES. Bharathi Surisetti and Richard M. Dansereau

Proceedings of Meetings on Acoustics

Juan Carlos Tejero-Calado 1, Janet C. Rutledge 2, and Peggy B. Nelson 3

FAST AMPLITUDE COMPRESSION IN HEARING AIDS IMPROVES AUDIBILITY BUT DEGRADES SPEECH INFORMATION TRANSMISSION

Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information

Transcription:

1: (Junyoung Jung et al.: Adaptation of Classification Model for Improving Speech Intelligibility in Noise) (Regular Paper) 23 4, 2018 7 (JBE Vol. 23, No. 4, July 2018) https://doi.org/10.5909/jbe.2018.23.4.511 ISSN 2287-9137 (Online) ISSN 1226-7953 (Print) a), a) Adaptation of Classification Model for Improving Speech Intelligibility in Noise Junyoung Jung a) and Gibak Kim a) - 0 1. - 0, 1..... Abstract This paper deals with improving speech intelligibility by applying binary mask to time-frequency units of speech in noise. The binary mask is set to 0 or 1 according to whether speech is dominant or noise is dominant by comparing signal-to-noise ratio with pre-defined threshold. Bayesian classifier trained with Gaussian mixture model is used to estimate the binary mask of each time-frequency signal. The binary mask based noise suppressor improves speech intelligibility only in noise condition which is included in the training data. In this paper, speaker adaptation techniques for speech recognition are applied to adapt the Gaussian mixture model to a new noise environment. Experiments with noise-corrupted speech are conducted to demonstrate the improvement of speech intelligibility by employing adaption techniques in a new noise environment. Keyword : speech intelligibility, noise suppression, binary mask, Gaussian mixture model, adaptation a) (School of Electrical Engineering, Soongsil University) Corresponding Author : (Gibak Kim) E-mail: imkgb27@ssu.ac.kr Tel: +82-2-828-7266 ORCID:https://orcid.org/0000-0001-5114-4117 No.10048474,. This material is based upon work supported by the Ministry of Trade, Industry & Energy(MOTIE, Korea) under Industrial Technology Innovation Program. No.10048474, 'Development of a Service Robot based on Big Data for Providing Aging Generation with Personalized Welfare Services' Manuscript received March 16, 2018; Revised May 8, 2018; Accepted May 8, 2018. Copyright 2016 Korean Institute of Broadcast and Media Engineers. All rights reserved. This is an Open-Access article distributed under the terms of the Creative Commons BY-NC-ND (http://creativecommons.org/licenses/by-nc-nd/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited and not altered.

(JBE Vol. 23, No. 4, July 2018)......... 5 db SNR (Signal-to-Noise Ratio) 100%. 5~15 db SNR (5~0 db SNR). [1-3]. 2000, CASA (Computational Auditory Scene Analysis) [4-6]. - - 0, 1. 0 1 0dB. -., -, [7-11]... (incremental training) [12]. (Gaussian Mixutre Model: GMM) (quasi-bayes),,,.... eigenvoice [13]. Eigenvoice

1: (Junyoung Jung et al.: Adaptation of Classification Model for Improving Speech Intelligibility in Noise). PCA (Principal Component Analysis) eigenvoice. eigenvoice. eigenvoice [14]. Eigenvoice.,.. eigenvoice MAP (Maximum a posterior) MLLR (Maximum Likelihood Linear Regression) [15,16]. (full covariance matrix) MLLR MAP, MLLR MAP.. 2, 3 MAP, MLLR. 4... - - 1. Fig. 1. Extraction of amplitude modulation spectrogram. Tchorz Kollmeier (AMS: Amplitude Modulation Spectrogram) [17,18] 1., 4ms Hann ( Hamming). 4ms 48 (12kHz ), 80 128, 128 FFT (Fast Fourier Transform). FFT (power spectrum). 0.25ms Hann FFT, 4kHz (1/0.25ms)., mel 25

(JBE Vol. 23, No. 4, July 2018). 25 mel 4kHz. 32ms (128) Hann, 16ms. Hann 128 128 256, 256 FFT.,. 15 15. 15 25 45. 0 1 (Bayesian). 0 1 ( ), - (AMS) 0 1 (posterior probability),.,. if otherwise o,. GMM... 1. MAP (Maximum a posteriori) MAP. ML (Maximum Likelihood). ML likelihood. MAP, (posterior distribution)., ML.,,.. [16].,,. ML. ML. ().

1: (Junyoung Jung et al.: Adaptation of Classification Model for Improving Speech Intelligibility in Noise). w i n w i n G i l i w i n W, l i L. 1/2 G i. 2. MLLR (Maximum Likelihood Linear Regression) M G i m c m m m T mi MLLR MAP (linear regression) [15]. m. W W EM (Expectation Maximization) ML W. [7]. MLLR Povey Saon (iterative) W [15]., (6). m m m (covariance matrix). W (gradient) W m W. 1.. IEEE [19], 25kHz, 12kHz. 16kHz 12kHz. 64. 34, babble, factory, speech-shaped. Babble 10, factory NOISEX [20]. NOISEX 20kHz 12kHz. Speech-shaped IEEE stationary. IEEE W m W

(JBE Vol. 23, No. 4, July 2018) 10 2~3. 200, 5, 0, 5 db.. 5dB 10 10 50. 2. 1 MAP, MLLR. III MAP MLLR. MAP MLLR MLLR MAP.,... (Hit) (False Alarm:FA) (Hit-FA). Hit 1 1, FA 0 1. Hit-FA Hit-FA [7]. 2. MAP, MLLR MAP+MLLR, MLLR+MAP. eigenvoice(k=5). 2. ( - : Hit - FA, NA: No Adaptation) Fig. 2. Performance of model adaptation in new noises 20% Hit-FA. [7], 40%., MLLR MAP, MLLR MAP.

1: (Junyoung Jung et al.: Adaptation of Classification Model for Improving Speech Intelligibility in Noise).,., eigenvoice MAP, MLLR.. (References) [1] Y. Hu and P. Loizou, Subjective comparison and evaluation of speech enhancement algorithms, Speech communication, vol. 49, no. 7, pp. 588601, Jul. 2007. https://doi.org/10.1016/j.specom.2006.12.006 [2] Y. Hu and P. Loizou, Evaluation of objective quality measures for speech enhancement, IEEE Transactions on Speech and Audio Processing, vol. 16, no. 1, pp. 229238, 2008. https://doi.org/10.1109/tasl.2007.911054 [3] Y. Hu and P. Loizou, A comparative intelligibility study of single-microphone noise reduction algorithms. The Journal of the Acoustical Society of America, vol. 122, no. 3, p. 1777, Sep. 2007. https://doi.org/10.1121/1.2766778 [4] G. Brown and M. Cooke, Computational auditory scene analysis, Computer speech and language, vol. 8, pp. 297336, 1994. https://doi.org/10.1006/csla.1994.1016 [5] D. Wang and G. Brown, Computational Auditory Scene Analysis : Principles, Algorithms, and Applications, Wiley, Hoboken, NJ, 2006. https://doi.org/10.1109/tnn.2007.913988 [6] D. Wang, On ideal binary mask as the computational goal of auditory scene analysis, In Divenyi P. (ed.), Speech Separation by Humans and Machines, pp. 181-197, Kluwer Academic, Norwell MA, 2005. https://doi.org/10.1007/0-387-22794-6_12 [7] G. Kim. Y. Lu, Y. Hu and P. Loizou, An algorithm that improves speech intelligibility in noise for normal-hearing listeners, The Journal of the Acoustical Society of America, vol. 126, no. 3, pp 1486-1494, 2009. https://doi.org/10.1121/1.3184603 [8] Y. Hu, P. Loizou, Environment-specific noise suppression for improved speech intelligibility by cochlear implant users, The Journal of the Acoustical Society of America, vol. 127, no. 6, pp 3689-3695, 2010. https://doi.org/10.1121/1.3365256 [9] K. Han, D. Wang, A classification based approach to speech segregation, The Journal of the Acoustical Society of America, vol. 132, no. 5, pp 3475-3483, 2012. https://doi.org/10.1121/1.4754541 [10] Y. Wang, K. Han, D. Wang, Exploring monaural features for classification-based speech segregation, IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 2, pp 270-279, 2013. https://doi.org/10.1109/tasl.2012.2221459 [11] G. Kim, A Post-processing for Binary Mask Estimation Toward Improving Speech Intelligibility in Noise, Journal of Broadcast Engineering, Vol. 18, No.2, pp.311-318, March, 2013. https://doi.org/10.5909/jbe.2013.18.2.311 [12] G. Kim, P. Loizou, Improving speech intelligibility in noise using environment-optimized algorithms, IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 8, pp 2080-2090, 2010. https://doi.org/10.1109/tasl.2010.2041116 [13] G. Kim, Eigenvoice Adaptation of Classification Model for Binary Mask Estimation, Journal of Broadcast Engineering, Vol.20, No.1, pp.164-170, 2015. https://doi.org/10.5909/jbe.2015.20.1.164 [14] R. Kuhn, J. Junqua, P. Nguyen, N. Niedzielski, "Rapid Speaker Adaptation in Eigenvoice Space," IEEE Transactions on Speech and Audio Proceeding, vol. 8, no. 6, pp. 695-707, November 2000. https://doi.org/10.1109/89.876308 [15] D. Povey and G. Saon, Feature and model space speaker adaptation with full covariance Gaussians, Proceedings of Interspeech, USA, september 2006. [16] K. Shinoda, Speaker Adaptation Techniques for Speech Recognition Using Probabilistic Models, Electronics and Communications in Japan, Part 3, Vol. 88, No. 12, pp.25-42, 2005. https://doi.org/10.1002/ecjc.20207 [17] J. Tchorz and B. Kollmeier, Estimation of the signal-to-noise ratio with amplitude modulation spectrograms, Speech Communication, vol. 38, no. 12, pp. 117, Sep. 2002. https://doi.org/10.1016/s0167-6393(01)00040-1 [18] J. Tchorz and B. Kollmeier, SNR estimation based on amplitude modulation analysis with applications to noise suppression, IEEE Transactions on Speech and Audio Processing, vol. 11, no. 3, pp. 184 192, May 2003. https://doi.org/10.1109/tsa.2003.811542 [19] IEEE, IEEE recommended practice for speech quality measurements", IEEE Trans. Audio Electroacoust., vol. 17, pp. 225-246, 1969. https://doi.org/10.1109/tau.1969.1162058 [20] A. Varga and H. J. M. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Communication, vol. 12, pp. 247-251, 1993. https://doi.org/10.1016/0167-6393(93)90095-3

(JBE Vol. 23, No. 4, July 2018) - 2017 : - 2017 ~ : - ORCID : https://orcid.org/0000-0002-8327-7181 - :,, - 1994 : - 1996 : - 2007 : - 1996 ~ 2000 : LG - 2000 ~ 2003 : () - 2008 ~ 2010 : Univ. of Texas at Dallas, Research Associate - 2011 ~ : - ORCID : https://orcid.org/0000-0001-5114-4117 - :,,