Tandem acoustic modeling: Neural nets for mainstream ASR?

1 Tandem acoustic modeling: Neural nets for mainstream ASR?
Dan Ellis, International Computer Science Institute, Berkeley CA
Outline:
1. Tandem acoustic modeling
2. Inside Tandem systems: What's going on?
3. Future directions
Tandem modeling - Dan Ellis

2 Tandem acoustic modeling
ETSI Aurora noisy digits evaluation
- new features (for distributed speech recognition)
- Gaussian mixture HTK back-end provided
How to use hybrid-connectionist tricks? (multistream posterior combination etc.)
Use posterior outputs as features for HTK...
= Tandem connection of two large statistical models: Neural Net (NN) and Gaussian Mixture (GMM)

3 The Tandem structure
[Figure: block diagrams of four systems]
- Hybrid connectionist-HMM ASR: speech features -> neural net -> phone probabilities -> Noway decoder
- Conventional ASR (HTK): speech features -> Gauss mix models -> subword likelihoods -> HTK decoder
- Tandem modeling: speech features -> neural net -> phone probabilities -> Gauss mix models -> subword likelihoods -> HTK decoder
- Refined system: alternate features -> neural net -> pre-nonlinearity outputs + PCA orthogonalization -> Gauss mix models -> subword likelihoods -> HTK decoder
Better results when posteriors are made more Gaussian
Tandem allows posterior combination for HTK
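
The refined system on this slide can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes NumPy, uses random correlated data as a stand-in for the net's pre-nonlinearity (linear) outputs, and the helper names `fit_pca` and `apply_pca` are hypothetical. It shows the two steps named on the slide: taking the outputs before the softmax, and orthogonalizing them with PCA before they are passed to the Gaussian mixture back-end.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax: turns linear outputs into phone posteriors."""
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_pca(x):
    """Return the mean and the eigenvectors of the covariance, largest variance first."""
    mean = x.mean(axis=0)
    cov = np.cov(x - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]

def apply_pca(x, mean, basis):
    """Project centered data onto the PCA basis (decorrelates the dimensions)."""
    return (x - mean) @ basis

# Hypothetical stand-in for the net's linear outputs: 2000 frames, 5 phone classes.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(5, 5))
linear_out = rng.normal(size=(2000, 5)) @ mixing  # deliberately correlated

posteriors = softmax(linear_out)        # what a hybrid system would hand to its decoder
mean, basis = fit_pca(linear_out)       # tandem: skip the softmax, fit PCA instead
tandem_feats = apply_pca(linear_out, mean, basis)

# After PCA the feature dimensions are uncorrelated on the data used to fit it.
cov_after = np.cov(tandem_feats, rowvar=False)
off_diag = cov_after - np.diag(np.diag(cov_after))
print("largest off-diagonal covariance:", np.abs(off_diag).max())
```

In a real system the PCA basis would be estimated once on training data and then applied unchanged to test data.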

4 Training a tandem model
Tandem modeling uses two feature spaces
- NN estimates phone posteriors (discriminant)
- GMM models subword likelihoods (distributions)
Training procedure
- NN trained (backprop) on base features to forced-alignment phone targets
- GMM trained on modified NN outputs via EM to maximize subword model likelihood
- HTK backend knows nothing of phone models
Decoupled (good) but sequential
Training sets?
- can use same for both - learning different info
- could use different - for cross-task robustness
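
The two-stage procedure above can be sketched on toy data. Everything here is an illustrative assumption rather than the system described in the talk: NumPy is assumed, the "net" is a single softmax layer trained by plain gradient descent, each subword unit is modeled by one diagonal Gaussian instead of a full mixture (so the maximum-likelihood fit is just a per-unit mean and variance), and the units coincide with the phone classes. What it does preserve is the decoupled, sequential structure: the net is trained first and frozen, and the Gaussian stage only ever sees the net's outputs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "base features": 3 phone classes in 4 dimensions, well separated.
n_classes, dim, n_per_class = 3, 4, 200
centers = 4.0 * np.eye(n_classes, dim)
x = np.vstack([c + rng.normal(size=(n_per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)  # stand-in for forced-alignment targets
onehot = np.eye(n_classes)[y]

# Stage 1: discriminative training of the net (softmax layer, cross-entropy loss).
w = np.zeros((dim, n_classes))
b = np.zeros(n_classes)
for _ in range(300):
    z = x @ w + b
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z)
    p = p / p.sum(axis=1, keepdims=True)
    grad = (p - onehot) / len(x)          # gradient of mean cross-entropy w.r.t. z
    w -= 0.1 * (x.T @ grad)
    b -= 0.1 * grad.sum(axis=0)

# The net is now frozen; its linear outputs become the features for stage 2.
feats = x @ w + b

# Stage 2: generative modeling of those features, one diagonal Gaussian per unit.
means = np.array([feats[y == k].mean(axis=0) for k in range(n_classes)])
variances = np.array([feats[y == k].var(axis=0) + 1e-6 for k in range(n_classes)])

def log_likelihoods(f):
    """Per-frame log-likelihood under each unit's diagonal Gaussian."""
    diff = f[:, None, :] - means[None, :, :]
    per_dim = np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :]
    return -0.5 * per_dim.sum(axis=2)

accuracy = (log_likelihoods(feats).argmax(axis=1) == y).mean()
print("frame accuracy of the Gaussian stage on net outputs:", accuracy)
```

Because the second stage never touches the net's weights, the two stages could also be trained on different data sets, which is the cross-task option raised on the slide.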

5 Tandem system results
It works very well:
[Figure: WER as a function of SNR for various Aurora99 systems; y-axis WER / % (log scale), x-axis SNR / dB (averaged over 4 noises) up to clean; curves: HTK GMM baseline, Hybrid connectionist, Tandem, Tandem + PC]
Average WER ratio to baseline:
- HTK GMM: 100%
- Hybrid: 84.6%
- Tandem: 64.5%
- Tandem + PC: 47.2%
System-features    Avg. WER 2- dB    Baseline WER ratio
HTK-mfcc           3.7%              100%
-mfcc              9.3%              84.5%
Tandem-mfcc        7.4%              64.5%
Tandem-msg+plp     6.4%              47.2%

6 2 Inside Tandem systems: What's going on?
Visualizations of the net outputs
[Figure: utterance "one eight three" (MFP_83A), shown clean and at dB SNR to Hall noise; panels: spectrogram (freq / kHz), cepstral-smoothed mel spectrum (freq / mel chan), phone posterior estimates (phone), hidden layer linear outputs (phone); x-axis time / s]
Neural net normalizes away noise

7 Space magnification
The net performs a nonlinear remapping of the feature space
[Figure: smoothed spectrum; linear outputs for /r/, /iy/, /w/; posteriors for /r/, /iy/, /w/; x-axis time / s]
- small changes across critical boundaries result in large output changes
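
The magnification effect follows directly from the softmax nonlinearity, and a small numeric sketch makes it concrete. The three-class setup, the fixed competitor values, and the function name `posterior_shift` are assumptions for illustration only. The code applies the same step in one linear output twice, once near a decision boundary and once far from it, and compares the change in the resulting posterior.

```python
import math

def softmax(z):
    """Softmax over a list of linear outputs."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def posterior_shift(margin, step=1.0):
    """Change in the first class's posterior when its linear output rises by `step`.

    `margin` is how far that output already sits above its closest competitor (fixed at 0).
    """
    before = softmax([margin, 0.0, -2.0])[0]
    after = softmax([margin + step, 0.0, -2.0])[0]
    return after - before

near_boundary = posterior_shift(0.0)      # the two leading classes are tied
far_from_boundary = posterior_shift(6.0)  # the first class already dominates

print(f"posterior change near the boundary: {near_boundary:.3f}")
print(f"posterior change far from it:       {far_from_boundary:.4f}")
```

The same step in the linear output moves the posterior by roughly two orders of magnitude more at the boundary than far from it, which is the stretching of the feature space near class boundaries that the slide describes.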

8 Relative contributions
Approx relative impact on baseline WER ratio for different components:
[Figure: tandem block diagram with plp and msg feature streams, PCA orthogonalization, Gauss mix models, HTK decoder]
- Combo over msg: +2%
- Pre-nonlinearity over posteriors: +2%
- Combo over plp: +2%
- KLT over direct: +8%
- Combo-into-HTK over Combo-into-noway: +5%
- Combo over mfcc: +25%
- NN over HTK: +5%
- Tandem over HTK: +35%
- Tandem over hybrid: +25%
- Tandem combo over HTK mfcc baseline: +53%
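
Some of these figures appear to follow from the average WER ratios reported on the results slide (Hybrid 84.6%, Tandem 64.5%, Tandem + PC 47.2%). The sketch below assumes that "relative impact" means the relative reduction in WER ratio and that the baseline's ratio to itself is 100%; both are readings of the slides, not statements from them, and the function name is hypothetical.

```python
# Average WER ratios to the HTK GMM baseline, in percent, as given on the results slide.
wer_ratio = {"Hybrid": 84.6, "Tandem": 64.5, "Tandem + PC": 47.2}

def relative_improvement(ratio_percent, reference_percent=100.0):
    """Relative reduction, in percent, of one WER ratio with respect to a reference ratio."""
    return 100.0 * (1.0 - ratio_percent / reference_percent)

tandem_over_htk = relative_improvement(wer_ratio["Tandem"])
combo_over_htk = relative_improvement(wer_ratio["Tandem + PC"])
tandem_over_hybrid = relative_improvement(wer_ratio["Tandem"], wer_ratio["Hybrid"])

print(f"Tandem over HTK:       {tandem_over_htk:.1f}%")     # about 35.5
print(f"Tandem combo over HTK: {combo_over_htk:.1f}%")      # about 52.8
print(f"Tandem over hybrid:    {tandem_over_hybrid:.1f}%")  # about 23.8
```

Under these assumptions the first two values round to the +35% and +53% listed above, and the third is close to the listed +25%.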

9 Omitting the GMM
Tied posteriors (Rottland & Rigoll, ICASSP):
[Figure: two block diagrams]
- Conventional ASR (HTK): speech features -> Gauss mix models -> per-mixture likelihoods -> HMM decoder (mixture weights, sub-phone states)
  - EM training of GMMs and HMM mixture weights
- Rottland & Rigoll (2): speech features -> neural net -> phone probabilities -> HMM decoder (mixture weights, sub-phone states)
  - only mixture weights trained by EM
System            WSJ WER
Hybrid baseline   5.8%
Tied posteriors   9.4%
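
The tied-posterior idea can be illustrated with a small sketch. It assumes the commonly described form in which a state's scaled likelihood is a weighted sum of the net's phone posteriors, each divided by its phone prior, so that the net plays the role of a shared codebook and only the per-state weights are learned. That formula, the function name, and all numbers below are assumptions for illustration, not values from the slide.

```python
def tied_posterior_score(posteriors, priors, weights):
    """Scaled likelihood of one HMM state: sum_k weight_k * P(k | x) / P(k)."""
    return sum(c * p / q for c, p, q in zip(weights, posteriors, priors))

# Hypothetical net outputs for one frame over three phones, and the phone priors.
posteriors = [0.7, 0.2, 0.1]
priors = [0.5, 0.3, 0.2]

# Hypothetical mixture weights for two sub-phone states (each row sums to 1).
weights_state_a = [0.9, 0.1, 0.0]  # state tied mostly to the first phone
weights_state_b = [0.1, 0.1, 0.8]  # state tied mostly to the third phone

score_a = tied_posterior_score(posteriors, priors, weights_state_a)
score_b = tied_posterior_score(posteriors, priors, weights_state_b)
print(f"state A: {score_a:.3f}, state B: {score_b:.3f}")
```

Because the frame's posterior mass sits on the first phone, the state weighted toward that phone receives the higher score.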

10 3 Discussion
Key limitation: task-specific
- NN is not like features (it's part of the trained system)
Aurora99 was a matched condition task
- same noises added in training and test
- Aurora2 has mismatched condition
- Tandem modeling works just as well
How to relax specificity?
- train on alternative task?
- use articulatory targets

11 Future developments
How to optimize NN for this structure?
- integrated training... previous work
- HMM states as targets?
Understanding the gains
- better analysis of each piece's contribution
- strengths of different modeling approaches
- effects of model/training set size variation
- tied posteriors?
Other speech corpora
- need both NN and GMM systems...
- Switchboard is next goal


More information

Time Varying Comb Filters to Reduce Spectral and Temporal Masking in Sensorineural Hearing Impairment

Time Varying Comb Filters to Reduce Spectral and Temporal Masking in Sensorineural Hearing Impairment Bio Vision 2001 Intl Conf. Biomed. Engg., Bangalore, India, 21-24 Dec. 2001, paper PRN6. Time Varying Comb Filters to Reduce pectral and Temporal Masking in ensorineural Hearing Impairment Dakshayani.

More information

Acoustics, signals & systems for audiology. Psychoacoustics of hearing impairment

Acoustics, signals & systems for audiology. Psychoacoustics of hearing impairment Acoustics, signals & systems for audiology Psychoacoustics of hearing impairment Three main types of hearing impairment Conductive Sound is not properly transmitted from the outer to the inner ear Sensorineural

More information

Improving the Noise-Robustness of Mel-Frequency Cepstral Coefficients for Speech Processing

Improving the Noise-Robustness of Mel-Frequency Cepstral Coefficients for Speech Processing ISCA Archive http://www.isca-speech.org/archive ITRW on Statistical and Perceptual Audio Processing Pittsburgh, PA, USA September 16, 2006 Improving the Noise-Robustness of Mel-Frequency Cepstral Coefficients

More information

Sudden Noise Reduction Based on GMM with Noise Power Estimation

Sudden Noise Reduction Based on GMM with Noise Power Estimation J. Software Engineering & Applications, 010, 3: 341-346 doi:10.436/jsea.010.339 Pulished Online April 010 (http://www.scirp.org/journal/jsea) 341 Sudden Noise Reduction Based on GMM with Noise Power Estiation

More information

by Pace et al. (1985). These investigators developed

by Pace et al. (1985). These investigators developed JOURNAL OF APPLIED BEHAVIOR ANALYSIS 19921 259 491-498 NUMBEP. 2 (ummep. 1992) A COMPARISON OF TWO APPROACHES FOR IDENTIFYING REINFORCERS FOR PERSONS WITH SEVERE AND PROFOUND DISABILITIES WAYNE FiHm, CAmLEEN

More information

Analytical Model of Scanning Laser Polarimetry for Retinal Nerve Fiber Layer Assessment

Analytical Model of Scanning Laser Polarimetry for Retinal Nerve Fiber Layer Assessment Analytical Model of Scanning Laer Polarimetry for Retinal Nerve Fiber Layer Aement Robert W. Knighton, Xiang-Run Huang, and David S. Greenfield From the Bacom Palmer Eye Intitute, Univerity of Miami School

More information

Phonologically-based biomarkers for major depressive disorder

Phonologically-based biomarkers for major depressive disorder RESEARCH Open Access Phonologically-based biomarkers for major depressive disorder Andrea Carolina Trevino, Thomas Francis Quatieri * and Nicolas Malyska Abstract Of increasing importance in the civilian

More information

NONLINEAR REGRESSION I

NONLINEAR REGRESSION I EE613 Machine Learning for Engineers NONLINEAR REGRESSION I Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Dec. 13, 2017 1 Outline Properties of multivariate Gaussian distributions

More information

Auditory principles in speech processing do computers need silicon ears?

Auditory principles in speech processing do computers need silicon ears? * with contributions by V. Hohmann, M. Kleinschmidt, T. Brand, J. Nix, R. Beutelmann, and more members of our medical physics group Prof. Dr. rer.. nat. Dr. med. Birger Kollmeier* Auditory principles in

More information

This lecture. Acoustics. Speech production. Speech perception.

This lecture. Acoustics. Speech production. Speech perception. This lecture Acoustics. Speech production. Speech perception. Some images from Gray s Anatomy, Jim Glass course 6.345 (MIT), the Jurafsky & Martin textbook, Encyclopedia Britannica, the Rolling Stones,

More information

Data for MBI Workshop Statistics of Time Warpings and Phase Variations. Three-dimensional vascular geometry dataset

Data for MBI Workshop Statistics of Time Warpings and Phase Variations. Three-dimensional vascular geometry dataset Data for MBI Workhop Statitic of Time Warping and Phae Variation Mathematical Biocience Intitute, November 13-16, 2012 Three-dimenional vacular geometry dataet May 30, 2012 1 Data background Thee data

More information
