Oregon Graduate Institute of Science and Technology,


1 SPEAKER RECOGNITION AT OREGON GRADUATE INSTITUTE
June & 6, 997
Sarel van Vuuren and Narendranath Malayath
Hynek Hermansky and Pieter Vermeulen
Oregon Graduate Institute, Portland, Oregon

2 Oregon Graduate Institute
1. Speaker Recognition at OGI
  - Research Group
  - Goals
2. Competitive System
  - Architecture
  - Results
3. Initial Robust System
  - Architecture
  - Preliminary Results and Conclusions
  - Planned Extensions

3 People
  - Faculty: Hynek Hermansky, Pieter Vermeulen
  - Post Doc: Nobu Kanedera, Carlos Avendano
  - PhD Students: Sarel van Vuuren, Sangita Tibrewala, Narendranath Malayath
Speech processing by emulating relevant properties of speech perception
Collaboration with
  - CSLU at OGI
  - ICSI Berkeley
  - IIT Madras
  - IDIAP Martigny
  - KTH Stockholm

4 Activities
  - Speaker identification
  - Acoustic modeling for ASR
  - Enhancement of degraded speech and speech processing for handicapped
  - Human speech perception

5 Speaker Recognition at OGI
Speech Signal
  - linguistic message
  - speaker characteristics
  - environment
Task
  - find out how these information sources are coded into the signal
Applications
  - speaker ID
  - speaker independent ASR
  - voice mimic

6 Requirements of a Speaker Verification System
Invariant to channel
Invariant to session
Invariant to noise
Minimal training data
Minimal verification data
Adapt to speaker styles

7 Goals
Be familiar with state of the art
  - Build an up to date competitive system following the state of the art
  - Analyze and understand abilities and limitations
  - Contribute to research system
  - Incorporate ideas from research system

8 Goals
Research novel ideas
  - Knowledge driven
  - Analyze and understand
  - Report results
  - Contribute to state of the art system
  - Incorporate knowledge from state of the art system
Address robustness
  - Invariance vs modeling
  - Channels and noise
Address data requirements
  - Training
  - Verification

9 Initial Robust System
[Block diagram labels: Preprocessing; Similar Representation; Rep.; Rep.; L+E; Speaker Specific Mapping; L+S+E (+); L+S+E (-); Distance; Frame Integration; Features; Likelihood Estimator]
Information Sources: L: Linguistic, S: Speaker, E: Environment
Residue invariant to extraneous information and noise
Preprocessing: segmentation, such as silence removal, voiced segments
Representation: differing speaker information, such as low order PLP vs high order PLP
Speaker Specific Mapping: such as Neural Net or Pseudo Inverse
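
The speaker-specific mapping named on this slide can be illustrated with a small sketch. It assumes the pseudo-inverse option, a mapping from the low-order (speaker-poor) representation to the high-order (speaker-rich) one, and a residual scored as the mean Euclidean norm over frames. The feature dimensions, the bias term, the synthetic data and the names `fit_linear_mapping` and `mean_residual` are illustrative assumptions, not taken from the slides.

```python
import numpy as np


def fit_linear_mapping(low, high):
    """Least-squares W with [low, 1] @ W ~= high, solved via the pseudo-inverse."""
    low_aug = np.hstack([low, np.ones((low.shape[0], 1))])  # append a bias column
    return np.linalg.pinv(low_aug) @ high


def mean_residual(W, low, high):
    """Mean Euclidean norm of (observed high-order frame - mapped low-order frame)."""
    low_aug = np.hstack([low, np.ones((low.shape[0], 1))])
    residual = high - low_aug @ W
    return float(np.mean(np.linalg.norm(residual, axis=1)))


# Synthetic stand-ins for feature frames (dimensions chosen arbitrarily).
rng = np.random.default_rng(0)
low_train = rng.normal(size=(200, 5))
true_map = rng.normal(size=(5, 8))          # hypothetical "claimed speaker" relation
high_train = low_train @ true_map + 0.01 * rng.normal(size=(200, 8))

W = fit_linear_mapping(low_train, high_train)

low_test = rng.normal(size=(50, 5))
high_same = low_test @ true_map                    # frames consistent with the claimed speaker
high_other = low_test @ rng.normal(size=(5, 8))    # frames from a different relation

score_same = mean_residual(W, low_test, high_same)
score_other = mean_residual(W, low_test, high_other)
print(score_same < score_other)
```

On this toy data the residual is smaller when the test frames follow the relation the mapping was trained on, which is the property a verification decision would rely on.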

10 Initial Robust System
[Block diagram repeated from the previous slide]
Speaker Specific Distance Measure: Euclidean, likelihood estimator, Bhattacharyya
Frame Integrator: average, voting
Likelihood Estimator
Adding other information (pitch, formants)
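
As a rough illustration of the distance measures and frame integrators listed on this slide, the sketch below implements a Euclidean distance, the Bhattacharyya distance between two Gaussians with diagonal covariance, and two ways of integrating per-frame distances. The diagonal-covariance restriction, the thresholded form of voting and all function names are assumptions made for the example; the slides do not specify these details.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance for this dimension
        dist += 0.125 * (m1 - m2) ** 2 / v + 0.5 * math.log(v / math.sqrt(v1 * v2))
    return dist


def integrate_average(frame_distances):
    """Frame integration by averaging per-frame distances."""
    return sum(frame_distances) / len(frame_distances)


def integrate_voting(frame_distances, threshold):
    """Frame integration by voting: fraction of frames below a distance threshold."""
    votes = sum(1 for d in frame_distances if d < threshold)
    return votes / len(frame_distances)


print(euclidean([0.0, 0.0], [3.0, 4.0]))
print(bhattacharyya_diag([0.0], [1.0], [2.0], [1.0]))
print(integrate_average([1.0, 2.0, 3.0]))
print(integrate_voting([0.1, 0.9, 0.2, 0.3], threshold=0.5))
```

Averaging keeps the magnitude of every frame's distance, while voting only counts how many frames fall on the accepting side, which makes it less sensitive to a few outlier frames.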

11 Initial Robust Implementation
[Block diagram labels: PLP Representation; Remove Silence; PLP-7; PLP-4; Speaker Specific NN; +/- difference; Euclidean; Frame Average]
Preprocessing: Silence deletion
Representation: PLP-7 vs PLP-4
Speaker Specific Mapping: Neural Net
Distance Measure: Euclidean
Frame Integrator: Average
Likelihood Estimator: None
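
The silence deletion step listed above is not described further in the slides. One common way to do it, shown here only as a sketch, is to drop frames whose log energy falls more than a fixed margin below the loudest frame. The frame length, hop, the 30 dB margin and the function names are assumptions for illustration.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into (possibly overlapping) fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def log_energy_db(frame):
    """Mean-square energy of a frame in dB, floored to avoid log(0)."""
    energy = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(energy + 1e-12)


def remove_silence(samples, frame_len=200, hop=80, margin_db=30.0):
    """Keep frames whose energy is within margin_db of the loudest frame."""
    frames = frame_signal(samples, frame_len, hop)
    energies = [log_energy_db(f) for f in frames]
    if not energies:
        return []
    floor = max(energies) - margin_db
    return [f for f, e in zip(frames, energies) if e >= floor]


# Toy signal: 400 silent samples followed by 400 samples of a sine tone.
signal = [0.0] * 400 + [math.sin(2 * math.pi * n / 20) for n in range(400)]
kept = remove_silence(signal, frame_len=200, hop=200)
print(len(kept))
```

With non-overlapping 200-sample frames, the toy signal yields four frames, and only the two containing the tone survive the threshold.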

12 Preliminary Studies
Map from speaker independent to speaker rich representation
Evidence of discrimination
Evidence of low data requirements for verification
No handset robustness: mapping not invariant due to training methodology

13 Results: GMM baseline
[Figure: two DET curve panels; the mdcf, hdcf and eer annotations are garbled in this transcription]
  - handset training; 3 sec test; female; training handset
  - handset training; 3 sec test; female; non training handset

14 Results: GMM baseline
[Figure: two DET curve panels; the mdcf, hdcf and eer annotations are garbled in this transcription]
  - handset training; 0 sec test; female; training handset
  - handset training; 0 sec test; female; non training handset

15 Results: GMM baseline
[Figure: two DET curve panels; the mdcf, hdcf and eer annotations are garbled in this transcription]
  - handset training; 30 sec test; female; training handset
  - handset training; 30 sec test; female; non training handset

16 Results: PLP system
[Figure: two DET curve panels; the mdcf and eer annotations are garbled in this transcription]
  - handset training; 3 sec test; female; training handset
  - handset training; 3 sec test; female; non training handset

17 Results: PLP system
[Figure: two DET curve panels; the mdcf and eer annotations are garbled in this transcription]
  - handset training; 0 sec test; female; training handset
  - handset training; 0 sec test; female; non training handset

18 Results: PLP system
[Figure: two DET curve panels; the mdcf and eer annotations are garbled in this transcription]
  - handset training; 30 sec test; female; training handset
  - handset training; 30 sec test; female; non training handset

19 Results: Subspace system
[Figure: two DET curve panels; the mdcf and eer annotations are garbled in this transcription]
  - handset training; 3 sec test; female; training handset
  - handset training; 3 sec test; female; non training handset

20 Results: Subspace system
[Figure: two DET curve panels; the mdcf and eer annotations are garbled in this transcription]
  - handset training; 0 sec test; female; training handset
  - handset training; 0 sec test; female; non training handset

21 Results: Subspace system
[Figure: two DET curve panels; the mdcf and eer annotations are garbled in this transcription]
  - handset training; 30 sec test; female; training handset
  - handset training; 30 sec test; female; non training handset
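
The results slides annotate each DET curve with an equal error rate (eer). As a hedged sketch of how such a figure is typically obtained, the code below sweeps a decision threshold over target and impostor scores and reports the point where miss and false-alarm rates are closest. The convention that higher scores mean "accept", the toy scores and the function names are assumptions; the slides do not describe the scoring procedure.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Miss and false-alarm rates when accepting scores >= threshold."""
    miss = sum(1 for s in target_scores if s < threshold) / len(target_scores)
    false_alarm = sum(1 for s in impostor_scores if s >= threshold) / len(impostor_scores)
    return miss, false_alarm


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: threshold where |miss - false alarm| is smallest."""
    best = None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        miss, false_alarm = error_rates(target_scores, impostor_scores, threshold)
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, 0.5 * (miss + false_alarm), threshold)
    return best[1], best[2]


# Toy scores: one target scores low and one impostor scores high.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer, eer_threshold = equal_error_rate(targets, impostors)
print(eer, eer_threshold)
```

With few trials the miss and false-alarm rates may never be exactly equal, so the sketch reports the average of the two at the closest threshold.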

22 Future Work: Speaker Verification
Understand each component
  - Preprocessing
  - Representation
  - Environment Invariant Mapping
  - Distance Measure


More information

Performance of Gaussian Mixture Models as a Classifier for Pathological Voice

Performance of Gaussian Mixture Models as a Classifier for Pathological Voice PAGE 65 Performance of Gaussian Mixture Models as a Classifier for Pathological Voice Jianglin Wang, Cheolwoo Jo SASPL, School of Mechatronics Changwon ational University Changwon, Gyeongnam 64-773, Republic

More information

FAST AMPLITUDE COMPRESSION IN HEARING AIDS IMPROVES AUDIBILITY BUT DEGRADES SPEECH INFORMATION TRANSMISSION

FAST AMPLITUDE COMPRESSION IN HEARING AIDS IMPROVES AUDIBILITY BUT DEGRADES SPEECH INFORMATION TRANSMISSION FAST AMPLITUDE COMPRESSION IN HEARING AIDS IMPROVES AUDIBILITY BUT DEGRADES SPEECH INFORMATION TRANSMISSION Arne Leijon and Svante Stadler Sound and Image Processing Lab., School of Electrical Engineering,

More information

A New Paradigm for the Evaluation of Forensic Evidence. Geoffrey Stewart Morrison. p p(e H )

A New Paradigm for the Evaluation of Forensic Evidence. Geoffrey Stewart Morrison. p p(e H ) A New Paradigm for the Evaluation of Forensic Evidence Geoffrey Stewart Morrison p(e H ) p p(e H ) d Abstract In Europe there has been a great deal of concern about the logically correct way to evaluate

More information

A Neural Network Architecture for.

A Neural Network Architecture for. A Neural Network Architecture for Self-Organization of Object Understanding D. Heinke, H.-M. Gross Technical University of Ilmenau, Division of Neuroinformatics 98684 Ilmenau, Germany e-mail: dietmar@informatik.tu-ilmenau.de

More information

Biologically-Inspired Human Motion Detection

Biologically-Inspired Human Motion Detection Biologically-Inspired Human Motion Detection Vijay Laxmi, J. N. Carter and R. I. Damper Image, Speech and Intelligent Systems (ISIS) Research Group Department of Electronics and Computer Science University

More information

SPEECH EMOTION RECOGNITION: ARE WE THERE YET?

SPEECH EMOTION RECOGNITION: ARE WE THERE YET? SPEECH EMOTION RECOGNITION: ARE WE THERE YET? CARLOS BUSSO Multimodal Signal Processing (MSP) lab The University of Texas at Dallas Erik Jonsson School of Engineering and Computer Science Why study emotion

More information

Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention

Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention Tapani Raiko and Harri Valpola School of Science and Technology Aalto University (formerly Helsinki University of

More information

Noise-Robust Speech Recognition Technologies in Mobile Environments

Noise-Robust Speech Recognition Technologies in Mobile Environments Noise-Robust Speech Recognition echnologies in Mobile Environments Mobile environments are highly influenced by ambient noise, which may cause a significant deterioration of speech recognition performance.

More information

Comparative Analysis of Vocal Characteristics in Speakers with Depression and High-Risk Suicide

Comparative Analysis of Vocal Characteristics in Speakers with Depression and High-Risk Suicide International Journal of Computer Theory and Engineering, Vol. 7, No. 6, December 205 Comparative Analysis of Vocal Characteristics in Speakers with Depression and High-Risk Suicide Thaweewong Akkaralaertsest

More information

CONSTRUCTING TELEPHONE ACOUSTIC MODELS FROM A HIGH-QUALITY SPEECH CORPUS

CONSTRUCTING TELEPHONE ACOUSTIC MODELS FROM A HIGH-QUALITY SPEECH CORPUS CONSTRUCTING TELEPHONE ACOUSTIC MODELS FROM A HIGH-QUALITY SPEECH CORPUS Mitchel Weintraub and Leonardo Neumeyer SRI International Speech Research and Technology Program Menlo Park, CA, 94025 USA ABSTRACT

More information

Resonating memory traces account for the perceptual magnet effect

Resonating memory traces account for the perceptual magnet effect Resonating memory traces account for the perceptual magnet effect Gerhard Jäger Dept. of Linguistics, University of Tübingen, Germany Introduction In a series of experiments, atricia Kuhl and co-workers

More information

Juan Carlos Tejero-Calado 1, Janet C. Rutledge 2, and Peggy B. Nelson 3

Juan Carlos Tejero-Calado 1, Janet C. Rutledge 2, and Peggy B. Nelson 3 PRESERVING SPECTRAL CONTRAST IN AMPLITUDE COMPRESSION FOR HEARING AIDS Juan Carlos Tejero-Calado 1, Janet C. Rutledge 2, and Peggy B. Nelson 3 1 University of Malaga, Campus de Teatinos-Complejo Tecnol

More information

Hearing in the Environment

Hearing in the Environment 10 Hearing in the Environment Click Chapter to edit 10 Master Hearing title in the style Environment Sound Localization Complex Sounds Auditory Scene Analysis Continuity and Restoration Effects Auditory

More information

MODULE 6 Communication

MODULE 6 Communication MODULE 6 Communication Communication: The process by which information is transmitted and understood between two or more people. Communication competence: A person s ability to identify appropriate communication

More information

Learning Process. Auditory Training for Speech and Language Development. Auditory Training. Auditory Perceptual Abilities.

Learning Process. Auditory Training for Speech and Language Development. Auditory Training. Auditory Perceptual Abilities. Learning Process Auditory Training for Speech and Language Development Introduction Demonstration Perception Imitation 1 2 Auditory Training Methods designed for improving auditory speech-perception Perception

More information

Lecture 6. Human Factors in Engineering Design

Lecture 6. Human Factors in Engineering Design GE105 Introduction to Engineering Design College of Engineering King Saud University Lecture 6. Human Factors in Engineering Design SPRING 2016 What is Human Factors in Design? Considering information

More information

Overview of the visual cortex. Ventral pathway. Overview of the visual cortex

Overview of the visual cortex. Ventral pathway. Overview of the visual cortex Overview of the visual cortex Two streams: Ventral What : V1,V2, V4, IT, form recognition and object representation Dorsal Where : V1,V2, MT, MST, LIP, VIP, 7a: motion, location, control of eyes and arms

More information

Informal Functional Hearing Evaluation for Students with DeafBlindness

Informal Functional Hearing Evaluation for Students with DeafBlindness Informal Functional Hearing Evaluation for Students with DeafBlindness Presented by Chris Montgomery, M. Ed., TVI Deafblind Education Consultant TSBVI Outreach Programs Alexia Papanicolas, Au.D., CCC-A

More information

Speech and Sound Use in a Remote Monitoring System for Health Care

Speech and Sound Use in a Remote Monitoring System for Health Care Speech and Sound Use in a Remote System for Health Care M. Vacher J.-F. Serignat S. Chaillol D. Istrate V. Popescu CLIPS-IMAG, Team GEOD Joseph Fourier University of Grenoble - CNRS (France) Text, Speech

More information