Oregon Graduate Institute of Science and Technology,


SPEAKER RECOGNITION AT OREGON GRADUATE INSTITUTE
June & 6, 1997
Sarel van Vuuren and Narendranath Malayath
Hynek Hermansky and Pieter Vermeulen
Oregon Graduate Institute, Portland, Oregon

Oregon Graduate Institute
1. Speaker Recognition at OGI
   - Research Group
   - Goals
2. Competitive System
   - Architecture
   - Results
3. Initial Robust System
   - Architecture
   - Preliminary Results and Conclusions
   - Planned Extensions

People
- Faculty: Hynek Hermansky, Pieter Vermeulen
- Post Doc: Nobu Kanedera, Carlos Avendano
- PhD Students: Sarel van Vuuren, Sangita Tibrewala, Narendranath Malayath
Speech processing by emulating relevant properties of speech perception
Collaboration with
- CSLU at OGI
- ICSI Berkeley
- IIT Madras
- IDIAP Martigny
- KTH Stockholm

Activities
- Speaker identification
- Acoustic modeling for ASR
- Enhancement of degraded speech and speech processing for handicapped
- Human speech perception

Speaker Recognition at OGI
Speech Signal
- linguistic message
- speaker characteristics
- environment
Task
- find out how these information sources are coded into the signal
Applications
- speaker ID
- speaker independent ASR
- voice mimic

Requirements of a Speaker Verification System
- Invariant to channel
- Invariant to session
- Invariant to noise
- Minimal training data
- Minimal verification data
- Adapt to speaker styles

Goals
Be familiar with state of the art
- Build an up to date competitive system following the state of the art
- Analyze and understand abilities and limitations
- Contribute to research system
- Incorporate ideas from research system

Goals
Research novel ideas
- Knowledge driven
- Analyze and understand
- Report results
- Contribute to state of the art system
- Incorporate knowledge from state of the art system
Address robustness
- Invariance vs modeling
- Channels and noise
Address data requirements
- Training
- Verification

Initial Robust System
[Block diagram: Preprocessing; Similar Representation (Rep., Rep.); Speaker Specific Mapping; summing junction (+, -); Distance; Frame Integration; Features; Likelihood Estimator. Signal labels: L+E, L+S+E.]
Information Sources: L: Linguistic, S: Speaker, E: Environment
Residue invariant to extraneous information and noise
- Preprocessing: segmentation, such as silence removal, voiced segments
- Representation: differing speaker information, such as low order PLP vs high order PLP
- Speaker Specific Mapping: such as Neural Net or Pseudo Inverse

Initial Robust System
- Speaker Specific Distance Measure: Euclidean, likelihood estimator, Bhattacharyya
- Frame Integrator: average, voting
- Likelihood Estimator
- Adding other information (pitch, formants)
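The distance measures and frame integrators named on this slide can be illustrated with a minimal Python sketch. The function names, the diagonal-covariance Gaussian form of the Bhattacharyya distance, and the voting threshold are assumptions made for illustration; the slides do not say how these components were implemented.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariance.

    Assumption: each model is summarised by per-dimension means and variances.
    """
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance for this dimension
        dist += 0.125 * (m1 - m2) ** 2 / v
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))
    return dist


def integrate_average(frame_distances):
    """Frame integration by averaging the per-frame distances."""
    return sum(frame_distances) / len(frame_distances)


def integrate_voting(frame_distances, threshold):
    """Frame integration by voting: fraction of frames below a (hypothetical) threshold."""
    votes = sum(1 for d in frame_distances if d < threshold)
    return votes / len(frame_distances)
```

Averaging keeps the magnitude of every frame's distance, whereas voting only counts how many frames look like a match, which makes it less sensitive to a few outlier frames.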

Initial Robust Implementation
[Block diagram: PLP Representation; Remove Silence; PLP-7 and PLP-4; Speaker Specific NN; summing junction (+, -); Euclidean; Frame Average.]
- Preprocessing: Silence deletion
- Representation: PLP-7 vs PLP-4
- Speaker Specific Mapping: Neural Net
- Distance Measure: Euclidean
- Frame Integrator: Average
- Likelihood Estimator: None
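As a rough illustration of how the listed stages could fit together, the sketch below scores one utterance against a claimed speaker. Several details are assumptions rather than anything stated on the slides: the frame layout (a dict with `energy`, `low`, and `high` fields), the energy-floor silence criterion, the direction of the mapping (from the lower-order to the higher-order representation), and above all the linear map `weights`/`bias`, which merely stands in for the speaker-specific neural net.

```python
import math


def remove_silence(frames, energy_floor):
    """Keep frames whose energy exceeds energy_floor (hypothetical criterion)."""
    return [f for f in frames if f["energy"] > energy_floor]


def apply_mapping(weights, bias, low):
    """Linear stand-in for the speaker-specific mapping: predicted = W * low + b."""
    return [sum(w * x for w, x in zip(row, low)) + b
            for row, b in zip(weights, bias)]


def score_utterance(frames, weights, bias, energy_floor=0.1):
    """Average Euclidean residual over speech frames.

    A lower score means the claimed speaker's mapping predicts the
    higher-order representation better.
    """
    speech = remove_silence(frames, energy_floor)
    if not speech:
        raise ValueError("no speech frames left after silence removal")
    residuals = []
    for frame in speech:
        predicted = apply_mapping(weights, bias, frame["low"])
        squared = sum((p - h) ** 2 for p, h in zip(predicted, frame["high"]))
        residuals.append(math.sqrt(squared))  # Euclidean distance for this frame
    return sum(residuals) / len(residuals)    # frame average


# Toy usage with a 2-dimensional "low" and 3-dimensional "high" representation.
toy_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_bias = [0.0, 0.0, 0.0]
toy_frames = [
    {"energy": 1.0, "low": [1.0, 2.0], "high": [1.0, 2.0, 3.0]},  # perfect fit
    {"energy": 0.0, "low": [9.0, 9.0], "high": [0.0, 0.0, 0.0]},  # silence, dropped
    {"energy": 0.5, "low": [0.0, 0.0], "high": [3.0, 4.0, 0.0]},  # residual of 5
]
toy_score = score_utterance(toy_frames, toy_weights, toy_bias)
```

A verification decision would then compare the utterance score with a threshold; how that threshold was chosen is not described in the slides.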

Preliminary Studies
- Map from speaker independent to speaker rich representation
- Evidence of discrimination
- Evidence of low data requirements for verification
- No handset robustness: mapping not invariant due to training methodology

Results: GMM baseline
DET curve: handset training; 3 sec test; female; training handset
  mdcf 0.047, hdcf 0.00, eer 9.80 %
  mdcf (.8,9.4), hdcf (.7,4.)
  mdcf 0.08, eer 73
DET curve: handset training; 3 sec test; female; non training handset
  mdcf 0.064, hdcf 0.068, eer 4.86 %
  mdcf (.9,4.), hdcf (.7,4.)
  mdcf 0.073, eer 0.394
DET curve: handset training; 0 sec test; female; training handset
  mdcf 0.09, hdcf 0.030, eer .04 %
  mdcf (.3,.7), hdcf (.,7.4)
  mdcf 0.009, eer 07
DET curve: handset training; 0 sec test; female; non training handset
  mdcf 0.048, hdcf 0.00, eer 9.60 %
  mdcf (.,3.8), hdcf (.,37.8)
  mdcf 0.030, eer 0.3
DET curve: handset training; 30 sec test; female; training handset
  mdcf 0.0, hdcf 0.06, eer .80 %
  mdcf (0.6,8.7), hdcf (0.7,9.0)
  mdcf 0.0, eer 7
DET curve: handset training; 30 sec test; female; non training handset
  mdcf 0.034, hdcf 0.037, eer 6.9 %
  mdcf (.,9.), hdcf (0.7,30.0)
  mdcf 0.09, eer 94
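Each DET curve above is summarised by an equal error rate (eer), the operating point at which the miss rate and the false-alarm rate coincide. The sketch below shows one simple way to estimate it from lists of target and impostor scores. It assumes that higher scores mean "more likely the claimed speaker" and sweeps thresholds over the observed scores only; the function names are illustrative and the slides do not describe the scoring tool that was used.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Miss and false-alarm rates when trials with score >= threshold are accepted."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_miss, p_fa


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: pick the threshold where |p_miss - p_fa| is smallest
    and report the mean of the two rates at that point."""
    best_gap, best_eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        p_miss, p_fa = error_rates(target_scores, impostor_scores, threshold)
        gap = abs(p_miss - p_fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (p_miss + p_fa) / 2.0
    return best_eer


# Toy scores: one target trial scores below two impostor trials.
toy_targets = [0.9, 0.8, 0.7, 0.3]
toy_impostors = [0.6, 0.4, 0.2, 0.1]
toy_eer = equal_error_rate(toy_targets, toy_impostors)
```

With few trials the two error rates may never cross exactly, which is why the sketch averages them at the closest point instead of interpolating along the curve.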

Results: PLP system
DET curve: handset training; 3 sec test; female; training handset
  mdcf 0.086, eer 9.0 %
  mdcf (.9,0.0)
  mdcf 0.84, eer 0.93
DET curve: handset training; 3 sec test; female; non training handset
  mdcf 0.098, eer 33.3 %
  mdcf (0.8,0.0)
  mdcf 0.830, eer 0.960
DET curve: handset training; 0 sec test; female; training handset
  mdcf 0.06, eer .69 %
  mdcf (.4,.9)
  mdcf 0.88, eer 0.933
DET curve: handset training; 0 sec test; female; non training handset
  mdcf 0.09, eer 30.03 %
  mdcf (0.9,0.0)
  mdcf 0.87, eer 0.99
DET curve: handset training; 30 sec test; female; training handset
  mdcf 0.069, eer 4.8 %
  mdcf (.6,4.6)
  mdcf 0.88, eer 0.93
DET curve: handset training; 30 sec test; female; non training handset
  mdcf 0.093, eer 9.4 %
  mdcf (0.8,0.0)
  mdcf 0.83, eer 0.99

Results: Subspace system
DET curve: handset training; 3 sec test; female; training handset
  mdcf 0.07, eer .4 %
  mdcf (.,0.0)
  mdcf 0.796, eer 0.890
DET curve: handset training; 3 sec test; female; non training handset
  eer 3.8 %
DET curve: handset training; 0 sec test; female; training handset
  mdcf 0.060, eer .76 %
  mdcf (.,39.)
  mdcf 0.809, eer 0.88
DET curve: handset training; 0 sec test; female; non training handset
  eer 30.96 %
DET curve: handset training; 30 sec test; female; training handset
  mdcf 0.0, eer 9.4 %
  mdcf (.6,38.7)
  mdcf 0.809, eer 0.877
DET curve: handset training; 30 sec test; female; non training handset
  eer 9.4 %
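The mdcf and hdcf entries in the results appear to be detection cost function values; reading mdcf as the minimum cost over all thresholds is an assumption, since the slides do not define either abbreviation. The sketch below shows how such a minimum cost could be computed. The cost parameters (`c_miss=10`, `c_fa=1`, `p_target=0.01`) are assumed defaults of the kind commonly associated with NIST speaker recognition evaluations and are not taken from the slides.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted cost of misses and false alarms (parameter values are assumptions)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def min_detection_cost(target_scores, impostor_scores):
    """Smallest detection cost over all thresholds, accepting scores >= threshold."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    thresholds.append(float("inf"))  # the reject-everything operating point
    best = None
    for threshold in thresholds:
        p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
        p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        cost = detection_cost(p_miss, p_fa)
        if best is None or cost < best:
            best = cost
    return best


# Toy scores: one target trial scores below two impostor trials.
toy_min_dcf = min_detection_cost([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
```

Because the assumed prior for target trials is small, false alarms dominate the cost, so the minimum-cost threshold tends to sit at a stricter operating point than the equal error rate.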

Future Work: Speaker Verification
Understand each component
- Preprocessing
- Representation
- Environment Invariant Mapping
- Distance Measure