Recognition & Organization of Speech and Audio
|
|
- Thomas Page
- 5 years ago
- Views:
Transcription
1 Recognition & Organization of Speech and Audio Dan Ellis Electrical Engineering, Columbia University Outline Sound organization Background & related work Existing projects Future projects Summary & conclusions talk - Dan Ellis
2 1 Organization of sound mixtures frq/hz time/s Analysis Voice (evil) Rumble Stab Voice (pleasant) Strings Choir Core operation: Converting continuous, scalar signal into discrete, symbolic representation talk - Dan Ellis
3 Positioning sound organization Speech recognition Sound organization Audio processing Image/video analysis Signal processing Pattern recognition Machine learning Artificial intelligence Psychoacoustics & physiology Draws on many techniques Abuts/overlaps various areas talk - Dan Ellis
4 About auditory perception Received waveform is a mixture - two sensors, N signals... - need knowledge-based constraints Psychoacoustics: the study of human sound organization - auditory scene analysis (Bregman 90) Auditory perception is ecologically grounded - scene analysis is preconscious ( illusions) - perceived organization: real-world objects + events (transient) - subjective not canonical (ambiguity) talk - Dan Ellis
5 Key themes for Sound organization - recovering/constructing abstraction hierarchy - at an instant (sources) - along time (segmentation) Scene analysis - need to find attributes according to objects - use attributes to form objects -... plus constraints of knowledge Exploiting large data sets (the ASR lesson) - supervised/labelled: pattern recognition - unsupervised: structure discovery, clustering Special-purpose cases: - speech recognition - source-specific recognizers... within a complete explanation talk - Dan Ellis
6 Applications for sound organization What do people do with their ears? Robots - intelligence requires awareness - Sony s AIBO: dog-hearing Human-computer interface -.. includes knowing when (& why) you ve failed Archive indexing & retrieval - pure audio archives - true multimedia content analysis Content understanding - intelligent classification & summarization Autonomous monitoring Broader structure discovery algorithms talk - Dan Ellis
7 Outline Sound organization Background & related work - Audio coding & compression - Automatic Speech Recognition - Computational Auditory Scene Analysis - Multimedia information retrieval Existing projects Future projects Summary & conclusions talk - Dan Ellis
8 Audio coding & compression Goal is reconstruction, not abstraction But criteria are subjective : want same percept, not same waveform MPEG-Audio: - filterbanks - information-theoretic coding - psychoacoustic masking of quantization noise MPEG-4 Structured Audio - computer music synthesis model - instrument definition + control stream - automatic analysis? talk - Dan Ellis
9 Automatic Speech Recognition (ASR) Standard speech recognition structure: Feature calculation sound D A T A Acoustic model parameters Word models s ah t Language model p("sat" "the","cat") p("saw" "the","cat") Acoustic classifier HMM decoder feature vectors phone probabilities phone / word sequence Understanding/ application... State of the art word-error rates (WERs): - 2% (dictation) - 30% (telephone conversations) Segmentation of speech & nonspeech -... recognizer wouldn t notice! talk - Dan Ellis
10 Spoken document retrieval Text-based IR on ASR transcripts - e.g. news broadcasts (CMU s Informedia, Thisl) Recognition errors are not the limiting factor - TREC-98 results: average precision Weak at word level, but OK over paragraphs - replay the audio, don t show the text! talk - Dan Ellis
11 Computational Auditory Scene Analysis (CASA) Implement psychoacoustic theory? (Brown 92) input mixture Front end signal features (maps) Object formation discrete objects Grouping rules Source groups freq time onset period frq.mod - what are the features? how are they used? Top down constraints are needed (Klassner 96) 3760 Hz 2540 Hz 2230 Hz 1475 Hz 500 Hz 460 Hz 420 Hz Buzzer-Alarm Glass-Clink Phone-Ring Siren-Chirp 2350 Hz 1675 Hz 950 Hz sec TIME sec TIME talk - Dan Ellis
12 Audio Information Retrieval Searching in a database of audio - speech.. use ASR - text annotations.. search them - sound effects library? e.g. Muscle Fish SoundFisher browser - define multiple perceptual feature dimensions - search by proximity in (weighted) feature space Segment feature analysis Sound segment database Feature vectors Seach/ comparison Results Segment feature analysis Query example - features are global for each soundfile, no attempt to separate mixtures - segmentation... talk - Dan Ellis
13 Music analysis Automatic transcription (score recovery) - classic hard problem : can people do it even? - recent success in reduced forms e.g. melody, drum track (Goto 00) Instrument identification - ideas from speaker identification (basic PR) + instrument family hierarchies (Martin 99) Fingerprinting - spot recordings despite noise, distortion - relies on perceptual invariants Music clustering - e.g. music recommendation based on signal - correlate objective features with user ratings? talk - Dan Ellis
14 Multimedia description MPEG-7 Metadata - MPEG is known for audio/video compression standards; - also develop standards for search and indexing MPEG-7 is a standard format for metadata: Well-defined categories for content description Focus is on framework & infrastructure Audio descriptor categories: - from ASR - from computer music community - uses still to emerge talk - Dan Ellis
15 Outline Sound organization Background & related work Existing projects - Acoustic change detection - Robust speech recognition - Nonspeech event detection - Prediction-driven CASA Future projects Summary & conclusions talk - Dan Ellis
16 frq/hz Spectrogram Acoustic change detection (with Williams/Sheffield, Ferreiros/UPMadrid) Approaches: - metric : find instants of maximal change - model-based : best alignment of model set - bayesian : generate models when warranted Dynamism vs. Entropy for 2.5 second segments of speecn/music Speech Music Speech+Music speech music speech+music Entropy time/s Posteriors Dynamism Typically agnostic about underlying problem - use any features, find any changes Good for ASR adaptation, otherwise... talk - Dan Ellis
17 Speech feature combination (with Bilmes/UW, Hermansky/OGI, ICSI) Multistream approaches Input sound Feature 1 calculation Feature 2 calculation Acoustic classifier Acoustic classifier Speech features Posterior combination Phone probabilities HMM decoder Word hypotheses - streams can correct each other big gains Which feature streams to combine? - low mutual information between classifiers indicates complementary streams Conditional Mutual Information between feature streams PLPa PLPb MSGa MSGb PLPa PLPb MSGa MSGb talk - Dan Ellis CMI/bits
18 C0 C1 C2 tn+w h# pcl bcl tcl dcl C0 C1 C2 tn+w h# pcl bcl tcl dcl Tandem speech recognition (with Hermansky, Sharma & Sivadas/OGI, Singh/CMU) Neural net estimates phone posteriors; but Gaussian mixtures model finer detail Combine them! Hybrid Connectionist-HMM ASR Conventional ASR (HTK) Feature calculation Neural net classifier Noway decoder Feature calculation Gauss mix models HTK decoder Ck tn s ah t s ah t Input sound Speech features Phone probabilities Words Input sound Speech features Subword likelihoods Words Tandem modeling Feature calculation Neural net classifier Gauss mix models HTK decoder Ck tn s ah t Input sound Speech features Phone probabilities Subword likelihoods Words 50% relative improvement over GMMs alone - different statistical modeling schemes get different info from same training data talk - Dan Ellis
19 Alarm sound detection Deconstructing sound mixtures freq / Hz time / sec representation of energy in time-frequency - formation of atomic elements - grouping by common properties (onset &c.) 0 Alarm sounds have particular structure - people know them when they hear them - build a generic detector? talk - Dan Ellis
20 input mixture Prediction-driven CASA Data-driven (bottom-up) fails for noisy, ambiguous sounds (most mixtures!) Front end signal features (maps) Object formation discrete objects Grouping rules Source groups input mixture Need top-down constraints: Front end signal features Hypothesis management prediction errors Compare & reconcile hypotheses Noise components Periodic components predicted features Predict & combine - fit vocabulary of generic elements to sound... bottom of a hierarchy? - account for entire scene - driven by prediction failures - pursue alternative hypotheses talk - Dan Ellis
21 f/hz City PDCASA example Horn1 (10/10) Horn2 (5/10) Horn3 (5/10) Horn4 (8/10) Horn5 (10/10) f/hz Wefts f/hz Noise2,Click Weft5 Wefts6,7 Weft8 Wefts9 12 Crash (10/10) f/hz Noise Squeal (6/10) Truck (7/10) db time/s talk - Dan Ellis
22 Outline somewhat concrete provisional Sound organization Background & related work Existing projects Future projects - Meeting Recorder - Missing-data recognition & CASA for ASR - Structure from audio-video features - Speech & speaker recognition - Music organization - Audio archive structure discovery Summary & conclusions talk - Dan Ellis
23 Meeting recorder (with ICSI, UW, SRI, IBM) Microphones in conventional meetings - for transcription/summarization/retrieval - informal, overlapped speech Data collection (ICSI and...): Research: ASR, nonspeech, organization - unprecedented data, new applications talk - Dan Ellis
24 Missing data recognition & CASA (with Barker, Cooke, Green/Sheffield) Missing-data recognition - integrate across don t-know values - perfect mask excellent performance in noise Multi-source decoder - Viterbi search of sound-fragment interpretations "1754" + noise Mask based on stationary noise estimate Mask split into subbands `Grouping' applied within bands: Spectro-Temporal Proximity Common Onset/Offset Multisource Decoder "1754" CASA for masks/fragments - larger fragments quicker search talk - Dan Ellis
25 Structure from audio-video features (Peng Xu) HMM modeling of sports video inter replay start middle end Distribution of camera motion labels, color features - also need within-state sequential structure Add features from audio - could be orthogonal/complementary Audio feature toolkit? - simple feature vectors, boundaries, classes - wealth of potential applications! talk - Dan Ellis
26 Speech & speaker recognition Words are not enough; Confidence-tagged alternate word hypotheses Other useful information: - speaker change detection - speaker characterization - phrasing & timing - prosodic cues to dialog state - laughter, pauses, etc. Integration with other analyses - segmentation for adaptation - nonspeech events to ignore - video-derived information... talk - Dan Ellis
27 Music organization Music is a special case - lots of structure - highly significant Trick is to find meaningful, tractable questions - boundary between speech and music? New (counter-intuitive) approaches? - perceive as whole, not by voice (Scheirer 00) global features for chord structures - generic event cues + local feature classification - more provisional notion of instruments/voices talk - Dan Ellis
28 CASA for audio retrieval Muscle Fish system uses global features: Segment feature analysis Sound segment database Feature vectors Seach/ comparison Results Segment feature analysis Query example Mixtures need elements & objects: Generic element analysis Object formation Continuous audio archive Element representations Objects + properties Seach/ comparison Results Generic element analysis (Object formation) Query example rushing water Word-to-class mapping Symbolic query Properties alone - features calculated on grouped subsets talk - Dan Ellis
29 Audio archive structure discovery What can you do with a large unlabeled training set (e.g. multimedia clips from the web)? - bootstrap learning: look for common patterns - have to learn generalizations in parallel: e.g. self-organizing maps, EM HMMs - post-filtering by humans may find meaning in clusters Associated text annotations provide a very small amount of labeling -.. but for a very large number of examples sufficient to obtain purchase? - maximize label utility through NLP-type operations (expansion, disambiguation etc.) - goal is automatic term-to-feature mapping for term-based content queries talk - Dan Ellis
30 Outline Sound organization Background & related work Existing projects Future projects Summary & conclusions talk - Dan Ellis
31 Summary DOMAINS Broadcast Movies Lectures Meetings Personal recordings Location monitoring Object-based structure discovery & learning Speech recognition Speech characterization Nonspeech recognition Scene analysis Audio-visual integration Music analysis APPLICATIONS Structuring Search Summarization Awareness Understanding talk - Dan Ellis
32 Conclusions Sound is more than just speech! - speech is a special case - most auditory perceivers don t understand speech Object-based analysis is critical - it s what people do - the world presents acoustic mixtures Whole-scene representation is the way - it s what people do - provides mutual constraints of overlap Broad range of approaches for a broad range of phenomena talk - Dan Ellis
Recognition & Organization of Speech and Audio
Recognition & Organization of Speech and Audio Dan Ellis Electrical Engineering, Columbia University http://www.ee.columbia.edu/~dpwe/ Outline 1 2 3 4 5 Introducing Tandem modeling
More informationRecognition & Organization of Speech and Audio
Recognition & Organization of Speech and Audio Dan Ellis Electrical Engineering, Columbia University http://www.ee.columbia.edu/~dpwe/ Outline 1 2 3 4 Introducing Robust speech recognition
More informationRecognition & Organization of Speech & Audio
Recognition & Organization of Speech & Audio Dan Ellis http://labrosa.ee.columbia.edu/ Outline 1 2 3 Introducing Projects in speech, music & audio Summary overview - Dan Ellis 21-9-28-1 1 Sound organization
More informationGeneral Soundtrack Analysis
General Soundtrack Analysis Dan Ellis oratory for Recognition and Organization of Speech and Audio () Electrical Engineering, Columbia University http://labrosa.ee.columbia.edu/
More informationComputational Auditory Scene Analysis: An overview and some observations. CASA survey. Other approaches
CASA talk - Haskins/NUWC - Dan Ellis 1997oct24/5-1 The importance of auditory illusions for artificial listeners 1 Dan Ellis International Computer Science Institute, Berkeley CA
More informationAutomatic audio analysis for content description & indexing
Automatic audio analysis for content description & indexing Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 4 5 Auditory Scene Analysis (ASA) Computational
More informationSound, Mixtures, and Learning
Sound, Mixtures, and Learning Dan Ellis Laboratory for Recognition and Organization of Speech and Audio (LabROSA) Electrical Engineering, Columbia University http://labrosa.ee.columbia.edu/
More informationSound, Mixtures, and Learning
Sound, Mixtures, and Learning Dan Ellis oratory for Recognition and Organization of Speech and Audio () Electrical Engineering, Columbia University http://labrosa.ee.columbia.edu/
More informationAuditory Scene Analysis: phenomena, theories and computational models
Auditory Scene Analysis: phenomena, theories and computational models July 1998 Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 4 The computational
More informationComputational Auditory Scene Analysis: Principles, Practice and Applications
Computational Auditory Scene Analysis: Principles, Practice and Applications Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 4 5 Auditory Scene Analysis
More informationUsing Source Models in Speech Separation
Using Source Models in Speech Separation Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/
More informationRobustness, Separation & Pitch
Robustness, Separation & Pitch or Morgan, Me & Pitch Dan Ellis Columbia / ICSI dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/ 1. Robustness and Separation 2. An Academic Journey 3. Future COLUMBIA
More informationUsing the Soundtrack to Classify Videos
Using the Soundtrack to Classify Videos Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/
More informationLabROSA Research Overview
LabROSA Research Overview Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/ 1.
More informationTandem modeling investigations
Tandem modeling investigations Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 What makes Tandem successful? Can we make Tandem better? Does Tandem
More informationModulation and Top-Down Processing in Audition
Modulation and Top-Down Processing in Audition Malcolm Slaney 1,2 and Greg Sell 2 1 Yahoo! Research 2 Stanford CCRMA Outline The Non-Linear Cochlea Correlogram Pitch Modulation and Demodulation Information
More informationOutline. Teager Energy and Modulation Features for Speech Applications. Dept. of ECE Technical Univ. of Crete
Teager Energy and Modulation Features for Speech Applications Alexandros Summariza(on Potamianos and Emo(on Tracking in Movies Dept. of ECE Technical Univ. of Crete Alexandros Potamianos, NatIONAL Tech.
More informationLecture 3: Perception
ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 3: Perception 1. Ear Physiology 2. Auditory Psychophysics 3. Pitch Perception 4. Music Perception Dan Ellis Dept. Electrical Engineering, Columbia University
More informationLecture 9: Speech Recognition: Front Ends
EE E682: Speech & Audio Processing & Recognition Lecture 9: Speech Recognition: Front Ends 1 2 Recognizing Speech Feature Calculation Dan Ellis http://www.ee.columbia.edu/~dpwe/e682/
More informationComputational Perception /785. Auditory Scene Analysis
Computational Perception 15-485/785 Auditory Scene Analysis A framework for auditory scene analysis Auditory scene analysis involves low and high level cues Low level acoustic cues are often result in
More informationReview of SPRACH/Thisl meetings Cambridge UK, 1998sep03/04
Review of SPRACH/Thisl meetings Cambridge UK, 1998sep03/04 Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 4 5 SPRACH overview The SPRACH Broadcast
More informationCS343: Artificial Intelligence
CS343: Artificial Intelligence Introduction: Part 2 Prof. Scott Niekum University of Texas at Austin [Based on slides created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All materials
More informationUsing Speech Models for Separation
Using Speech Models for Separation Dan Ellis Comprising the work of Michael Mandel and Ron Weiss Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY
More informationEECS 433 Statistical Pattern Recognition
EECS 433 Statistical Pattern Recognition Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 19 Outline What is Pattern
More informationAccessible Computing Research for Users who are Deaf and Hard of Hearing (DHH)
Accessible Computing Research for Users who are Deaf and Hard of Hearing (DHH) Matt Huenerfauth Raja Kushalnagar Rochester Institute of Technology DHH Auditory Issues Links Accents/Intonation Listening
More informationSound Analysis Research at LabROSA
Sound Analysis Research at LabROSA Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/
More informationUSING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES
USING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES Varinthira Duangudom and David V Anderson School of Electrical and Computer Engineering, Georgia Institute of Technology Atlanta, GA 30332
More informationAuditory principles in speech processing do computers need silicon ears?
* with contributions by V. Hohmann, M. Kleinschmidt, T. Brand, J. Nix, R. Beutelmann, and more members of our medical physics group Prof. Dr. rer.. nat. Dr. med. Birger Kollmeier* Auditory principles in
More informationHCS 7367 Speech Perception
Babies 'cry in mother's tongue' HCS 7367 Speech Perception Dr. Peter Assmann Fall 212 Babies' cries imitate their mother tongue as early as three days old German researchers say babies begin to pick up
More informationAcoustic Signal Processing Based on Deep Neural Networks
Acoustic Signal Processing Based on Deep Neural Networks Chin-Hui Lee School of ECE, Georgia Tech chl@ece.gatech.edu Joint work with Yong Xu, Yanhui Tu, Qing Wang, Tian Gao, Jun Du, LiRong Dai Outline
More informationAuditory scene analysis in humans: Implications for computational implementations.
Auditory scene analysis in humans: Implications for computational implementations. Albert S. Bregman McGill University Introduction. The scene analysis problem. Two dimensions of grouping. Recognition
More informationAdvanced Audio Interface for Phonetic Speech. Recognition in a High Noise Environment
DISTRIBUTION STATEMENT A Approved for Public Release Distribution Unlimited Advanced Audio Interface for Phonetic Speech Recognition in a High Noise Environment SBIR 99.1 TOPIC AF99-1Q3 PHASE I SUMMARY
More informationRobust Neural Encoding of Speech in Human Auditory Cortex
Robust Neural Encoding of Speech in Human Auditory Cortex Nai Ding, Jonathan Z. Simon Electrical Engineering / Biology University of Maryland, College Park Auditory Processing in Natural Scenes How is
More informationSound Interfaces Engineering Interaction Technologies. Prof. Stefanie Mueller HCI Engineering Group
Sound Interfaces 6.810 Engineering Interaction Technologies Prof. Stefanie Mueller HCI Engineering Group what is sound? if a tree falls in the forest and nobody is there does it make sound?
More informationAuditory Scene Analysis
1 Auditory Scene Analysis Albert S. Bregman Department of Psychology McGill University 1205 Docteur Penfield Avenue Montreal, QC Canada H3A 1B1 E-mail: bregman@hebb.psych.mcgill.ca To appear in N.J. Smelzer
More informationAnalysis of Speech Recognition Techniques for use in a Non-Speech Sound Recognition System
Analysis of Recognition Techniques for use in a Sound Recognition System Michael Cowling, Member, IEEE and Renate Sitte, Member, IEEE Griffith University Faculty of Engineering & Information Technology
More informationSound Texture Classification Using Statistics from an Auditory Model
Sound Texture Classification Using Statistics from an Auditory Model Gabriele Carotti-Sha Evan Penn Daniel Villamizar Electrical Engineering Email: gcarotti@stanford.edu Mangement Science & Engineering
More informationCS/NEUR125 Brains, Minds, and Machines. Due: Friday, April 14
CS/NEUR125 Brains, Minds, and Machines Assignment 5: Neural mechanisms of object-based attention Due: Friday, April 14 This Assignment is a guided reading of the 2014 paper, Neural Mechanisms of Object-Based
More informationNoise-Robust Speech Recognition Technologies in Mobile Environments
Noise-Robust Speech Recognition echnologies in Mobile Environments Mobile environments are highly influenced by ambient noise, which may cause a significant deterioration of speech recognition performance.
More informationCHAPTER 1 INTRODUCTION
CHAPTER 1 INTRODUCTION 1.1 BACKGROUND Speech is the most natural form of human communication. Speech has also become an important means of human-machine interaction and the advancement in technology has
More informationAdaptation of Classification Model for Improving Speech Intelligibility in Noise
1: (Junyoung Jung et al.: Adaptation of Classification Model for Improving Speech Intelligibility in Noise) (Regular Paper) 23 4, 2018 7 (JBE Vol. 23, No. 4, July 2018) https://doi.org/10.5909/jbe.2018.23.4.511
More informationHCS 7367 Speech Perception
Long-term spectrum of speech HCS 7367 Speech Perception Connected speech Absolute threshold Males Dr. Peter Assmann Fall 212 Females Long-term spectrum of speech Vowels Males Females 2) Absolute threshold
More information! Can hear whistle? ! Where are we on course map? ! What we did in lab last week. ! Psychoacoustics
2/14/18 Can hear whistle? Lecture 5 Psychoacoustics Based on slides 2009--2018 DeHon, Koditschek Additional Material 2014 Farmer 1 2 There are sounds we cannot hear Depends on frequency Where are we on
More informationGender Based Emotion Recognition using Speech Signals: A Review
50 Gender Based Emotion Recognition using Speech Signals: A Review Parvinder Kaur 1, Mandeep Kaur 2 1 Department of Electronics and Communication Engineering, Punjabi University, Patiala, India 2 Department
More informationAnalysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information
Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, S. Narayanan Emotion
More informationA Consumer-friendly Recap of the HLAA 2018 Research Symposium: Listening in Noise Webinar
A Consumer-friendly Recap of the HLAA 2018 Research Symposium: Listening in Noise Webinar Perry C. Hanavan, AuD Augustana University Sioux Falls, SD August 15, 2018 Listening in Noise Cocktail Party Problem
More informationRethinking Cognitive Architecture!
Rethinking Cognitive Architecture! Reconciling Uniformity and Diversity via Graphical Models! Paul Rosenbloom!!! 1/25/2010! Department of Computer Science &! Institute for Creative Technologies! The projects
More informationCONSTRUCTING TELEPHONE ACOUSTIC MODELS FROM A HIGH-QUALITY SPEECH CORPUS
CONSTRUCTING TELEPHONE ACOUSTIC MODELS FROM A HIGH-QUALITY SPEECH CORPUS Mitchel Weintraub and Leonardo Neumeyer SRI International Speech Research and Technology Program Menlo Park, CA, 94025 USA ABSTRACT
More informationSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
CPC - G10L - 2017.08 G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING processing of speech or voice signals in general (G10L 25/00);
More informationAcoustic Sensing With Artificial Intelligence
Acoustic Sensing With Artificial Intelligence Bowon Lee Department of Electronic Engineering Inha University Incheon, South Korea bowon.lee@inha.ac.kr bowon.lee@ieee.org NVIDIA Deep Learning Day Seoul,
More informationHearing the Universal Language: Music and Cochlear Implants
Hearing the Universal Language: Music and Cochlear Implants Professor Hugh McDermott Deputy Director (Research) The Bionics Institute of Australia, Professorial Fellow The University of Melbourne Overview?
More informationTandem acoustic modeling: Neural nets for mainstream ASR?
Tandem acoutic modeling: for maintream ASR? Dan Elli International Computer Science Intitute Berkeley CA dpwe@ici.berkeley.edu Outline 2 3 Tandem acoutic modeling Inide Tandem ytem: What going on? Future
More informationSource and Description Category of Practice Level of CI User How to Use Additional Information. Intermediate- Advanced. Beginner- Advanced
Source and Description Category of Practice Level of CI User How to Use Additional Information Randall s ESL Lab: http://www.esllab.com/ Provide practice in listening and comprehending dialogue. Comprehension
More informationAuditory gist perception and attention
Auditory gist perception and attention Sue Harding Speech and Hearing Research Group University of Sheffield POP Perception On Purpose Since the Sheffield POP meeting: Paper: Auditory gist perception:
More informationPresence and Perception: theoretical links & empirical evidence. Edwin Blake
Presence and Perception: theoretical links & empirical evidence Edwin Blake edwin@cs.uct.ac.za This Talk 2 Perception Bottom-up Top-down Integration Presence Bottom-up Top-down BIPs Presence arises from
More informationSpeech as HCI. HCI Lecture 11. Human Communication by Speech. Speech as HCI(cont. 2) Guest lecture: Speech Interfaces
HCI Lecture 11 Guest lecture: Speech Interfaces Hiroshi Shimodaira Institute for Communicating and Collaborative Systems (ICCS) Centre for Speech Technology Research (CSTR) http://www.cstr.ed.ac.uk Thanks
More informationDiscrete Signal Processing
1 Discrete Signal Processing C.M. Liu Perceptual Lab, College of Computer Science National Chiao-Tung University http://www.cs.nctu.edu.tw/~cmliu/courses/dsp/ ( Office: EC538 (03)5731877 cmliu@cs.nctu.edu.tw
More informationHow do you design an intelligent agent?
Intelligent Agents How do you design an intelligent agent? Definition: An intelligent agent perceives its environment via sensors and acts rationally upon that environment with its effectors. A discrete
More informationAuditory Scene Analysis. Dr. Maria Chait, UCL Ear Institute
Auditory Scene Analysis Dr. Maria Chait, UCL Ear Institute Expected learning outcomes: Understand the tasks faced by the auditory system during everyday listening. Know the major Gestalt principles. Understand
More informationOnline Speaker Adaptation of an Acoustic Model using Face Recognition
Online Speaker Adaptation of an Acoustic Model using Face Recognition Pavel Campr 1, Aleš Pražák 2, Josef V. Psutka 2, and Josef Psutka 2 1 Center for Machine Perception, Department of Cybernetics, Faculty
More informationBroadband Wireless Access and Applications Center (BWAC) CUA Site Planning Workshop
Broadband Wireless Access and Applications Center (BWAC) CUA Site Planning Workshop Lin-Ching Chang Department of Electrical Engineering and Computer Science School of Engineering Work Experience 09/12-present,
More informationLecture 4: Auditory Perception. Why study perception?
EE E682: Speech & Audio Processing & Recognition Lecture 4: Auditory Perception 1 2 3 4 5 6 Motivation: Why & how Auditory physiology Psychophysics: Detection & discrimination Pitch perception Speech perception
More informationgroup by pitch: similar frequencies tend to be grouped together - attributed to a common source.
Pattern perception Section 1 - Auditory scene analysis Auditory grouping: the sound wave hitting out ears is often pretty complex, and contains sounds from multiple sources. How do we group sounds together
More informationHearIntelligence by HANSATON. Intelligent hearing means natural hearing.
HearIntelligence by HANSATON. HearIntelligence by HANSATON. Intelligent hearing means natural hearing. Acoustic environments are complex. We are surrounded by a variety of different acoustic signals, speech
More informationProviding Effective Communication Access
Providing Effective Communication Access 2 nd International Hearing Loop Conference June 19 th, 2011 Matthew H. Bakke, Ph.D., CCC A Gallaudet University Outline of the Presentation Factors Affecting Communication
More informationHearing Lectures. Acoustics of Speech and Hearing. Auditory Lighthouse. Facts about Timbre. Analysis of Complex Sounds
Hearing Lectures Acoustics of Speech and Hearing Week 2-10 Hearing 3: Auditory Filtering 1. Loudness of sinusoids mainly (see Web tutorial for more) 2. Pitch of sinusoids mainly (see Web tutorial for more)
More informationWhat you re in for. Who are cochlear implants for? The bottom line. Speech processing schemes for
What you re in for Speech processing schemes for cochlear implants Stuart Rosen Professor of Speech and Hearing Science Speech, Hearing and Phonetic Sciences Division of Psychology & Language Sciences
More informationSpeech Compression for Noise-Corrupted Thai Dialects
American Journal of Applied Sciences 9 (3): 278-282, 2012 ISSN 1546-9239 2012 Science Publications Speech Compression for Noise-Corrupted Thai Dialects Suphattharachai Chomphan Department of Electrical
More informationPower Instruments, Power sources: Trends and Drivers. Steve Armstrong September 2015
Power Instruments, Power sources: Trends and Drivers Steve Armstrong September 2015 Focus of this talk more significant losses Severe Profound loss Challenges Speech in quiet Speech in noise Better Listening
More informationSpeech Processing / Speech Translation Case study: Transtac Details
Speech Processing 11-492/18-492 Speech Translation Case study: Transtac Details Phraselator: One Way Translation Commercial System VoxTec Rapid deployment Modules of 500ish utts Transtac: Two S2S System
More informationSPEECH TO TEXT CONVERTER USING GAUSSIAN MIXTURE MODEL(GMM)
SPEECH TO TEXT CONVERTER USING GAUSSIAN MIXTURE MODEL(GMM) Virendra Chauhan 1, Shobhana Dwivedi 2, Pooja Karale 3, Prof. S.M. Potdar 4 1,2,3B.E. Student 4 Assitant Professor 1,2,3,4Department of Electronics
More informationFundamentals of Cognitive Psychology, 3e by Ronald T. Kellogg Chapter 2. Multiple Choice
Multiple Choice 1. Which structure is not part of the visual pathway in the brain? a. occipital lobe b. optic chiasm c. lateral geniculate nucleus *d. frontal lobe Answer location: Visual Pathways 2. Which
More informationPrincess Nora University Faculty of Computer & Information Systems ARTIFICIAL INTELLIGENCE (CS 370D) Computer Science Department
Princess Nora University Faculty of Computer & Information Systems 1 ARTIFICIAL INTELLIGENCE (CS 370D) Computer Science Department (CHAPTER-3) INTELLIGENT AGENTS (Course coordinator) CHAPTER OUTLINE What
More informationDate: April 19, 2017 Name of Product: Cisco Spark Board Contact for more information:
Date: April 19, 2017 Name of Product: Cisco Spark Board Contact for more information: accessibility@cisco.com Summary Table - Voluntary Product Accessibility Template Criteria Supporting Features Remarks
More informationHigh-level Vision. Bernd Neumann Slides for the course in WS 2004/05. Faculty of Informatics Hamburg University Germany
High-level Vision Bernd Neumann Slides for the course in WS 2004/05 Faculty of Informatics Hamburg University Germany neumann@informatik.uni-hamburg.de http://kogs-www.informatik.uni-hamburg.de 1 Contents
More informationBinaural processing of complex stimuli
Binaural processing of complex stimuli Outline for today Binaural detection experiments and models Speech as an important waveform Experiments on understanding speech in complex environments (Cocktail
More informationBinaural Hearing for Robots Introduction to Robot Hearing
Binaural Hearing for Robots Introduction to Robot Hearing 1Radu Horaud Binaural Hearing for Robots 1. Introduction to Robot Hearing 2. Methodological Foundations 3. Sound-Source Localization 4. Machine
More informationInfant Hearing Development: Translating Research Findings into Clinical Practice. Auditory Development. Overview
Infant Hearing Development: Translating Research Findings into Clinical Practice Lori J. Leibold Department of Allied Health Sciences The University of North Carolina at Chapel Hill Auditory Development
More informationEEL-5840 Elements of {Artificial} Machine Intelligence
Menu Introduction Syllabus Grading: Last 2 Yrs Class Average 3.55; {3.7 Fall 2012 w/24 students & 3.45 Fall 2013} General Comments Copyright Dr. A. Antonio Arroyo Page 2 vs. Artificial Intelligence? DEF:
More informationBMMC (UG SDE) IV SEMESTER
UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION BMMC (UG SDE) IV SEMESTER GENERAL COURSE IV COMMON FOR Bsc ELECTRONICS, COMPUTER SCIENCE, INSTRUMENTATION & MULTIMEDIA BASICS OF AUDIO & VIDEO MEDIA QUESTION
More informationSPEECH ANALYSIS 3/3 PERCEPTION
NATURAL LANGUAGE PROCESSING Prof. L. Sbattella SPEECH ANALYSIS 3/3 PERCEPTION Dott. Ing. Sonia Cenceschi Sonia.cenceschi@polimi.it Psychoacoustics and perception Philip G. Zimbardo, Richard J. Gerrig,
More informationJ Jeffress model, 3, 66ff
Index A Absolute pitch, 102 Afferent projections, inferior colliculus, 131 132 Amplitude modulation, coincidence detector, 152ff inferior colliculus, 152ff inhibition models, 156ff models, 152ff Anatomy,
More informationAuditory Scene Analysis in Humans and Machines
Auditory Scene Analysis in Humans and Machines Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/
More informationCarnegie Mellon University Annual Progress Report: 2011 Formula Grant
Carnegie Mellon University Annual Progress Report: 2011 Formula Grant Reporting Period January 1, 2012 June 30, 2012 Formula Grant Overview The Carnegie Mellon University received $943,032 in formula funds
More informationIntroduction to Computational Neuroscience
Introduction to Computational Neuroscience Lecture 5: Data analysis II Lesson Title 1 Introduction 2 Structure and Function of the NS 3 Windows to the Brain 4 Data analysis 5 Data analysis II 6 Single
More informationA Vision-based Affective Computing System. Jieyu Zhao Ningbo University, China
A Vision-based Affective Computing System Jieyu Zhao Ningbo University, China Outline Affective Computing A Dynamic 3D Morphable Model Facial Expression Recognition Probabilistic Graphical Models Some
More informationUnderstanding Users. - cognitive processes. Unit 3
Understanding Users - cognitive processes Unit 3 Why do we need to understand users? Interacting with technology involves a number of cognitive processes We need to take into account Characteristic & limitations
More informationThe role of periodicity in the perception of masked speech with simulated and real cochlear implants
The role of periodicity in the perception of masked speech with simulated and real cochlear implants Kurt Steinmetzger and Stuart Rosen UCL Speech, Hearing and Phonetic Sciences Heidelberg, 09. November
More informationAgents and Environments
Agents and Environments Berlin Chen 2004 Reference: 1. S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Chapter 2 AI 2004 Berlin Chen 1 What is an Agent An agent interacts with its
More informationERA: Architectures for Inference
ERA: Architectures for Inference Dan Hammerstrom Electrical And Computer Engineering 7/28/09 1 Intelligent Computing In spite of the transistor bounty of Moore s law, there is a large class of problems
More informationRequirements for Maintaining Web Access for Hearing-Impaired Individuals
Requirements for Maintaining Web Access for Hearing-Impaired Individuals Daniel M. Berry 2003 Daniel M. Berry WSE 2001 Access for HI Requirements for Maintaining Web Access for Hearing-Impaired Individuals
More informationSpeech Enhancement Based on Deep Neural Networks
Speech Enhancement Based on Deep Neural Networks Chin-Hui Lee School of ECE, Georgia Tech chl@ece.gatech.edu Joint work with Yong Xu and Jun Du at USTC 1 Outline and Talk Agenda In Signal Processing Letter,
More informationSUPPRESSION OF MUSICAL NOISE IN ENHANCED SPEECH USING PRE-IMAGE ITERATIONS. Christina Leitner and Franz Pernkopf
2th European Signal Processing Conference (EUSIPCO 212) Bucharest, Romania, August 27-31, 212 SUPPRESSION OF MUSICAL NOISE IN ENHANCED SPEECH USING PRE-IMAGE ITERATIONS Christina Leitner and Franz Pernkopf
More informationTowards Accessible Conversations in a Mobile Context for People who are Deaf or Hard of Hearing
Towards Accessible Conversations in a Mobile Context for People who are Deaf or Hard of Hearing Dhruv Jain, Rachel Franz, Leah Findlater, Jackson Cannon, Raja Kushalnagar, and Jon Froehlich University
More informationWearable Video Monitoring of People with Age Dementia: Video Indexing at the Service of Healthcare
Wearable Video Monitoring of People with Age Dementia: Video Indexing at the Service of Healthcare Rémi Megret, IMS Daniel Szolgay, Jenny Benois-Pineau, LaBRI Philippe Joly, Julien Pinquier, IRIT Jean-François
More informationAudiovisual to Sign Language Translator
Technical Disclosure Commons Defensive Publications Series July 17, 2018 Audiovisual to Sign Language Translator Manikandan Gopalakrishnan Follow this and additional works at: https://www.tdcommons.org/dpubs_series
More information1 Pattern Recognition 2 1
1 Pattern Recognition 2 1 3 Perceptrons by M.L. Minsky and S.A. Papert (1969) Books: 4 Pattern Recognition, fourth Edition (Hardcover) by Sergios Theodoridis, Konstantinos Koutroumbas Publisher: Academic
More informationSpeech to Text Wireless Converter
Speech to Text Wireless Converter Kailas Puri 1, Vivek Ajage 2, Satyam Mali 3, Akhil Wasnik 4, Amey Naik 5 And Guided by Dr. Prof. M. S. Panse 6 1,2,3,4,5,6 Department of Electrical Engineering, Veermata
More informationMULTI-CHANNEL COMMUNICATION
INTRODUCTION Research on the Deaf Brain is beginning to provide a new evidence base for policy and practice in relation to intervention with deaf children. This talk outlines the multi-channel nature of
More informationC H A N N E L S A N D B A N D S A C T I V E N O I S E C O N T R O L 2
C H A N N E L S A N D B A N D S Audibel hearing aids offer between 4 and 16 truly independent channels and bands. Channels are sections of the frequency spectrum that are processed independently by the
More information