Recognition & Organization of Speech & Audio
|
|
- Darlene Amberly Little
- 5 years ago
- Views:
Transcription
1 Recognition & Organization of Speech & Audio Dan Ellis Outline Introducing Projects in speech, music & audio Summary overview - Dan Ellis
2 1 Sound organization frq/hz time/s Analysis Voice (evil) Rumble Stab Voice (pleasant) Strings Choir Central operation: - continuous sound mixture distinct objects & events Perceptual impression is very strong - but hard to see in signal overview - Dan Ellis
3 Bregman s lake Imagine two narrow channels dug up from the edge of a lake, with handkerchiefs stretched across each one. Looking only at the motion of the handkerchiefs, you are to answer questions such as: How many boats are there on the lake and where are they? (after Bregman 9) Received waveform is a mixture - two sensors, N signals... Disentangling mixtures as primary goal - perfect solution is not possible - need knowledge-based constraints overview - Dan Ellis
4 The information in sound freq / Hz Steps 1 Steps time / s A sense of hearing is evolutionarily useful - gives organisms relevant information Auditory perception is ecologically grounded - scene analysis is preconscious ( illusions) - special-purpose processing reflects natural scene properties - subjective not canonical (ambiguity) overview - Dan Ellis
5 Key themes for Sound organization: construct hierarchy - at an instant (sources) - along time (segmentation) Scene analysis - find attributes according to objects - use attributes to form objects -... plus constraints of knowledge Exploiting large data sets (the ASR lesson) - supervised/labeled: pattern recognition - unsupervised: structure discovery, clustering Special cases: - speech recognition - other source-specific recognizers... within a complete explanation overview - Dan Ellis
6 Outline Introducing Projects in speech, music & audio - Tandem speech recognition - Meeting recorder speech analysis - Musical information extraction - Alarm sound detection Summary overview - Dan Ellis
7 Automatic Speech Recognition (ASR) Standard speech recognition structure: Feature calculation sound D A T A Acoustic model parameters Word models s ah t Language model p("sat" "the","cat") p("saw" "the","cat") Acoustic classifier HMM decoder feature vectors phone probabilities phone / word sequence Understanding/ application... State of the art word-error rates (WERs): - 2% (dictation) - 3% (telephone conversations) Can use multiple streams... overview - Dan Ellis
8 C C1 C2 Ck tn tn+w h# pcl bcl tcl dcl C C1 C2 Ck tn tn+w h# pcl bcl tcl dcl Tandem speech recognition (with Manuel Reyes, ICSI, OGI, CMU) Neural net estimates phone posteriors; but Gaussian mixtures model finer detail Combine them! Hybrid Connectionist-HMM ASR Conventional ASR (HTK) Feature calculation Neural net classifier Noway decoder Feature calculation Gauss mix models HTK decoder s ah t s ah t Input sound Speech features Phone probabilities Words Input sound Speech features Subword likelihoods Words Tandem modeling Feature calculation Neural net classifier Gauss mix models HTK decoder s ah t Input sound Speech features Phone probabilities Subword likelihoods Words Train net, then train GMM on net output - GMM is ignorant of net output meaning overview - Dan Ellis
9 Tandem system results: Aurora digits 1 WER as a function of SNR for various Aurora99 systems WER / % (log scale) HTK GMM baseline Hybrid connectionist Tandem Tandem + PC Average WER ratio to baseline: HTK GMM: 1% Hybrid: 84.6% Tandem: 64.5% Tandem + PC: 47.2% SNR / db (averaged over 4 noises) 2 clean Aurora 2 Eurospeech 21 Evaluation improvement % - Multicondition Rel Avg. rel. improvement Columbia Philips UPC Barcelona Bell s IBM Motorola 1 Motorola 2 Nijmegen ICSI/OGI/Qualcomm ATR/Griffith AT&T Alcatel Siemens UCLA Microsoft Slovenia Granada overview - Dan Ellis
10 The Meeting Recorder project (with ICSI, UW, SRI, IBM) Microphones in conventional meetings - for summarization/retrieval/behavior analysis - informal, overlapped speech Data collection (ICSI, UW,...): - 1 hours collected, ongoing transcription - headsets + tabletop + PDA overview - Dan Ellis
11 Crosstalk cancellation Baseline speaker activity detection is hard: Spkr A speaker active Spkr B speaker B cedes floor Spkr C interruptions Spkr D breath noise Spkr E crosstalk Table top time / secs level/db 4 2 Noisy crosstalk model: m = C s + n Estimate subband C Aa from A s peak energy -... including pure delay (1 ms frames) -... then linear inversion overview - Dan Ellis
12 PDA-based speaker change detection Goal: small conference-tabletop device Speaker turns from PDA mock-up signals? 8 pda.aif: excerpt with 512-pt xcorr, 8% max thresh Morgan what's this one here this this one has a couple cheesy electrets oh that's connected too or that's connected two- both Adam Dan yep yeah both channels yeah that's that's the dummy dummy P.D.A. yeah it is lag/ms time/s SCD algo on spectral + interaural features - average spectral + per-channel ITD, φ overview - Dan Ellis
13 Music analysis: Lyrics extraction (with Adam Berenzweig) Vocal content is highly salient, useful for retrieval Can we find the singing? Use an ASR classifier: speech (trnset #58) music (no vocals #1) singing (vocals # s) Spectrogram Posteriogram freq / khz phone class 4 2 singing 1 2 time / sec 1 2 time / sec 1 2 time / sec Frame error rate ~2% for segmentation based on posterior-feature statistics Lyric segmentation + transcribed lyrics training data for lyrics ASR... overview - Dan Ellis
14 Music analysis: Structure recovery (with Rob Turetsky) Structure recovery by similarity matrices (after Foote) - similarity distance measure? - segmentation & repetition structure - interpretation at different scales: notes, phrases, movements - incorporating musical knowledge: theme similarity overview - Dan Ellis
15 Alarm sound detection Alarm sounds have particular structure - people know them when they hear them Isolate alarms in sound mixtures freq / Hz time / sec representation of energy in time-frequency - formation of atomic elements - grouping by common properties (onset &c.) - classify by attributes... Key: recognize despite background overview - Dan Ellis
16 The Machine listener Goal: An auditory system for machines - use same environmental information as people Aspects: - recognize spoken commands (but not others) - track acoustic channel quality (for responses) - categorize environment (conversation, crowd...) Scenarios - personal listener summary of your day - autonomous robots: need awareness overview - Dan Ellis
17 Outline Introducing Projects in speech, music & audio Summary overview - Dan Ellis
18 Summary DOMAINS Broadcast Movies Lectures Meetings Personal recordings Location monitoring Object-based structure discovery & learning Speech recognition Speech characterization Nonspeech recognition Scene analysis Audio-visual integration Music analysis APPLICATIONS Structuring Search Summarization Awareness Understanding overview - Dan Ellis
Recognition & Organization of Speech and Audio
Recognition & Organization of Speech and Audio Dan Ellis Electrical Engineering, Columbia University http://www.ee.columbia.edu/~dpwe/ Outline 1 2 3 4 Introducing Robust speech recognition
More informationGeneral Soundtrack Analysis
General Soundtrack Analysis Dan Ellis oratory for Recognition and Organization of Speech and Audio () Electrical Engineering, Columbia University http://labrosa.ee.columbia.edu/
More informationRecognition & Organization of Speech and Audio
Recognition & Organization of Speech and Audio Dan Ellis Electrical Engineering, Columbia University http://www.ee.columbia.edu/~dpwe/ Outline 1 2 3 4 5 Introducing Tandem modeling
More informationRecognition & Organization of Speech and Audio
Recognition & Organization of Speech and Audio Dan Ellis Electrical Engineering, Columbia University Outline 1 2 3 4 5 Sound organization Background & related work Existing projects
More informationSound, Mixtures, and Learning
Sound, Mixtures, and Learning Dan Ellis Laboratory for Recognition and Organization of Speech and Audio (LabROSA) Electrical Engineering, Columbia University http://labrosa.ee.columbia.edu/
More informationRobustness, Separation & Pitch
Robustness, Separation & Pitch or Morgan, Me & Pitch Dan Ellis Columbia / ICSI dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/ 1. Robustness and Separation 2. An Academic Journey 3. Future COLUMBIA
More informationTandem modeling investigations
Tandem modeling investigations Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 What makes Tandem successful? Can we make Tandem better? Does Tandem
More informationSound, Mixtures, and Learning
Sound, Mixtures, and Learning Dan Ellis oratory for Recognition and Organization of Speech and Audio () Electrical Engineering, Columbia University http://labrosa.ee.columbia.edu/
More informationUsing Source Models in Speech Separation
Using Source Models in Speech Separation Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/
More informationLabROSA Research Overview
LabROSA Research Overview Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/ 1.
More informationComputational Auditory Scene Analysis: An overview and some observations. CASA survey. Other approaches
CASA talk - Haskins/NUWC - Dan Ellis 1997oct24/5-1 The importance of auditory illusions for artificial listeners 1 Dan Ellis International Computer Science Institute, Berkeley CA
More informationAutomatic audio analysis for content description & indexing
Automatic audio analysis for content description & indexing Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 4 5 Auditory Scene Analysis (ASA) Computational
More informationTandem acoustic modeling: Neural nets for mainstream ASR?
Tandem acoutic modeling: for maintream ASR? Dan Elli International Computer Science Intitute Berkeley CA dpwe@ici.berkeley.edu Outline 2 3 Tandem acoutic modeling Inide Tandem ytem: What going on? Future
More informationComputational Perception /785. Auditory Scene Analysis
Computational Perception 15-485/785 Auditory Scene Analysis A framework for auditory scene analysis Auditory scene analysis involves low and high level cues Low level acoustic cues are often result in
More informationLecture 9: Speech Recognition: Front Ends
EE E682: Speech & Audio Processing & Recognition Lecture 9: Speech Recognition: Front Ends 1 2 Recognizing Speech Feature Calculation Dan Ellis http://www.ee.columbia.edu/~dpwe/e682/
More informationHCS 7367 Speech Perception
Babies 'cry in mother's tongue' HCS 7367 Speech Perception Dr. Peter Assmann Fall 212 Babies' cries imitate their mother tongue as early as three days old German researchers say babies begin to pick up
More informationUsing the Soundtrack to Classify Videos
Using the Soundtrack to Classify Videos Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/
More informationAcoustic Signal Processing Based on Deep Neural Networks
Acoustic Signal Processing Based on Deep Neural Networks Chin-Hui Lee School of ECE, Georgia Tech chl@ece.gatech.edu Joint work with Yong Xu, Yanhui Tu, Qing Wang, Tian Gao, Jun Du, LiRong Dai Outline
More informationCHAPTER 1 INTRODUCTION
CHAPTER 1 INTRODUCTION 1.1 BACKGROUND Speech is the most natural form of human communication. Speech has also become an important means of human-machine interaction and the advancement in technology has
More informationModulation and Top-Down Processing in Audition
Modulation and Top-Down Processing in Audition Malcolm Slaney 1,2 and Greg Sell 2 1 Yahoo! Research 2 Stanford CCRMA Outline The Non-Linear Cochlea Correlogram Pitch Modulation and Demodulation Information
More informationSound Analysis Research at LabROSA
Sound Analysis Research at LabROSA Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/
More informationComputational Auditory Scene Analysis: Principles, Practice and Applications
Computational Auditory Scene Analysis: Principles, Practice and Applications Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 4 5 Auditory Scene Analysis
More informationAuditory Scene Analysis: phenomena, theories and computational models
Auditory Scene Analysis: phenomena, theories and computational models July 1998 Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 4 The computational
More informationAdvanced Audio Interface for Phonetic Speech. Recognition in a High Noise Environment
DISTRIBUTION STATEMENT A Approved for Public Release Distribution Unlimited Advanced Audio Interface for Phonetic Speech Recognition in a High Noise Environment SBIR 99.1 TOPIC AF99-1Q3 PHASE I SUMMARY
More informationUsing Speech Models for Separation
Using Speech Models for Separation Dan Ellis Comprising the work of Michael Mandel and Ron Weiss Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY
More informationNoise-Robust Speech Recognition Technologies in Mobile Environments
Noise-Robust Speech Recognition echnologies in Mobile Environments Mobile environments are highly influenced by ambient noise, which may cause a significant deterioration of speech recognition performance.
More informationHCS 7367 Speech Perception
Long-term spectrum of speech HCS 7367 Speech Perception Connected speech Absolute threshold Males Dr. Peter Assmann Fall 212 Females Long-term spectrum of speech Vowels Males Females 2) Absolute threshold
More informationCONSTRUCTING TELEPHONE ACOUSTIC MODELS FROM A HIGH-QUALITY SPEECH CORPUS
CONSTRUCTING TELEPHONE ACOUSTIC MODELS FROM A HIGH-QUALITY SPEECH CORPUS Mitchel Weintraub and Leonardo Neumeyer SRI International Speech Research and Technology Program Menlo Park, CA, 94025 USA ABSTRACT
More informationA Consumer-friendly Recap of the HLAA 2018 Research Symposium: Listening in Noise Webinar
A Consumer-friendly Recap of the HLAA 2018 Research Symposium: Listening in Noise Webinar Perry C. Hanavan, AuD Augustana University Sioux Falls, SD August 15, 2018 Listening in Noise Cocktail Party Problem
More informationSpeech Enhancement Based on Deep Neural Networks
Speech Enhancement Based on Deep Neural Networks Chin-Hui Lee School of ECE, Georgia Tech chl@ece.gatech.edu Joint work with Yong Xu and Jun Du at USTC 1 Outline and Talk Agenda In Signal Processing Letter,
More informationLecture 3: Perception
ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 3: Perception 1. Ear Physiology 2. Auditory Psychophysics 3. Pitch Perception 4. Music Perception Dan Ellis Dept. Electrical Engineering, Columbia University
More informationBinaural Hearing for Robots Introduction to Robot Hearing
Binaural Hearing for Robots Introduction to Robot Hearing 1Radu Horaud Binaural Hearing for Robots 1. Introduction to Robot Hearing 2. Methodological Foundations 3. Sound-Source Localization 4. Machine
More informationAUDL GS08/GAV1 Signals, systems, acoustics and the ear. Pitch & Binaural listening
AUDL GS08/GAV1 Signals, systems, acoustics and the ear Pitch & Binaural listening Review 25 20 15 10 5 0-5 100 1000 10000 25 20 15 10 5 0-5 100 1000 10000 Part I: Auditory frequency selectivity Tuning
More informationAuditory Scene Analysis
1 Auditory Scene Analysis Albert S. Bregman Department of Psychology McGill University 1205 Docteur Penfield Avenue Montreal, QC Canada H3A 1B1 E-mail: bregman@hebb.psych.mcgill.ca To appear in N.J. Smelzer
More informationAuditory principles in speech processing do computers need silicon ears?
* with contributions by V. Hohmann, M. Kleinschmidt, T. Brand, J. Nix, R. Beutelmann, and more members of our medical physics group Prof. Dr. rer.. nat. Dr. med. Birger Kollmeier* Auditory principles in
More informationSound Texture Classification Using Statistics from an Auditory Model
Sound Texture Classification Using Statistics from an Auditory Model Gabriele Carotti-Sha Evan Penn Daniel Villamizar Electrical Engineering Email: gcarotti@stanford.edu Mangement Science & Engineering
More informationSpeech as HCI. HCI Lecture 11. Human Communication by Speech. Speech as HCI(cont. 2) Guest lecture: Speech Interfaces
HCI Lecture 11 Guest lecture: Speech Interfaces Hiroshi Shimodaira Institute for Communicating and Collaborative Systems (ICCS) Centre for Speech Technology Research (CSTR) http://www.cstr.ed.ac.uk Thanks
More informationUSING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES
USING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES Varinthira Duangudom and David V Anderson School of Electrical and Computer Engineering, Georgia Institute of Technology Atlanta, GA 30332
More informationPrice list Tools for Research & Development
Price list Tools for Research & Development All prior price lists are no longer valid after this price list was issued. For information about the currently valid price list contact HörTech ggmbh. All prices
More informationSPEECH TO TEXT CONVERTER USING GAUSSIAN MIXTURE MODEL(GMM)
SPEECH TO TEXT CONVERTER USING GAUSSIAN MIXTURE MODEL(GMM) Virendra Chauhan 1, Shobhana Dwivedi 2, Pooja Karale 3, Prof. S.M. Potdar 4 1,2,3B.E. Student 4 Assitant Professor 1,2,3,4Department of Electronics
More informationReview of SPRACH/Thisl meetings Cambridge UK, 1998sep03/04
Review of SPRACH/Thisl meetings Cambridge UK, 1998sep03/04 Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 4 5 SPRACH overview The SPRACH Broadcast
More informationSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
CPC - G10L - 2017.08 G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING processing of speech or voice signals in general (G10L 25/00);
More informationHearing in the Environment
10 Hearing in the Environment Click Chapter to edit 10 Master Hearing title in the style Environment Sound Localization Complex Sounds Auditory Scene Analysis Continuity and Restoration Effects Auditory
More informationHearing the Universal Language: Music and Cochlear Implants
Hearing the Universal Language: Music and Cochlear Implants Professor Hugh McDermott Deputy Director (Research) The Bionics Institute of Australia, Professorial Fellow The University of Melbourne Overview?
More informationLecture 8: Spatial sound
EE E6820: Speech & Audio Processing & Recognition Lecture 8: Spatial sound 1 2 3 4 Spatial acoustics Binaural perception Synthesizing spatial audio Extracting spatial sounds Dan Ellis
More informationCombination of Bone-Conducted Speech with Air-Conducted Speech Changing Cut-Off Frequency
Combination of Bone-Conducted Speech with Air-Conducted Speech Changing Cut-Off Frequency Tetsuya Shimamura and Fumiya Kato Graduate School of Science and Engineering Saitama University 255 Shimo-Okubo,
More informationSingle-Channel Sound Source Localization Based on Discrimination of Acoustic Transfer Functions
3 Single-Channel Sound Source Localization Based on Discrimination of Acoustic Transfer Functions Ryoichi Takashima, Tetsuya Takiguchi and Yasuo Ariki Graduate School of System Informatics, Kobe University,
More informationAssistive Technology for Regular Curriculum for Hearing Impaired
Assistive Technology for Regular Curriculum for Hearing Impaired Assistive Listening Devices Assistive listening devices can be utilized by individuals or large groups of people and can typically be accessed
More informationSoundSense: Scalable Sound Sensing for People-Centric Applications on Mobile Phones
SoundSense: Scalable Sound Sensing for People-Centric Applications on Mobile Phones Hong Lu, Wei Pan, Nicholas D. Lane, Tanzeem Choudhury, Andrew T. Campbell Dept. of Computer Science, Dartmouth College
More informationRobust Neural Encoding of Speech in Human Auditory Cortex
Robust Neural Encoding of Speech in Human Auditory Cortex Nai Ding, Jonathan Z. Simon Electrical Engineering / Biology University of Maryland, College Park Auditory Processing in Natural Scenes How is
More informationApril 23, Roger Dynamic SoundField & Roger Focus. Can You Decipher This? The challenges of understanding
Roger Dynamic SoundField & Roger Focus Lindsay Roberts, Au.D. Pediatric and School Specialist PA, WV, OH, IN, MI, KY & TN Can You Decipher This? The challenges of understanding Meaningful education requires
More informationAuditory gist perception and attention
Auditory gist perception and attention Sue Harding Speech and Hearing Research Group University of Sheffield POP Perception On Purpose Since the Sheffield POP meeting: Paper: Auditory gist perception:
More informationCarnegie Mellon University Annual Progress Report: 2011 Formula Grant
Carnegie Mellon University Annual Progress Report: 2011 Formula Grant Reporting Period January 1, 2012 June 30, 2012 Formula Grant Overview The Carnegie Mellon University received $943,032 in formula funds
More informationAmbiguity in the recognition of phonetic vowels when using a bone conduction microphone
Acoustics 8 Paris Ambiguity in the recognition of phonetic vowels when using a bone conduction microphone V. Zimpfer a and K. Buck b a ISL, 5 rue du Général Cassagnou BP 734, 6831 Saint Louis, France b
More informationThe role of periodicity in the perception of masked speech with simulated and real cochlear implants
The role of periodicity in the perception of masked speech with simulated and real cochlear implants Kurt Steinmetzger and Stuart Rosen UCL Speech, Hearing and Phonetic Sciences Heidelberg, 09. November
More informationAdaptation of Classification Model for Improving Speech Intelligibility in Noise
1: (Junyoung Jung et al.: Adaptation of Classification Model for Improving Speech Intelligibility in Noise) (Regular Paper) 23 4, 2018 7 (JBE Vol. 23, No. 4, July 2018) https://doi.org/10.5909/jbe.2018.23.4.511
More informationRobust Speech Detection for Noisy Environments
Robust Speech Detection for Noisy Environments Óscar Varela, Rubén San-Segundo and Luis A. Hernández ABSTRACT This paper presents a robust voice activity detector (VAD) based on hidden Markov models (HMM)
More informationJitter, Shimmer, and Noise in Pathological Voice Quality Perception
ISCA Archive VOQUAL'03, Geneva, August 27-29, 2003 Jitter, Shimmer, and Noise in Pathological Voice Quality Perception Jody Kreiman and Bruce R. Gerratt Division of Head and Neck Surgery, School of Medicine
More informationRoger TM. Learning without limits. SoundField for education
Roger TM Learning without limits SoundField for education Hearing more means learning more Classrooms are noisy places. Lively discussions and interactions between teachers and students are vital for effective
More informationTowards Accessible Conversations in a Mobile Context for People who are Deaf or Hard of Hearing
Towards Accessible Conversations in a Mobile Context for People who are Deaf or Hard of Hearing Dhruv Jain, Rachel Franz, Leah Findlater, Jackson Cannon, Raja Kushalnagar, and Jon Froehlich University
More informationNoise-Robust Speech Recognition in a Car Environment Based on the Acoustic Features of Car Interior Noise
4 Special Issue Speech-Based Interfaces in Vehicles Research Report Noise-Robust Speech Recognition in a Car Environment Based on the Acoustic Features of Car Interior Noise Hiroyuki Hoshino Abstract This
More informationThe effect of wearing conventional and level-dependent hearing protectors on speech production in noise and quiet
The effect of wearing conventional and level-dependent hearing protectors on speech production in noise and quiet Ghazaleh Vaziri Christian Giguère Hilmi R. Dajani Nicolas Ellaham Annual National Hearing
More informationOregon Graduate Institute of Science and Technology,
SPEAKER RECOGNITION AT OREGON GRADUATE INSTITUTE June & 6, 997 Sarel van Vuuren and Narendranath Malayath Hynek Hermansky and Pieter Vermeulen, Oregon Graduate Institute, Portland, Oregon Oregon Graduate
More informationAccessible Computing Research for Users who are Deaf and Hard of Hearing (DHH)
Accessible Computing Research for Users who are Deaf and Hard of Hearing (DHH) Matt Huenerfauth Raja Kushalnagar Rochester Institute of Technology DHH Auditory Issues Links Accents/Intonation Listening
More informationTOLERABLE DELAY FOR SPEECH PROCESSING: EFFECTS OF HEARING ABILITY AND ACCLIMATISATION
TOLERABLE DELAY FOR SPEECH PROCESSING: EFFECTS OF HEARING ABILITY AND ACCLIMATISATION Tobias Goehring, PhD Previous affiliation (this project): Institute of Sound and Vibration Research University of Southampton
More informationConsonant Perception test
Consonant Perception test Introduction The Vowel-Consonant-Vowel (VCV) test is used in clinics to evaluate how well a listener can recognize consonants under different conditions (e.g. with and without
More informationSpeech and Sound Use in a Remote Monitoring System for Health Care
Speech and Sound Use in a Remote System for Health Care M. Vacher J.-F. Serignat S. Chaillol D. Istrate V. Popescu CLIPS-IMAG, Team GEOD Joseph Fourier University of Grenoble - CNRS (France) Text, Speech
More informationCodebook driven short-term predictor parameter estimation for speech enhancement
Codebook driven short-term predictor parameter estimation for speech enhancement Citation for published version (APA): Srinivasan, S., Samuelsson, J., & Kleijn, W. B. (2006). Codebook driven short-term
More informationSource and Description Category of Practice Level of CI User How to Use Additional Information. Intermediate- Advanced. Beginner- Advanced
Source and Description Category of Practice Level of CI User How to Use Additional Information Randall s ESL Lab: http://www.esllab.com/ Provide practice in listening and comprehending dialogue. Comprehension
More informationLecture 4: Auditory Perception. Why study perception?
EE E682: Speech & Audio Processing & Recognition Lecture 4: Auditory Perception 1 2 3 4 5 6 Motivation: Why & how Auditory physiology Psychophysics: Detection & discrimination Pitch perception Speech perception
More informationOver-representation of speech in older adults originates from early response in higher order auditory cortex
Over-representation of speech in older adults originates from early response in higher order auditory cortex Christian Brodbeck, Alessandro Presacco, Samira Anderson & Jonathan Z. Simon Overview 2 Puzzle
More informationFrequency Tracking: LMS and RLS Applied to Speech Formant Estimation
Aldebaro Klautau - http://speech.ucsd.edu/aldebaro - 2/3/. Page. Frequency Tracking: LMS and RLS Applied to Speech Formant Estimation ) Introduction Several speech processing algorithms assume the signal
More informationHearing Lectures. Acoustics of Speech and Hearing. Auditory Lighthouse. Facts about Timbre. Analysis of Complex Sounds
Hearing Lectures Acoustics of Speech and Hearing Week 2-10 Hearing 3: Auditory Filtering 1. Loudness of sinusoids mainly (see Web tutorial for more) 2. Pitch of sinusoids mainly (see Web tutorial for more)
More informationBroadband Wireless Access and Applications Center (BWAC) CUA Site Planning Workshop
Broadband Wireless Access and Applications Center (BWAC) CUA Site Planning Workshop Lin-Ching Chang Department of Electrical Engineering and Computer Science School of Engineering Work Experience 09/12-present,
More informationGender Based Emotion Recognition using Speech Signals: A Review
50 Gender Based Emotion Recognition using Speech Signals: A Review Parvinder Kaur 1, Mandeep Kaur 2 1 Department of Electronics and Communication Engineering, Punjabi University, Patiala, India 2 Department
More informationAssistive Listening Technology: in the workplace and on campus
Assistive Listening Technology: in the workplace and on campus Jeremy Brassington Tuesday, 11 July 2017 Why is it hard to hear in noisy rooms? Distance from sound source Background noise continuous and
More informationWhat you re in for. Who are cochlear implants for? The bottom line. Speech processing schemes for
What you re in for Speech processing schemes for cochlear implants Stuart Rosen Professor of Speech and Hearing Science Speech, Hearing and Phonetic Sciences Division of Psychology & Language Sciences
More informationSUPPRESSION OF MUSICAL NOISE IN ENHANCED SPEECH USING PRE-IMAGE ITERATIONS. Christina Leitner and Franz Pernkopf
2th European Signal Processing Conference (EUSIPCO 212) Bucharest, Romania, August 27-31, 212 SUPPRESSION OF MUSICAL NOISE IN ENHANCED SPEECH USING PRE-IMAGE ITERATIONS Christina Leitner and Franz Pernkopf
More informationI. INTRODUCTION. OMBARD EFFECT (LE), named after the French otorhino-laryngologist
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 6, AUGUST 2010 1379 Unsupervised Equalization of Lombard Effect for Speech Recognition in Noisy Adverse Environments Hynek Bořil,
More informationAuditory Scene Analysis in Humans and Machines
Auditory Scene Analysis in Humans and Machines Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/
More informationEnhanced Feature Extraction for Speech Detection in Media Audio
INTERSPEECH 2017 August 20 24, 2017, Stockholm, Sweden Enhanced Feature Extraction for Speech Detection in Media Audio Inseon Jang 1, ChungHyun Ahn 1, Jeongil Seo 1, Younseon Jang 2 1 Media Research Division,
More informationProviding Effective Communication Access
Providing Effective Communication Access 2 nd International Hearing Loop Conference June 19 th, 2011 Matthew H. Bakke, Ph.D., CCC A Gallaudet University Outline of the Presentation Factors Affecting Communication
More informationAcoustic Sensing With Artificial Intelligence
Acoustic Sensing With Artificial Intelligence Bowon Lee Department of Electronic Engineering Inha University Incheon, South Korea bowon.lee@inha.ac.kr bowon.lee@ieee.org NVIDIA Deep Learning Day Seoul,
More informationFAST AMPLITUDE COMPRESSION IN HEARING AIDS IMPROVES AUDIBILITY BUT DEGRADES SPEECH INFORMATION TRANSMISSION
FAST AMPLITUDE COMPRESSION IN HEARING AIDS IMPROVES AUDIBILITY BUT DEGRADES SPEECH INFORMATION TRANSMISSION Arne Leijon and Svante Stadler Sound and Image Processing Lab., School of Electrical Engineering,
More informationOutline. Teager Energy and Modulation Features for Speech Applications. Dept. of ECE Technical Univ. of Crete
Teager Energy and Modulation Features for Speech Applications Alexandros Summariza(on Potamianos and Emo(on Tracking in Movies Dept. of ECE Technical Univ. of Crete Alexandros Potamianos, NatIONAL Tech.
More informationEffects of Cochlear Hearing Loss on the Benefits of Ideal Binary Masking
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Effects of Cochlear Hearing Loss on the Benefits of Ideal Binary Masking Vahid Montazeri, Shaikat Hossain, Peter F. Assmann University of Texas
More informationVoice Detection using Speech Energy Maximization and Silence Feature Normalization
, pp.25-29 http://dx.doi.org/10.14257/astl.2014.49.06 Voice Detection using Speech Energy Maximization and Silence Feature Normalization In-Sung Han 1 and Chan-Shik Ahn 2 1 Dept. of The 2nd R&D Institute,
More information11 Music and Speech Perception
11 Music and Speech Perception Properties of sound Sound has three basic dimensions: Frequency (pitch) Intensity (loudness) Time (length) Properties of sound The frequency of a sound wave, measured in
More informationAvailable online at ScienceDirect. Procedia Manufacturing 3 (2015 )
Available online at www.sciencedirect.com ScienceDirect Procedia Manufacturing 3 (2015 ) 5512 5518 6th International Conference on Applied Human Factors and Ergonomics (AHFE 2015) and the Affiliated Conferences,
More informationImproving the Noise-Robustness of Mel-Frequency Cepstral Coefficients for Speech Processing
ISCA Archive http://www.isca-speech.org/archive ITRW on Statistical and Perceptual Audio Processing Pittsburgh, PA, USA September 16, 2006 Improving the Noise-Robustness of Mel-Frequency Cepstral Coefficients
More informationReach and grasp by people with tetraplegia using a neurally controlled robotic arm
Leigh R. Hochberg et al. Reach and grasp by people with tetraplegia using a neurally controlled robotic arm Nature, 17 May 2012 Paper overview Ilya Kuzovkin 11 April 2014, Tartu etc How it works?
More informationAn Auditory System Modeling in Sound Source Localization
An Auditory System Modeling in Sound Source Localization Yul Young Park The University of Texas at Austin EE381K Multidimensional Signal Processing May 18, 2005 Abstract Sound localization of the auditory
More informationSound Interfaces Engineering Interaction Technologies. Prof. Stefanie Mueller HCI Engineering Group
Sound Interfaces 6.810 Engineering Interaction Technologies Prof. Stefanie Mueller HCI Engineering Group what is sound? if a tree falls in the forest and nobody is there does it make sound?
More informationHow to use mycontrol App 2.0. Rebecca Herbig, AuD
Rebecca Herbig, AuD Introduction The mycontrol TM App provides the wearer with a convenient way to control their Bluetooth hearing aids as well as to monitor their hearing performance closely. It is compatible
More informationAnalysis of Speech Recognition Techniques for use in a Non-Speech Sound Recognition System
Analysis of Recognition Techniques for use in a Sound Recognition System Michael Cowling, Member, IEEE and Renate Sitte, Member, IEEE Griffith University Faculty of Engineering & Information Technology
More informationDate: April 19, 2017 Name of Product: Cisco Spark Board Contact for more information:
Date: April 19, 2017 Name of Product: Cisco Spark Board Contact for more information: accessibility@cisco.com Summary Table - Voluntary Product Accessibility Template Criteria Supporting Features Remarks
More informationThe SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation Martin Graciarena 1, Luciana Ferrer 2, Vikramjit Mitra 1 1 SRI International,
More informationChallenges in microphone array processing for hearing aids. Volkmar Hamacher Siemens Audiological Engineering Group Erlangen, Germany
Challenges in microphone array processing for hearing aids Volkmar Hamacher Siemens Audiological Engineering Group Erlangen, Germany SIEMENS Audiological Engineering Group R&D Signal Processing and Audiology
More informationYour hearing. Your solution.
Your hearing. Your solution. Name Date Your audiogram Loud Soft db 10 0 10 20 30 40 50 60 70 80 90 100 110 120 Low pitched High pitched 125 250 500 1000 2000 4000 8000 Hz Normal hearing Mild hearing loss
More informationSpeech (Sound) Processing
7 Speech (Sound) Processing Acoustic Human communication is achieved when thought is transformed through language into speech. The sounds of speech are initiated by activity in the central nervous system,
More information