Broadband Wireless Access and Applications Center (BWAC) CUA Site Planning Workshop

Lin-Ching Chang
Department of Electrical Engineering and Computer Science, School of Engineering

Work Experience
- 09/12-present: Associate Professor, EECS, CUA
- 09/07-08/12: Assistant Professor, EECS, CUA
- 09/03-08/07: IRTA Postdoctoral Fellow, NIH
- 03/03-08/03: Senior Software Programmer and Medical Image Analyst, NIH
- 03/99-02/03: Senior Software Engineer, 3Com Corporation

Research Experience Overview
- Pattern recognition
- Image processing
- Big-data analysis
- Medical informatics
- Parallel processing
- Telecommunication
Medical Image Processing and Analysis: Diffusion Tensor MRI
[Figure: ICA-based denoising workflow: generate raw images (spectral image stack), ICA unmixing of source images, noise estimation (noise standard deviations), XCNR and decision maps, ROI masks, denoised images]
Microscopic Image Processing & Analysis: Two-Photon Microscopy Imaging; GPU Hardware Acceleration
Solar Image Processing & Analysis: Coronal Mass Ejections

Adapted HMM for Robust Speech Recognition

The Benefits of Effective Speech Recognition
Benefits can vary by industry:
- Work processes become more efficient
- Saves a great deal of labor
- Saves a great deal of time
- Hands-free computing: voice dictation from digital dictation devices
- Speech recognition is fun: nothing is more fascinating than the quick transformation of spoken words into readable text
However, speech recognition can also cause increased frustration for users and customers.

LVCSR
Large Vocabulary Continuous Speech Recognition (LVCSR):
- ~20,000-64,000 words
- Speaker-independent (vs. speaker-dependent)
- Continuous speech (vs. isolated-word)

Word Error Rates
Ballpark numbers; exact numbers depend very much on the specific corpus.
Task | Vocabulary | Error rate (%)
Digits | 11 | 0.5
WSJ read speech (clean) | ~5,000+ | 3
WSJ read speech (clean) | ~20,000+ | 3
Broadcast news | ~64,000+ | 10
Conversational telephone | ~64,000+ | 20
*WSJ: Wall Street Journal
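The error rates in this table are word error rates (WER). The standard definition aligns the recognizer output with a reference transcript using word-level edit distance and counts substitutions (S), deletions (D) and insertions (I): WER = (S + D + I) / N, where N is the number of reference words. The following is a minimal Python sketch of that definition; the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution (apple -> a) and one insertion (pull) over 4 reference words.
print(word_error_rate("i like apple juice", "i like a pull juice"))  # 0.5
```

Because insertions are counted, WER can exceed 100% when the recognizer outputs many spurious words.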

HSR versus ASR
Error rate (%) of human speech recognition (HSR) versus automatic speech recognition (ASR):
Task | Vocabulary | ASR | Human SR
Continuous digits | 11 | 0.5 | 0.009
WSJ clean | 5K | 3 | 0.9
WSJ w/noise | 5K | 9 | 1.1
SWBD | 65K | 20 | 4
*SWBD: Switchboard database of human-to-human telephone conversations
Conclusions:
- Machines are about 5 times worse than humans.
- The gap increases with noisy speech.
- These numbers are rough; take them with a grain of salt.
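The "about 5 times worse" conclusion is a summary across tasks. Dividing each ASR error rate by the corresponding human error rate shows how much the ratio actually varies; the values below are simply copied from the table.

```python
# Error rates (%) copied from the HSR vs. ASR table: (task, ASR, human).
RESULTS = [
    ("Continuous digits", 0.5, 0.009),
    ("WSJ clean", 3.0, 0.9),
    ("WSJ w/noise", 9.0, 1.1),
    ("SWBD", 20.0, 4.0),
]

for task, asr, human in RESULTS:
    # How many times more errors the machine makes than a human listener.
    print(f"{task:18s} ASR/human error ratio = {asr / human:5.1f}")
```

The ratios come out to roughly 55.6, 3.3, 8.2 and 5.0, so "5 times" is a rough middle value; the noisy WSJ ratio being larger than the clean one is consistent with the gap growing in noise.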

ASR Today
http://voice-recognition-software-review.toptenreviews.com/

Accuracy ranged from 60% to 95%.

Challenges in the Design of an SR System
SR systems have to deal with a large number of challenges:
- The speaker's voice is often accompanied by surrounding noise, which makes accurate recognition difficult.
- A speaker may say many different words, and all of them have to be recognized accurately.
- Accents vary from person to person, which is a very big challenge.
- A speaker may talk very quickly, and every word spoken still has to be recognized accurately.

Types of SR Systems
- Speaker-dependent SR systems work by learning the unique characteristics of a single person's voice and depend on that speaker for training.
- Speaker-independent SR systems are designed to recognize anyone's voice, so no speaker-specific training is involved.

Siri and Google Now
- Siri is an intelligent personal assistant developed by Apple.
- Google Now is an intelligent personal assistant developed by Google.
- Both use a combination of speaker-dependent and speaker-independent speech recognition.

Applications
- Health care: medical documentation, therapeutic use
- In-car systems
- Military: high-performance aircraft, air traffic control systems
- Telephony: smartphones, customer helpline services
- Education
- People with disabilities
- Daily life

Speech Recognition for Healthcare
Speech recognition drives efficiencies and cost savings in clinical documentation by automatically turning clinician dictations into formatted documents.
- Front-end speech recognition allows clinicians to dictate, self-edit and sign completed reports in one sitting, directly into a PACS or EHR system, with no separate transcription step.
- Background speech recognition turns clinician dictation into speech-recognized first drafts that medical language specialists (MLS) edit later.

Speech Recognition for Healthcare: Benefits
- Reduce document turnaround times
- Significantly reduce transcription costs
- Enhance patient care through increased clinical record accuracy, inclusiveness and access
- Dictate directly into the EHR with front-end speech recognition
- Accelerate navigation within the EHR, saving physicians time
- Increase clinician satisfaction and EHR adoption
- Employ multiple dictation options, including phone, dictation devices and workstations
However, several studies show that speech recognition can lead to errors in imaging reports:
Basma S, Lord B, Jacks LM, Rizk M, Scaranelo AM. Error rates in breast imaging reports: comparison of automatic speech recognition and dictation transcription. AJR Am J Roentgenol. 2011 Oct;197(4):923-7.

Common Error Types
- Word omission
- Word substitution
- Nonsense phrase
- Wrong word
- Punctuation error
- Incorrect measurement (mm/cm)
- Missing or added "no"
- Added word
- Verb tense
- Plural
- Spelling mistake
- Incomplete phrase
Conclusion of the study: complex breast imaging reports generated with ASR were associated with higher error rates (3 to 8 times higher) than reports generated with conventional dictation transcription.
Basma S, Lord B, Jacks LM, Rizk M, Scaranelo AM. Error rates in breast imaging reports: comparison of automatic speech recognition and dictation transcription. AJR Am J Roentgenol. 2011 Oct;197(4):923-7.

Hidden Markov Model (HMM)
Markov models are an excellent way of abstracting simple concepts into a relatively easily computable form. They are used in applications ranging from data compression to sound recognition.
[Figure: three-state Markov chain with states N1, N2, N3]
From this graph we can create sequences such as:
- N1 N2 N3
- N1 N2 N2 N2 N3 N3 N3 N3 N3
- N1 N1 N2 N2 N3

Hidden Markov Model (HMM)
- N1 N2 N3 = 0.4 × 0.8 × 0.5 = 0.16
- N1 N2 N2 N2 N3 N3 N3 N3 N3 = 0.4 × 0.2 × 0.2 × 0.8 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 = 0.0004
- N1 N1 N2 N2 N3 = 0.6 × 0.4 × 0.2 × 0.8 × 0.5 = 0.0192
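These products can be reproduced with a small left-to-right Markov chain. The transition probabilities in this sketch are an assumption inferred from the worked products, since the graph itself is not reproduced here: N1→N1 = 0.6, N1→N2 = 0.4, N2→N2 = 0.2, N2→N3 = 0.8, N3→N3 = 0.5, and an exit from N3 with probability 0.5, with every sequence starting in N1. Under this reading the three sequences come out to 0.16, 0.0004 and 0.0192.

```python
from math import prod

# Assumed transition probabilities (inferred, not given explicitly on the slide).
# "END" is the exit from the final state N3.
TRANSITIONS = {
    ("N1", "N1"): 0.6, ("N1", "N2"): 0.4,
    ("N2", "N2"): 0.2, ("N2", "N3"): 0.8,
    ("N3", "N3"): 0.5, ("N3", "END"): 0.5,
}


def sequence_probability(states):
    """Probability of a complete state sequence that starts in N1 and exits from N3."""
    if not states or states[0] != "N1":
        return 0.0
    path = list(states) + ["END"]
    # Multiply the probability of every consecutive transition; unknown moves get 0.
    return prod(TRANSITIONS.get(pair, 0.0) for pair in zip(path, path[1:]))


for seq in ["N1 N2 N3", "N1 N2 N2 N2 N3 N3 N3 N3 N3", "N1 N1 N2 N2 N3"]:
    print(f"{seq}: {sequence_probability(seq.split()):.4g}")
```

Each self-loop multiplies in another factor below 1, which is why long stays in one state quickly become improbable.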

Hidden Markov Model (HMM)
There are approximately 44 phonemes in English.
Phoneme example: tomato. A pronunciation model can accommodate variants such as:
- t ow m aa t ow: British English
- t ah m ey t ow: American English
- t ah m ey t a: possible pronunciation when speaking quickly
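In an HMM recognizer, pronunciation variants like these are typically stored in a lexicon that maps each word to one or more phone sequences, each of which is then expanded into a chain of phone HMMs. The sketch below shows only the lexicon lookup; the names LEXICON and words_matching are hypothetical, and the phone strings are copied from the slide.

```python
# Hypothetical pronunciation lexicon: each word maps to one or more phone sequences.
LEXICON = {
    "tomato": [
        ("t", "ow", "m", "aa", "t", "ow"),  # British English (as listed on the slide)
        ("t", "ah", "m", "ey", "t", "ow"),  # American English
        ("t", "ah", "m", "ey", "t", "a"),   # possible fast-speech variant
    ],
}


def words_matching(phones):
    """Return every word that has this exact phone sequence as one of its variants."""
    target = tuple(phones)
    return [word for word, variants in LEXICON.items() if target in variants]


print(words_matching("t ah m ey t ow".split()))  # ['tomato']
print(words_matching("t ow m ey t ow".split()))  # []
```

Listing several variants per word lets the decoder accept any of them without forcing every speaker onto one canonical pronunciation.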

Hidden Markov Model (HMM)
Language model example, with sentences such as:
- I like apple juice: very probable
- I like tomato juice: very improbable!
- I hate apple juice: relatively improbable
- I hate tomato juice: relatively probable
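One common way to obtain such sentence probabilities is an n-gram language model estimated from text. The sketch below trains an add-one-smoothed bigram model on a tiny hypothetical corpus (invented for illustration, not from the slides) and scores the four example sentences; the actual probabilities depend entirely on the training text.

```python
import math
from collections import Counter

# Hypothetical toy training corpus (not from the slides).
CORPUS = [
    "i like apple juice",
    "i like apple juice",
    "i like apple pie",
    "i hate tomato juice",
    "i like tomato soup",
]


def train_bigram(sentences):
    """Count bigrams and their left contexts, with sentence boundary markers."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens[1:])  # every token that can be predicted
        for h, w in zip(tokens, tokens[1:]):
            bigrams[(h, w)] += 1
            contexts[h] += 1
    return bigrams, contexts, len(vocab)


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a whole sentence."""
    bigrams, contexts, v = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(h, w)] + 1) / (contexts[h] + v))
        for h, w in zip(tokens, tokens[1:])
    )


model = train_bigram(CORPUS)
for s in ["i like apple juice", "i like tomato juice",
          "i hate apple juice", "i hate tomato juice"]:
    print(f"{s}: {log_prob(s, model):.2f}")
```

With this corpus, "i like apple juice" scores above "i like tomato juice" and "i hate tomato juice" scores above "i hate apple juice", matching the ordering on the slide. Smoothing keeps unseen word pairs from getting probability zero.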

Robust Speech Recognition
Robust speech recognition is the study of building speech recognizers that handle mismatch conditions.
What is a mismatch condition? It is the difference between the training environment and the operating (testing) environment, and such mismatch does occur in practice. For example:
- A simple case: a door slams suddenly while a letter is being dictated.
- In a wireless environment, the speaker's background can change.

Mismatch Conditions
Why are mismatch conditions hard to deal with? They have many causes:
- Additive noise (e.g., background noise such as air conditioning)
- Channel noise (e.g., different microphones in training and testing conditions)
- Others: the Lombard effect, reflections from buildings
In general, noise can have random amplitude, random duration, random occurrence and random spectral characteristics.
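The first two causes are often simulated when building mismatched test data. This standard-library sketch mixes noise into a clean signal at a chosen signal-to-noise ratio (additive noise) and convolves the signal with a short FIR impulse response as a stand-in for a different microphone (channel mismatch). The 440 Hz tone, the Gaussian noise and the 3-tap response are hypothetical test values.

```python
import math
import random


def power(x):
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def add_noise(clean, noise, snr_db):
    """Additive noise: scale `noise` so the mixture has the requested SNR in dB."""
    scale = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


def apply_channel(signal, impulse_response):
    """Channel mismatch: convolve with an FIR impulse response, truncated to the input length."""
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        for j, h in enumerate(impulse_response):
            if i - j >= 0:
                out[i] += h * signal[i - j]
    return out


random.seed(0)
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
noise = [random.gauss(0.0, 1.0) for _ in range(len(clean))]
noisy = add_noise(clean, noise, snr_db=10)
mic_b = apply_channel(clean, [0.6, 0.3, 0.1])  # hypothetical "other microphone"
residual = [y - c for y, c in zip(noisy, clean)]
print(f"measured SNR: {10 * math.log10(power(clean) / power(residual)):.2f} dB")
```

Stationary Gaussian noise like this is the easy case; the random duration and occurrence listed above (e.g., a door slam) would instead affect only a few frames.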

Previous Works
- Parallel Model Combination (PMC) (Gales 1995): first collect some samples of noise in the operating environment, then update the acoustic model using the noise statistics. Works satisfactorily for stationary noise, but general time-varying noise cannot be handled.
- Dealing with Short-Time Noise (Chan 2002): HMM-based; skips poor frames.
- Modified Viterbi Algorithm Dealing with Impulsive Noise (Siu 2005): joint decoding and detection during the Viterbi search; lost frames are replaced by interpolated neighboring frames.
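The idea of replacing lost frames with interpolated neighbors can be illustrated on a sequence of feature vectors. This is a minimal sketch that assumes the corrupted frames have already been detected and uses linear interpolation between the nearest reliable frames; it is not Siu's published algorithm, only the replacement step in its simplest form. The function name and the example frames are hypothetical.

```python
def interpolate_lost_frames(frames, lost):
    """Return a copy of `frames` where every frame flagged in `lost` is replaced by
    linear interpolation between the nearest reliable frames on either side.
    At the edges, the nearest reliable frame is copied instead."""
    good = [i for i, bad in enumerate(lost) if not bad]
    if not good:
        raise ValueError("no reliable frames to interpolate from")
    out = [list(f) for f in frames]
    for t, bad in enumerate(lost):
        if not bad:
            continue
        left = max((i for i in good if i < t), default=None)
        right = min((i for i in good if i > t), default=None)
        if left is None:
            out[t] = list(frames[right])
        elif right is None:
            out[t] = list(frames[left])
        else:
            w = (t - left) / (right - left)  # 0 at the left neighbor, 1 at the right
            out[t] = [(1 - w) * a + w * b for a, b in zip(frames[left], frames[right])]
    return out


# Hypothetical 2-dimensional feature frames; frames 1 and 2 were hit by an impulse.
frames = [[0.0, 1.0], [9.9, 9.9], [9.9, 9.9], [3.0, 4.0]]
lost = [False, True, True, False]
print(interpolate_lost_frames(frames, lost))
```

Interpolation only makes sense for short gaps; over long corrupted stretches the straight line between neighbors no longer resembles speech, which is one reason skipping frames may be preferable there.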

Proposed Work
HMM-based approach: find the state sequence with the best robust likelihood.
- Conventional approach: for every state sequence, consider all possible patterns of corruption of K frames among T frames.
- Our approach: incorporate some prior information to find the possible K, and replace the dynamic programming approach with a branch-and-bound approach.
Developing outlier detection algorithms:
- Leverage my research experience in outlier detection in medical images
- Define the characteristics of outliers in a wireless environment
- Use classification or ICA to separate speech from noise/outliers
Skipping frames or replacing frames? Different strategies should be used to deal with different types of noise/outliers (mismatch conditions).
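For reference, the robust-likelihood objective of the conventional approach (the best state sequence when up to K of the T frames may be treated as corrupted) can be written as a dynamic program over (frame, state, number of frames skipped), which avoids enumerating every corruption pattern explicitly. The sketch below is a hypothetical illustration of that objective only, not the proposed branch-and-bound method; the function name robust_viterbi, the rule that a skipped frame contributes no emission score, and the toy two-state model are all assumptions.

```python
import math

NEG_INF = float("-inf")


def robust_viterbi(log_init, log_trans, log_emit, max_skips):
    """Best path score when up to `max_skips` frames may be declared corrupted.

    log_init[s]      log probability of starting in state s
    log_trans[p][s]  log probability of moving from state p to state s
    log_emit[t][s]   log-likelihood of frame t in state s
    A skipped frame still advances the state path but contributes no emission
    score (an assumption of this sketch). Returns (score, path, skipped_frames).
    """
    T, S, K = len(log_emit), len(log_init), max_skips
    # delta[k][s]: best score so far ending in state s with exactly k frames skipped
    delta = [[NEG_INF] * S for _ in range(K + 1)]
    ptr0 = [[None] * S for _ in range(K + 1)]
    for s in range(S):
        delta[0][s] = log_init[s] + log_emit[0][s]
        ptr0[0][s] = (None, None, False)
        if K >= 1:
            delta[1][s] = log_init[s]  # frame 0 treated as corrupted
            ptr0[1][s] = (None, None, True)
    back = [ptr0]  # back[t][k][s] = (previous state, previous k, frame t skipped?)
    for t in range(1, T):
        new = [[NEG_INF] * S for _ in range(K + 1)]
        ptr = [[None] * S for _ in range(K + 1)]
        for k in range(K + 1):
            for p in range(S):
                for s in range(S):
                    base = delta[k][p] + log_trans[p][s]
                    if base + log_emit[t][s] > new[k][s]:  # keep frame t
                        new[k][s] = base + log_emit[t][s]
                        ptr[k][s] = (p, k, False)
                    if k < K and base > new[k + 1][s]:  # skip frame t
                        new[k + 1][s] = base
                        ptr[k + 1][s] = (p, k, True)
        delta = new
        back.append(ptr)
    # Best final state over every skip count k <= K, then trace back.
    k, s = max(((k, s) for k in range(K + 1) for s in range(S)),
               key=lambda ks: delta[ks[0]][ks[1]])
    score = delta[k][s]
    path, skipped = [], []
    for t in range(T - 1, -1, -1):
        prev_s, prev_k, was_skipped = back[t][k][s]
        path.append(s)
        if was_skipped:
            skipped.append(t)
        if t > 0:
            k, s = prev_k, prev_s
    return score, path[::-1], skipped[::-1]


# Hypothetical 2-state left-to-right model; frame 2 is hit by an impulse.
log_init = [0.0, NEG_INF]
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
log_emit = [[-1, -5], [-1, -5], [-20, -20], [-5, -1]]
print(robust_viterbi(log_init, log_trans, log_emit, max_skips=0))
print(robust_viterbi(log_init, log_trans, log_emit, max_skips=1))
```

With a budget of one skipped frame, the decoder discards the impulse at frame 2 and the path score improves by 20 while the state path stays the same. The table grows with K, so knowing a tight K in advance (the prior information mentioned above) directly reduces the search.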

CONCLUSION
Speech recognition systems are an indispensable part of the ever-advancing field of human-computer interaction. More research is needed to tackle the various remaining challenges.

Thank You! Questions?