Acoustic Sensing With Artificial Intelligence

Acoustic Sensing With Artificial Intelligence
Bowon Lee
Department of Electronic Engineering, Inha University, Incheon, South Korea
bowon.lee@inha.ac.kr, bowon.lee@ieee.org
NVIDIA Deep Learning Day, Seoul, South Korea, October 31, 2017
Bowon Lee, Inha University Acoustic Sensing With Artificial Intelligence NVIDIA Deep Learning Day 2017 1 / 36

Outline
1. Introduction
2. Acoustic Sensing With a Microphone Array
3. Sound Source Localization
4. Human Hearing
5. Blind Source Separation
6. Concluding Remarks

Telephone (1876)

Home Assistants (Current)

AVICAR Project at UIUC (2002-2006)
AVICAR Video

ConnectUs Project at HP Labs (2008-2011)
ConnectUs Video

Acoustic Sensing With a Microphone Array

Audio Communication

Signal Processing for Audio Communication
Output signal of the capture engine
$$\hat{y}_n(t) = \underbrace{\sum_{m=1}^{M} \{f_{mn} * y_m\}(t)}_{\text{microphone array processing}} + \underbrace{\sum_{l=1}^{L} \{q_{ln} * x_l\}(t)}_{\text{echo cancellation}}$$
Topics of Acoustic Signal Processing
- Beamforming
- Blind Source Separation
- Sound Source Localization
- Multichannel Echo Cancellation
- Dereverberation, Noise Suppression, Speaker Diarization
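The capture-engine output above can be sketched as a filter-and-sum operation. This is only an illustration under stated assumptions: each microphone filter and each echo filter is taken to be a short FIR filter applied by convolution, the echo filters absorb the minus sign needed for cancellation, and the function name `capture_output` is hypothetical.

```python
import numpy as np

def capture_output(mics, mic_filters, refs, echo_filters):
    """One output channel: sum_m (f_m * y_m) + sum_l (q_l * x_l).

    mics:         (M, T) microphone signals y_m
    mic_filters:  (M, K) FIR filters f_m (microphone array processing)
    refs:         (L, T) loudspeaker reference signals x_l
    echo_filters: (L, K) FIR filters q_l (echo cancellation)
    """
    T = mics.shape[1]
    out = np.zeros(T)
    # Microphone array processing: filter every channel, then sum.
    for y, f in zip(mics, mic_filters):
        out += np.convolve(y, f)[:T]
    # Echo cancellation: add the filtered loudspeaker references.
    for x, q in zip(refs, echo_filters):
        out += np.convolve(x, q)[:T]
    return out
```

With a single microphone, an identity microphone filter, and an echo filter that matches the negated echo path, the output reduces to the near-end signal.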

Sound Sources in an Acoustic Scene
[Figure: an acoustic scene with a sound source of interest, a noise source, an object, and microphones indexed 0, 1, 2, ..., M-1]

Time-Delay Estimation
Example: Signals at two microphones
[Figure: amplitude versus sample index (0 to 200) for Mic 1 and Mic 2]

Signal Model: Two Microphones
Signals in the discrete time domain
$$x_1[n] = s[n] + v_1[n], \qquad x_2[n] = s[n - \Delta] + v_2[n]$$
- $x_1[n]$, $x_2[n]$: Captured signal at each microphone
- $s[n]$: Source signal
- $\Delta = f_s \tau$: Time difference of arrival (TDOA) in samples
- $v_1[n]$, $v_2[n]$: Noise and reverberation at each microphone
Time-Delay Estimation
A problem to find $\Delta$ given $x_1[n]$ and $x_2[n]$
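To give a feel for the size of the TDOA in samples, here is a small worked sketch. It assumes a far-field source and a two-microphone pair, so that the time difference is spacing times sin(angle) divided by the speed of sound; that geometry, the 343 m/s default, and the function name `tdoa_samples` are illustrative assumptions and are not taken from the slides.

```python
import math

def tdoa_samples(spacing_m, angle_deg, fs_hz, c=343.0):
    """TDOA in samples (f_s times tau) for a far-field source.

    angle_deg is measured from broadside, so 0 degrees gives zero delay
    and 90 degrees (endfire) gives the largest delay.
    """
    tau = spacing_m * math.sin(math.radians(angle_deg)) / c  # seconds
    return fs_hz * tau

# A 10 cm pair sampled at 16 kHz sees at most about 4.7 samples of delay.
max_delay = tdoa_samples(0.10, 90.0, 16000)
```

Delays this small are one reason the estimate is sensitive to noise and reverberation.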

Generalized Cross Correlation Method for TDE
Signals in the frequency domain
$$X_1(\omega) = S(\omega) + V_1(\omega), \qquad X_2(\omega) = S(\omega) e^{-j\omega\Delta} + V_2(\omega)$$
Generalized Cross Correlation (GCC) Method
$$\hat{\Delta} = \arg\max_{\Delta} \int_{\omega} \psi(\omega) X_1(\omega) X_2^*(\omega) e^{-j\omega\Delta} \, d\omega$$
Phase Transform (PHAT)
$$\psi_{\text{PHAT}}(\omega) = \frac{1}{|X_1(\omega) X_2^*(\omega)|}$$
$$\hat{\Delta} = \arg\max_{\Delta} \int_{\omega} \frac{X_1(\omega) X_2^*(\omega)}{|X_1(\omega) X_2^*(\omega)|} e^{-j\omega\Delta} \, d\omega$$
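The PHAT-weighted estimator can be sketched with an FFT-based cross-correlation. This is a minimal illustration, not the speaker's implementation: the function name `gcc_phat`, the zero-padding length, the small constant that guards against division by zero, and the integer-lag search window are all assumptions. The code uses the conjugate of the first spectrum times the second, whose inverse FFT at lag k equals the real-valued integral of the weighted cross-spectrum evaluated at a delay of k samples.

```python
import numpy as np

def gcc_phat(x1, x2, max_delay):
    """Estimate the delay of x2 relative to x1, in samples, using GCC-PHAT.

    max_delay (>= 1) bounds the search to lags -max_delay..+max_delay.
    """
    n = len(x1) + len(x2)                 # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = np.conj(X1) * X2              # cross-spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n)           # correlation as a function of lag
    # Reorder so that index 0 corresponds to lag -max_delay.
    cc = np.concatenate((cc[-max_delay:], cc[:max_delay + 1]))
    return int(np.argmax(cc)) - max_delay
```

A positive result means the second signal lags the first, matching the two-microphone signal model in which the source reaches microphone 2 later.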

Sound Source Localization With a Microphone Array
Multi-microphone extension of the GCC method
$$\hat{q} = \arg\max_{q \in Q} \sum_{m=1}^{M} \sum_{k=1}^{M} \int_{\omega} \psi(\omega) X_m(\omega) X_k^*(\omega) e^{-j\omega\Delta^q_{mk}} \, d\omega,$$
where $\Delta^q_{mk} = \Delta^q_k - \Delta^q_m$ and $Q$ denotes the search space of all potential source locations.
With the PHAT weighting, it can be simplified as
$$\hat{q} = \arg\max_{q \in Q} \int_{\omega} \left| \sum_{m=1}^{M} \frac{X_m(\omega)}{|X_m(\omega)|} e^{j\omega\Delta^q_m} \right|^2 d\omega.$$
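The simplified PHAT form lends itself to a direct grid search: whiten each channel, steer it by the delay that a candidate location would produce, sum over microphones, and integrate the squared magnitude. The sketch below assumes free-field propagation at 343 m/s, a single analysis frame, and a finite list of candidate locations; the function name `srp_phat` and its argument layout are hypothetical.

```python
import numpy as np

def srp_phat(frames, mic_pos, candidates, fs, c=343.0):
    """Return (best candidate index, power per candidate).

    frames:     (M, N) one time frame per microphone
    mic_pos:    (M, 3) microphone coordinates in metres
    candidates: (Q, 3) candidate source coordinates in metres
    """
    n_samples = frames.shape[1]
    X = np.fft.rfft(frames, axis=1)                         # (M, F)
    X /= np.abs(X) + 1e-12                                  # PHAT: unit magnitude
    omega = 2 * np.pi * np.arange(X.shape[1]) / n_samples   # rad/sample
    # Propagation delay, in samples, from every candidate to every microphone.
    dist = np.linalg.norm(candidates[:, None, :] - mic_pos[None, :, :], axis=2)
    delays = fs * dist / c                                  # (Q, M)
    # Undo each channel's delay, sum over microphones, integrate the power.
    steer = np.exp(1j * omega[None, None, :] * delays[:, :, None])  # (Q, M, F)
    steered_sum = np.sum(X[None, :, :] * steer, axis=1)     # (Q, F)
    power = np.sum(np.abs(steered_sum) ** 2, axis=1)        # (Q,)
    return int(np.argmax(power)), power
```

Each candidate is scored independently of the others, which is why the search parallelizes naturally.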

Steered Response Power With Phase-Transform
[Figure: SRP-PHAT power plot]

SRP-PHAT Computation on GPU
[Figure: serial scan on a single CPU thread (common CPU-based approach) versus parallel scan across Block 1, Block 2, ..., Block n (our GPU-based approach)]

Simultaneous Speaker Detection and Localization

System Setup
[Figure: RGB camera, microphone array, and the camera's field of view]
Eight microphones and one RGB camera

Multimodal Speaker Detection and Localization
- Use the HMM competition scheme
- Run it for each one of the detected faces
- The SRP-PHAT plot has many local maxima for multiple simultaneous speaker cases
- Mid-fusion of the parameters before HMM competition [1]
[Figure: Mid-Fusion Algorithm for Speaker Detection and Localization]
[1] V. P. Minotto, C. R. Jung, and B. Lee, "Simultaneous-Speaker Voice Activity Detection and Localization Using Mid-Fusion of SVM and HMMs," IEEE TMM, 2014.

Sound Source Localization
Note that sound source localization methods work reasonably well
- for long (200 ms or longer) signals,
- with relatively low reverberation and background noise, or
- with the help of multimodal sensor fusion.
Otherwise, they usually fail.
It is generally believed that humans do much better than any algorithmic method.
- Cocktail party effect
- The ultimate goal of acoustic sensing

Cocktail Party

Human Hearing

Human Ear
(source: Theory and Applications of Digital Speech Processing, Rabiner and Schafer, Pearson, 2011)
- Acoustic wavefront hits the eardrum (outer ear)
- Movement at the eardrum is transmitted via ossicles (middle ear)
- Stapes is attached to the oval window of the cochlea (inner ear)

Sound Representation in the Auditory System
(source: Theory and Applications of Digital Speech Processing, Rabiner and Schafer, Pearson, 2011)
Cochlear processing
- Vibration at the oval window is converted to fluid motion in the cochlea
- Fluid motion causes mechanical vibration on the basilar membrane
- Mechanical vibration causes action potentials in the inner hair cells
- These action potentials trigger neural firings, which are aggregated in the central auditory system

Auditory System: A Neural Network
(source: Theory and Applications of Digital Speech Processing, Rabiner and Schafer, Pearson, 2011)
Auditory System
- The auditory cortex serves as the central processing unit
- It creates multiple representations of the neural firings
- Then the human brain perceives the sound

Spectrogram
- Time-frequency representation of audio
- Can be treated as a 2D image
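To make the "2D image" view concrete, the following sketch computes a magnitude spectrogram with a short-time Fourier transform. The frame length, hop size, Hann window, and the function name `spectrogram` are illustrative choices and are not specified on the slides.

```python
import numpy as np

def spectrogram(x, frame_len=256, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One FFT per frame; transpose so that time runs along the columns.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# A 1 kHz tone sampled at 8 kHz shows up as a horizontal line in the image.
fs = 8000
t = np.arange(fs) / fs
image = spectrogram(np.sin(2 * np.pi * 1000 * t))
```

The resulting array has one row per frequency bin and one column per frame, so image-processing models can consume it directly.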

Semantic Segmentation: Fully Convolutional Network
Semantic segmentation: Pixel-wise image segmentation

Blind Source Separation: Fully Convolutional Network

Blind Source Separation: Some Results
The FCN created the binary mask from a mixture signal
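How a binary mask separates a source can be sketched without the network itself. The example below builds an oracle ("ideal") binary mask from known source magnitudes, standing in for the mask the FCN predicts from the mixture alone, and applies it to the mixture's time-frequency representation. The oracle mask and the function names `ideal_binary_mask` and `apply_mask` are assumptions made for illustration, not the method shown in the talk.

```python
import numpy as np

def ideal_binary_mask(target_mag, interferer_mag):
    """1 where the target dominates a time-frequency bin, 0 elsewhere."""
    return (target_mag > interferer_mag).astype(float)

def apply_mask(mixture_stft, mask):
    """Keep only the mixture bins assigned to the target source."""
    return mixture_stft * mask
```

Inverting the masked representation back to the time domain would then yield the separated signal; that step is omitted here.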

Summary
Acoustic Sensing + Artificial Intelligence

Conclusion
Past, Current, Future

Questions?