Binaural Hearing for Robots: Introduction to Robot Hearing

Radu Horaud

Binaural Hearing for Robots: 1. Introduction to Robot Hearing 2. Methodological Foundations 3. Sound-Source Localization 4. Machine Learning and Binaural Hearing 5. Fusion of Audio and Vision

1. Introduction to Robot Hearing: 1. Why do robots need to hear? 2. Human-robot interaction 3. Auditory scene analysis 4. Audio signal processing in brief 5. Audio processing in the ear 6. Audio processing in the midbrain 7. Audio processing in the brain

Session 1: Why Do Robots Need to Hear?

Robots in Industry. Researchers and engineers have developed sophisticated robots that work efficiently in well-structured and predictable environments, for example in manufacturing industries (cars, ships, planes, electric appliances, etc.). The emphasis has been on the physical interactions between robots and objects or the environment, for example: performing complex articulated movements, grasping/ungrasping and assembling/disassembling objects, moving safely among obstacles, and performing specific tasks such as painting and welding.

Robots and People. Robots are expected to gradually move from factory floors to populated spaces; there is therefore a paradigm shift from robot-object interactions to human-robot interactions (HRI). HRI aims to enable unconstrained communication between robots and people. Robot hearing is a fundamental ability because it allows interaction via highly rich content, e.g., speech. The challenge: extract content from complex acoustic signals.

Acoustic Signals Are Very Rich. Robot hearing should not be limited to speech recognition. A large number of acoustic events other than speech are of interest to a robot: laughing, crying, whistling, singing, hand clapping, etc.; animal sounds; musical instruments; environmental sounds; falling objects/people; warnings; electronic appliances; hazards; etc.

Session Summary. Robots in industry. Robots and people. What can be done with hearing?

Session 2: Human-Robot Interaction

Constrained (Tethered) HRI. Inspired by human-computer interaction (HCI). Advantage: clean acoustic signals are available for automatic speech recognition (around a 95% recognition rate). Disadvantages: interaction is restricted to speech communication with a single user, who must wear a microphone; sound localization and the recognition of other types of acoustic events are not possible; audio perception cannot be combined with other sensory modalities (e.g., vision).

Unconstrained (Untethered) HRI. The microphones are embedded in the robot head. The relevant auditory signals originate at some distance from the robot and are mixed with signals from other sound sources, affected by reverberations, noise, etc. Auditory processing is more difficult in this case, but this setup allows auditory scene analysis, and it becomes possible to combine audio and vision.

Automatic Speech Recognition

People-Robot Interaction

Session Summary. Human-computer interaction. Human-robot interaction. Speech recognition. Other hearing tasks.

Session 3: Auditory Scene Analysis

Auditory Scene Analysis. With two or more microphones it is possible to address several problems: detect the presence/absence of sound sources; extract the signal emitted by each emitting source; localize the sounds and track them over time; identify each source (voice, musical instrument, etc.); and enhance the signals for subsequent ASR.

Robot Hearing Challenges. A robot hearing system must solve several problems. Analysis: Who are the speakers in front of the robot? Where are the speakers? When do they speak? What are they saying? Interaction: Is the robot able to join the conversation? Is the robot able to synthesize appropriate behavior?

Constrained Audio Interaction. Let s(t) be the signal emitted by a speaker, as a function of time t; m(t) the signal recorded by a microphone; and n(t) additive noise. We have m(t) = s(t) + n(t). The emitted and recorded signals differ only by additive noise (for example, microphone noise). Extracting a clean speech signal for subsequent automatic speech recognition (ASR) is straightforward in this case.

Unconstrained Audio Interaction. The relationship between the emitted and received signals is more complex: m(t) = h_1(t) ∗ s_1(t) + h_2(t) ∗ s_2(t) + ... + h_K(t) ∗ s_K(t) + n(t). Indeed, the microphone records not a single signal but several signals: s_1(t), s_2(t), ..., s_K(t). As the sound waves travel from the sources to the microphone they are modified; these modifications can be modeled by the convolution (the symbol ∗) of an unknown transfer function h_k(t) with the emitted signal s_k(t). Extracting clean signals is much more difficult!
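
To make the convolutive model concrete, here is a minimal Python/NumPy sketch that simulates one microphone recording K = 2 sources. The sample rate, impulse responses, and source signals are invented toy values, not taken from the lecture.

import numpy as np

rng = np.random.default_rng(0)
fs = 16000                                  # sample rate in Hz (assumed)
t = np.arange(fs) / fs                      # one second of time stamps

# Two toy sources: a 440 Hz tone and low-level white noise.
s1 = np.sin(2 * np.pi * 440 * t)
s2 = 0.3 * rng.standard_normal(t.size)

# Toy impulse responses h_k(t): a direct path plus a single echo each.
h1 = np.zeros(800); h1[0] = 1.0; h1[500] = 0.4   # echo after ~31 ms
h2 = np.zeros(800); h2[0] = 0.8; h2[300] = 0.3   # echo after ~19 ms

# m(t) = h_1(t) * s_1(t) + h_2(t) * s_2(t) + n(t)
m = (np.convolve(h1, s1)[:t.size]
     + np.convolve(h2, s2)[:t.size]
     + 0.01 * rng.standard_normal(t.size))       # sensor noise n(t)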

Session Summary. Auditory scene analysis. Robot hearing challenges. Constrained interaction. Unconstrained interaction.

Session 4: Audio Signal Processing in Brief

Using Several Microphones. For each microphone j = 1, ..., J we have a different equation: m_j(t) = h_1j(t) ∗ s_1(t) + h_2j(t) ∗ s_2(t) + ... + h_Kj(t) ∗ s_K(t) + n_j(t), where h_kj(t) denotes the transfer function from source k to microphone j.

Binaural Hearing. The microphones are plugged into the "left" and "right" ears of a dummy head. In the absence of reverberations there is a head-related transfer function (HRTF) for each ear and each source position: m_L(t) = h_1L(t) ∗ s_1(t) + h_2L(t) ∗ s_2(t) + ... + h_KL(t) ∗ s_K(t) + n_L(t), m_R(t) = h_1R(t) ∗ s_1(t) + h_2R(t) ∗ s_2(t) + ... + h_KR(t) ∗ s_K(t) + n_R(t). This general setting is difficult to solve in practice.

Spectral Representation. In signal processing it is common to perform spectral analysis, namely to apply the Fourier transform (FT) to the microphone signals. In practice one uses the discrete Fourier transform (DFT). The DFT applied over a short period of time is called the short-time Fourier transform (STFT). By applying the STFT to a signal at successive discrete time steps, one obtains a spectrogram (see next slide). Each spectrogram point indicates the amount of oscillation contained in the signal at time t and frequency f.
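
As a minimal illustration, the following Python sketch computes a spectrogram with SciPy's STFT; the chirp test signal and the 25 ms / 50% overlap analysis window are illustrative choices, not prescribed by the lecture.

import numpy as np
from scipy.signal import stft

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * (200 + 300 * t) * t)   # chirp sweeping upward from 200 Hz

# 400-sample (25 ms) Hann windows with 50% overlap.
f, times, Z = stft(x, fs=fs, nperseg=400, noverlap=200)
S = np.abs(Z) ** 2   # power at each (frequency f, time t) point of the spectrogram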

Spectrogram. A spectrogram is an example of a time-frequency representation of a signal.

Binaural Cues for Sound Localization. Suppose that there is a single emitting sound source. The signals received by the two ears are different, and their difference is mainly characterized by two cues: the sound reaches each ear at a different time (interaural time difference, or ITD), and it reaches each ear with a different intensity or level (interaural intensity/level difference, or IID/ILD).

ITD and ILD. ITD: interaural time difference (also called TDOA, or time difference of arrival). ILD: interaural level difference. From these two features it is possible to localize the sound source.
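
Below is a minimal sketch of how these two cues can be estimated from a pair of ear signals, assuming a broadband source, a made-up 20-sample delay, and a made-up 0.7 attenuation; real systems use more robust estimators such as GCC-PHAT.

import numpy as np

fs = 44100
rng = np.random.default_rng(1)
s = rng.standard_normal(4096)                # broadband toy source

delay = 20                                   # true ITD: 20 samples (~0.45 ms)
left = s
right = 0.7 * np.concatenate([np.zeros(delay), s])[:s.size]   # later and quieter

# ITD estimate: the lag that maximizes the interaural cross-correlation.
lag = np.argmax(np.correlate(right, left, mode="full")) - (s.size - 1)
itd = lag / fs                               # ~ +0.00045 s (right ear lags)

# ILD estimate: interaural level ratio in decibels.
def rms(x):
    return np.sqrt(np.mean(x ** 2))

ild = 20 * np.log10(rms(right) / rms(left))  # ~ -3 dB (right ear quieter)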

Session Summary. Audio signal processing. Emitted and perceived signals. Spectrograms. Binaural localization cues.

Session 5: Audio Processing in the Ear

The Auditory Pathways: the Ears and the Brain. All living species, from fish to primates (monkeys, humans), have binaural hearing, although there is considerable variability in sub-cortical and cortical organization across species. Roughly speaking, the auditory pathway can be divided into two parts: the transformation of sound waves (air pressure) into spike trains (neural activity), and the representation and extraction of auditory information (localization, speech, etc.) in different brain areas.

The Ear

From Air Pressure to Neural Activity. Outer ear: composed of the pinna and the ear canal; transmits airborne sound to the eardrum. Middle ear: air filled; contains a chain of little bones, the ossicles, which collect the sound pressure at the eardrum and concentrate it onto the cochlea. Cochlea: acts as a mechanical bank of band-pass filters that transforms the air/liquid vibrations into electric currents and then into spike trains.

The Cochlea

The Main Components of the Cochlea. A coiled tube enclosed in a hard bony shell, filled with a liquid (essentially salt water). The basilar membrane splits the space inside the cochlea into two canals, vestibular and tympanic. The basilar membrane is stiff at its base and flexible at its apex. All along the basilar membrane runs a structure called the organ of Corti, which contains the hair cells. Hair cells are the hearing receptor cells, the equivalent of the photoreceptor cells of the retina.

The Cochlea: From Air Pressure to Electrical Signals. The cochlea presents two types of mechanical resistance: the stiffness of the basilar membrane and the inertia of the liquid. Both are graded along the cochlea: the stiffness decreases from base to apex while the inertia increases. Altogether the cochlea operates as a kind of mechanical frequency analyzer, commonly modeled as a gammatone filter bank.
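
Here is a minimal sketch of such a gammatone filter bank in Python. The 4th-order impulse-response formula and the ERB bandwidth rule are standard textbook choices (Glasberg and Moore); the channel spacing and input signal are arbitrary toy values, not taken from the lecture.

import numpy as np

fs = 16000
t = np.arange(int(0.05 * fs)) / fs           # 50 ms impulse responses

def gammatone_ir(fc, order=4):
    # g(t) = t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), with the damping b
    # tied to the equivalent rectangular bandwidth (ERB) of the channel.
    erb = 24.7 + 0.108 * fc
    b = 1.019 * erb
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

# 16 channels spaced logarithmically from 100 Hz to 4 kHz.
centers = np.geomspace(100.0, 4000.0, 16)
x = np.random.default_rng(2).standard_normal(fs)       # 1 s of toy input
cochleagram = np.stack([np.convolve(x, gammatone_ir(fc))[:x.size]
                        for fc in centers])
# Each row approximates basilar-membrane motion at one place along the cochlea.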

The Organ of Corti. There are about 15,000 hair cells along the basilar membrane, divided into inner and outer hair cells. The outer hair cells play the role of an amplifier. The inner hair cells transform mechanical vibrations into electric signals that are transmitted to the auditory nerve. Mathematically, the information passed to the brain can be represented as a cochleagram (see next slide), which is roughly equivalent to the spectrogram introduced earlier.

Cochleagram

Session Summary. The ear is a very complex organ. Air pressure is transformed into electric spikes. Detailed ear anatomy. Brief description of the ear's functions.

Session 6: Audio Processing in the Midbrain

Auditory Processing in the Brain. The brain has the difficult task of transforming the acoustic-wave input into the perception of auditory objects. The data flow travels from the ear to the auditory cortex via the midbrain, which is composed of several nuclei. The auditory cortex is divided into several areas; both its anatomy and its organization vary considerably between species.

The Ascending Auditory Pathway

Sound Localization in the Midbrain. The superior olivary complex (SOC) is the first station that receives input from both ears. The SOC is subdivided into two nuclei: the lateral superior olive (LSO) and the medial superior olive (MSO). These nuclei seem well suited to representing and measuring time differences (MSO) and intensity differences (LSO).
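
The MSO's sensitivity to time differences is often described with a Jeffress-style coincidence-detection picture: an array of cells, each seeing the two ear signals through different internal delays, with the best-matching cell responding most. The Python toy below caricatures this idea; the delay range, signals, and ITD are invented for illustration, not a faithful physiological model.

import numpy as np

rng = np.random.default_rng(3)
s = rng.standard_normal(2048)                    # toy broadband sound
true_itd = 12                                    # ITD in samples (to recover)
pad = 32
left = np.concatenate([s, np.zeros(pad)])
right = np.concatenate([np.zeros(true_itd), s, np.zeros(pad - true_itd)])

# Each "MSO cell" compensates one candidate internal delay d and counts
# coincidences between its two inputs; the best cell reveals the ITD.
delays = np.arange(-24, 25)
response = [np.dot(left, np.roll(right, -d)) for d in delays]
estimated_itd = delays[int(np.argmax(response))]  # -> 12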

Session Summary. The midbrain is composed of nuclei. These nuclei preprocess the audio information before it is passed to the cortex. Several nuclei receive input from both ears.

Session 7: Audio Processing in the Brain

Auditory Cortex. How the auditory cortex processes sound is still not fully understood; there are two main hypotheses: there are two pathways, with a division of labor between spatial and non-spatial processing, or dynamically organized processing networks support auditory perception. The first hypothesis is supported by functional imaging (in humans) and single-neuron physiological studies (in non-human animals).

Auditory Streams in the Cortex. The dorsal stream (red) analyzes space and motion. The ventral stream (green) is involved in auditory-object perception. Core regions (blue) of the auditory cortex are shown for different species.

Auditory Areas

Session Summary. Division of labor: where and what pathways. Dorsal and ventral streams. The auditory cortex varies from species to species.

Week Summary (I). It is necessary to augment the physical interactions between robots and their environment with cognitive interactions between robots and people. Human-robot interaction (HRI) is more general and more difficult than human-computer interaction. HRI goes well beyond speech processing and recognition; it requires auditory scene analysis. Binaural hearing allows rich interactions based on acoustic-wave processing and understanding.

Week Summary (II). All animals (fish, frogs, reptiles, owls, bats, cats, monkeys, humans) have two ears. The anatomy and physiology of the ear are well understood: the ear is much more than a microphone. The midbrain plays a crucial role in spatial hearing, and a simple physiological model of ITD/ILD processing is available. The auditory cortex processes two flows of information in parallel: where and what. Biological models for the analysis of complex auditory information are not yet available.