Lecture 8: Spatial sound


EE E6820: Speech & Audio Processing & Recognition
Lecture 8: Spatial sound
1. Spatial acoustics
2. Binaural perception
3. Synthesizing spatial audio
4. Extracting spatial sounds
Dan Ellis <dpwe@ee.columbia.edu> http://www.ee.columbia.edu/~dpwe/e6820/
E6820 SAPR - Dan Ellis - L08: Spatial sound - 2002-04-01

1. Spatial acoustics

Received sound = source + channel
- so far, we have only considered the ideal source waveform
Sound carries information on its spatial origin
- e.g. ripples in the lake
- great evolutionary significance
The basis of scene analysis?
- yes and no: try blocking an ear

Ripples in the lake

[Figure: listener, source, and spreading wavefront (speed c m/s), with energy falling as 1/r^2]
Effect of relative position on sound:
- delay = r/c
- energy decay ~ 1/r^2
- absorption ~ G(f) r
- direct energy plus reflections
These give cues for recovering source position.
Describe a wavefront by its normal.
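The two distance cues on this slide (arrival delay r/c and inverse-square energy decay) can be sketched directly; the function name is my own, not from the lecture:

```python
def propagation_cues(r, c=343.0):
    """Arrival delay (s) and relative energy for a source at
    distance r metres; c is the speed of sound in m/s."""
    delay = r / c          # wavefront travel time
    energy = 1.0 / r**2    # inverse-square decay, relative to 1 m
    return delay, energy

# A source at 3.43 m arrives 10 ms late, at roughly 1/12 the 1 m energy.
delay, energy = propagation_cues(3.43)
```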

Recovering spatial information

Source direction as wavefront normal:
- a moving plane can be found from its arrival timing at 3 points
[Figure: wavefront crossing sensors A, B, C at angle theta; path difference s = AB cos(theta) = c * delta-t]
- need to solve the correspondence between sensors
Space: need 3 parameters
- e.g. azimuth theta, elevation phi, and range r
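For a single sensor pair, the slide's far-field relation c·Δt = d·cos(θ) inverts directly to an angle from the inter-sensor axis (a minimal sketch; the clamp guards against measurement noise pushing the cosine outside [-1, 1]):

```python
import math

def azimuth_from_tdoa(dt, d, c=343.0):
    """Wavefront-normal angle (degrees) from the axis of two sensors
    d metres apart, given their time difference of arrival dt (s).
    Uses the far-field relation c*dt = d*cos(theta)."""
    cos_theta = max(-1.0, min(1.0, c * dt / d))
    return math.degrees(math.acos(cos_theta))

# dt = 0 -> broadside (90 deg); dt = d/c -> endfire (0 deg).
broadside = azimuth_from_tdoa(0.0, 0.2)
endfire = azimuth_from_tdoa(0.2 / 343.0, 0.2)
```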

The effect of the environment

Reflection causes additional wavefronts:
- reflection, diffraction & shadowing
- scattering, absorption
- many paths -> many echoes
Reverberant effect:
- causal smearing of signal energy
[Figure: spectrograms (0-8 kHz, 0-1.5 s) of dry speech vs. the same speech with hallway reverb]

Reverberation impulse response

Exponential decay of reflections: h_room(t) ~ e^(-t/T)
[Figure: spectrogram of a hallway impulse response (128-pt window), showing decay over ~0.7 s]
Frequency-dependent:
- greater absorption at high frequencies -> faster decay
Size-dependent:
- larger rooms -> longer delays -> slower decay
Sabine's equation: RT60 = 0.049 V / (S alpha)
- time constant grows with room size, shrinks with absorption
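Sabine's equation as given is a one-liner; note the 0.049 constant corresponds to volume in cubic feet and surface area in square feet (the metric form uses 0.161), an assumption not spelled out on the slide:

```python
def sabine_rt60(volume_ft3, surface_ft2, alpha):
    """Sabine reverberation time RT60 = 0.049 V / (S alpha), in seconds.
    The 0.049 constant assumes V in ft^3 and S in ft^2;
    alpha is the average absorption coefficient."""
    return 0.049 * volume_ft3 / (surface_ft2 * alpha)

# A larger room (bigger V/S ratio) at the same absorption decays more slowly.
rt_small = sabine_rt60(10000, 3000, 0.3)
rt_big = sabine_rt60(80000, 12000, 0.3)
```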

Outline

1. Spatial acoustics
2. Binaural perception
- The sound at the two ears
- Available cues
- Perceptual phenomena
3. Synthesizing spatial audio
4. Extracting spatial sounds

2. Binaural perception

[Figure: source off to one side of the head; path-length difference to the two ears, plus head shadow at high frequencies]
What is the information in the 2 ear signals?
- the sound of the source(s) (L+R)
- the position of the source(s) (L-R)
[Figure: example left/right waveforms from the ShATR database, around 2.2-2.235 s]

Main cues to spatial hearing

Interaural time difference (ITD)
- from different path lengths around the head
- dominates at low frequencies (< 1.5 kHz)
- max ~750 us -> ambiguous for freqs > 600 Hz
Interaural intensity difference (IID)
- from head shadowing of the far ear
- negligible at low frequencies; increases with frequency
Spectral detail (from pinna reflections)
- useful for elevation & range
Direct-to-reverberant ratio
- useful for range
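The azimuth dependence of ITD is often approximated with Woodworth's spherical-head formula (a/c)(θ + sin θ) — a standard model, not one the slide itself states. Its maximum lands in the same few-hundred-microsecond range as the ~750 µs quoted above:

```python
import math

def itd_woodworth(azimuth_deg, a=0.0875, c=343.0):
    """Woodworth spherical-head approximation of ITD (seconds):
    (a/c) * (theta + sin(theta)), azimuth from straight ahead,
    a = assumed head radius in metres."""
    th = math.radians(azimuth_deg)
    return (a / c) * (th + math.sin(th))

# ITD grows monotonically from 0 (straight ahead) to its maximum at 90 deg.
itd_max = itd_woodworth(90)
```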

Head-Related Transfer Functions (HRTFs)

Capture the source-to-ear coupling as impulse responses {h_l(t; theta, phi, r), h_r(t; theta, phi, r)}
Collection: http://phosphor.cipic.ucdavis.edu/
[Figure: CIPIC subject HRIR_021, left and right HRIRs at 0 deg elevation across azimuths -45..+45 deg, 0-1.5 ms]
Highly individual!

Cone of confusion

Interaural timing cue dominates (below 1 kHz)
- from differing path lengths to the two ears
But: ITD only resolves direction to a cone of approximately equal ITD
- Up/down? Front/back?

Further cues

Pinna causes elevation-dependent coloration
Monaural perception
- how to separate coloration from the source spectrum?
Head motion
- synchronized spectral changes
- also disambiguates ITD (front/back) etc.

Combining multiple cues

Both ITD and ILD influence azimuth; what happens when they disagree?
- identical signals at both ears -> image is centered
- delaying the right channel (e.g. ~1 ms) moves the image to the left
- attenuating the left channel returns the image to center
- time-intensity trading at around 0.1 ms / dB

Binaural position estimation

Imperfect results (Arruda, Kistler & Wightman 1992):
[Figure: scatter plot of judged vs. target azimuth, -180..+180 deg]
- listening through the wrong HRTFs -> errors
- front/back reversals stay on the cone of confusion

The Precedence Effect

Reflections give misleading spatial cues
[Figure: direct and reflected paths to the listener; the reflection arrives t_R/c later]
But: spatial impression is based on the 1st wavefront, then "switches off" for ~50 ms
- ... even if the reflections are louder
- ... leads to the impression of a room

Binaural Masking Release

Adding noise can reveal a target:
- tone + noise to one ear: the tone is masked
- add identical noise to the other ear: the tone becomes audible
- why does this make sense?
Binaural Masking Level Difference (BMLD) of up to 12 dB
- greatest for noise in phase, tone in anti-phase

Outline

1. Spatial acoustics
2. Binaural perception
3. Synthesizing spatial audio
- Position
- Environment
4. Extracting spatial sounds

3. Synthesizing spatial audio

Goal: recreate a realistic soundfield
- hi-fi experience
- synthetic environments (VR)
Constraints
- resources
- information (individual HRTFs)
- delivery mechanism (headphones)
Source material types
- live recordings (actual soundfields)
- synthetic (studio mixing, virtual environments)

Classic stereo

Intensity panning: no timing modifications, just vary level over ~±20 dB
- works as long as the listener is equidistant from the speakers
Surround sound: extra channels in center, sides, ...
- same basic effect
- pan between pairs
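A common concrete form of the intensity panning described above is the constant-power pan law (my choice of instance; the slide only says "vary level"), which keeps L² + R² constant so perceived loudness doesn't dip mid-pan:

```python
import math

def pan_gains(pos):
    """Constant-power stereo pan. pos in [-1 (hard left), +1 (hard right)].
    Returns (left_gain, right_gain) with left^2 + right^2 = 1."""
    theta = (pos + 1.0) * math.pi / 4.0   # map pos to [0, pi/2]
    return math.cos(theta), math.sin(theta)

# Centre image: equal gains (about -3 dB in each channel).
l, r = pan_gains(0.0)
```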

Simulating reverberation

Can characterize reverb by its impulse response
- spatial cues are important -> record in stereo
- IRs of ~1 sec -> very long convolution
Image model: reflections act as duplicate, virtual (image) sources
[Figure: source, listener, reflected path, and image sources mirrored beyond the walls]
Early echoes in the room impulse response: direct path, then early echoes in h_room(t)
- an actual reflection may be h_ref(t), not delta(t)
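A minimal sketch of the image model, reduced to one dimension: the direct path plus the two first-order mirror-image sources in opposing walls, each tap carrying the 1/r amplitude decay from the earlier slide and an ideal (delta-like) reflection, which the slide notes real walls would replace with h_ref(t):

```python
def early_echoes(src_x, lis_x, room_len, c=343.0):
    """1-D image model: direct path plus the two first-order image
    sources mirrored in walls at x = 0 and x = room_len.
    Returns (delay_s, gain) taps sorted by arrival time."""
    images = [src_x,                 # direct path
              -src_x,                # image in the wall at x = 0
              2 * room_len - src_x]  # image in the wall at x = room_len
    taps = []
    for x in images:
        r = abs(x - lis_x)
        taps.append((r / c, 1.0 / r))   # delay r/c, amplitude ~ 1/r
    return sorted(taps)

# Source 2 m and listener 5 m into an 8 m room:
taps = early_echoes(src_x=2.0, lis_x=5.0, room_len=8.0)
```

The direct tap arrives first and is strongest; higher-order images would extend the tail.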

Artificial reverberation

Reproduce perceptually salient aspects:
- early echo pattern (-> room size impression)
- overall decay tail (-> wall materials ...)
- interaural coherence (-> spaciousness)
Nested allpass filters (Gardner '92):
- allpass section H(z) = (z^-k - g) / (1 - g z^-k)
- impulse response: -g at n=0, then 1-g^2, g(1-g^2), g^2(1-g^2), ... at n = k, 2k, 3k
- nested + cascaded allpasses (e.g. k,g = 50,0.5; 30,0.7; 20,0.3) with a lowpass in a feedback loop give a synthetic reverb
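The allpass section above can be implemented sample-by-sample in direct form II; running an impulse through it reproduces exactly the tap sequence -g, 1-g², g(1-g²), ... listed on the slide:

```python
def allpass(x, g, k):
    """Schroeder allpass H(z) = (z^-k - g) / (1 - g z^-k),
    the building block of Gardner-style nested reverbs.
    Direct form II: v[n] = x[n] + g*v[n-k]; y[n] = -g*v[n] + v[n-k]."""
    buf = [0.0] * k   # delay line holding v[n-1] ... v[n-k]
    y = []
    for xn in x:
        v = xn + g * buf[-1]
        y.append(-g * v + buf[-1])
        buf = [v] + buf[:-1]
    return y

# Impulse response with g=0.5, k=5: -g, zeros, then 1-g^2 at n=k,
# g(1-g^2) at n=2k, and so on.
h = allpass([1.0] + [0.0] * 14, g=0.5, k=5)
```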

Synthetic binaural audio

Source convolved with {L,R} HRTFs gives precise positioning
- ... for headphone presentation
- can combine multiple sources (by adding)
Where to get HRTFs?
- a measured set, but: specific to one individual, at discrete directions
- interpolate by linear crossfade, or a PCA basis set
- or: parametric model - delay, head shadow, pinna echoes (after Brown & Duda '97)
[Figure: block diagram - per ear, a cascade of azimuth-dependent delay z^-t_D(theta), a one-pole shadow filter (1 - b(theta) z^-1)/(1 - a z^-1), a sum of pinna-echo taps p_k(theta,phi) z^-t_Pk(theta,phi), plus a shared room-echo path]
Head motion cues?
- head tracking + fast updates
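The core rendering step — convolving a mono source with the left and right HRIRs — is just two convolutions. A toy sketch with hypothetical 2-tap HRIRs (the delayed, attenuated right ear crudely encoding a source to the listener's left; real HRIRs would come from a measured set such as CIPIC):

```python
def binaural_render(src, hrir_l, hrir_r):
    """Convolve a mono source with left/right head-related impulse
    responses to produce a binaural (headphone) signal pair."""
    def conv(x, h):
        y = [0.0] * (len(x) + len(h) - 1)
        for i, xi in enumerate(x):
            for j, hj in enumerate(h):
                y[i + j] += xi * hj
        return y
    return conv(src, hrir_l), conv(src, hrir_r)

# Hypothetical HRIRs: left ear direct, right ear 1 sample late and quieter.
left, right = binaural_render([1.0, 0.5], [1.0, 0.0], [0.0, 0.6])
```

Multiple sources would be rendered separately and summed per channel, as the slide notes.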

Transaural sound

Binaural signals without headphones?
Can cross-cancel the wrap-around signals
- speakers S_L, S_R; ears E_L, E_R; binaural signals B_L, B_R
- S_L = (1/H_LL)(B_L - H_RL S_R)
- S_R = (1/H_RR)(B_R - H_LR S_L)
Narrow sweet spot
- head motion?
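The coupled pair of equations above is a 2x2 linear system per frequency bin, H [S_L, S_R] = [B_L, B_R] with H = [[H_LL, H_RL], [H_LR, H_RR]], which can be solved in closed form (a per-bin sketch with scalar transfer values):

```python
def transaural(b_l, b_r, h_ll, h_rr, h_lr, h_rl):
    """Solve the cross-cancellation pair for one frequency bin:
        H_LL*S_L + H_RL*S_R = B_L
        H_LR*S_L + H_RR*S_R = B_R
    Returns the speaker signals (S_L, S_R); values may be complex."""
    det = h_ll * h_rr - h_rl * h_lr
    s_l = (b_l * h_rr - b_r * h_rl) / det
    s_r = (b_r * h_ll - b_l * h_lr) / det
    return s_l, s_r

# Symmetric 40% crosstalk; we want only the left ear to receive signal.
s_l, s_r = transaural(b_l=1.0, b_r=0.0, h_ll=1.0, h_rr=1.0, h_lr=0.4, h_rl=0.4)
```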

Soundfield reconstruction

Stop thinking about ears - just reconstruct the pressure field p(x,y,z,t) and its spatial derivatives dp/dx, dp/dy, dp/dz
- ears in the reconstructed field receive the same sounds
Complex reconstruction setup (ambisonics)
- able to preserve head motion cues?

Outline

1. Spatial acoustics
2. Binaural perception
3. Synthesizing spatial audio
4. Extracting spatial sounds
- Microphone arrays
- Modeling binaural processing

4. Extracting spatial sounds

Given access to the soundfield, can we recover the separate components?
- degrees of freedom: >N signals from N sensors is hard
- but: people can do it (somewhat)
Information-theoretic approach
- use only very general constraints
- rely on precision measurements
Anthropic approach
- examine human perception
- attempt to use the same information

Microphone arrays

Signals from multiple microphones can be combined to enhance/cancel certain sources.
Coincident mics with different directional gains:
- m1 = a11 s1 + a12 s2, m2 = a21 s1 + a22 s2, i.e. m = A s
- estimate the sources as s_hat = A^-1 m
Microphone arrays (endfire):
[Figure: endfire array with spacing D; beam pattern narrows as wavelength goes from 4D to 2D to D]
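Inverting the 2x2 coincident-mic mixing m = A s is a one-line application of the 2x2 matrix inverse; a self-contained sketch that mixes two known sources and recovers them:

```python
def unmix2(m1, m2, a):
    """Invert the 2x2 mixing m = A s per sample.
    a = [[a11, a12], [a21, a22]]; m1, m2 are lists of mic samples."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    s1 = [(a22 * x - a12 * y) / det for x, y in zip(m1, m2)]
    s2 = [(a11 * y - a21 * x) / det for x, y in zip(m1, m2)]
    return s1, s2

# Mix two known sources through A, then recover them exactly.
src1, src2 = [1.0, 0.0, -1.0], [0.5, 0.5, 0.5]
A = [[1.0, 0.3], [0.2, 1.0]]
mix1 = [A[0][0] * x + A[0][1] * y for x, y in zip(src1, src2)]
mix2 = [A[1][0] * x + A[1][1] * y for x, y in zip(src1, src2)]
est1, est2 = unmix2(mix1, mix2, A)
```

In practice A is unknown, which is exactly what the beamforming/ICA criteria on the next slide try to estimate.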

Adaptive Beamforming & Independent Component Analysis (ICA)

Formulate mathematical criteria to optimize:
- Beamforming: drive interference to zero (cancel energy during nontarget intervals)
- ICA: maximize the mutual independence of the outputs (from higher-order moments during overlap; adapt the unmixing by the gradient of mutual information w.r.t. its parameters)
Limited by the separation model's parameter space
- only NxN?

Binaural models

Do human listeners do better?
- certainly, given only 2 channels
Extract ITD and IID cues?
[Figure: interaural cross-correlation vs. lag (-6..+6 ms) per frequency channel (100-3200 Hz), with a ridge at the target azimuth]
- cross-correlation finds timing differences
- implemented as coincidences of counter-moving pulses
- how to incorporate IID and time-intensity trading?
- vertical cues ...
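The cross-correlation operation in the figure reduces, per frequency channel, to finding the interaural lag that maximizes the correlation between the ear signals; a brute-force broadband sketch:

```python
def itd_by_xcorr(left, right, max_lag):
    """Estimate ITD in samples by brute-force cross-correlation.
    Positive result: the right channel trails the left by that many samples."""
    n = len(left)
    best_lag, best = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        s = sum(left[i] * right[i + lag]
                for i in range(n) if 0 <= i + lag < n)
        if s > best:
            best, best_lag = s, lag
    return best_lag

# Right channel is the left delayed by 2 samples -> estimated lag = 2.
left  = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
lag = itd_by_xcorr(left, right, max_lag=3)
```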

Nonlinear filtering

How to separate sounds based on direction?
- estimate direction locally
- choose the target direction
- remove energy from other directions
E.g. Kollmeier, Peissig & Hohmann '93:
- windowed FFT analysis of the left and right channels -> L_w, R_w
- smooth |L_w|^2, |R_w|^2 and the cross-spectrum L_w R_w* with a one-pole filter (1-a)/(1-a z^-1) to get S_LL, S_RR, S_LR
- IID from |L_w| / |R_w|; ITD (IPD) from arg{L_w R_w*}
- match to the IID/IPD template for the desired direction -> per-cell gain factor g
- apply the gains and resynthesize with overlap-add FFT
- also reverberation?
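The gain-factor step can be sketched with a hard binary mask on the IID alone (a deliberate simplification: the scheme above uses smoothed spectra, both IID and IPD, and graded rather than binary gains):

```python
import math

def iid_mask(left_mag, right_mag, target_iid_db, tol_db):
    """Per-frequency-bin gain: keep bins whose interaural level
    difference 20*log10(|L|/|R|) lies within tol_db of the target
    direction's expected IID; zero the rest. Hard binary mask."""
    gains = []
    for l, r in zip(left_mag, right_mag):
        iid = 20.0 * math.log10((l + 1e-12) / (r + 1e-12))
        gains.append(1.0 if abs(iid - target_iid_db) <= tol_db else 0.0)
    return gains

# Target dead ahead (IID ~ 0 dB): the equal-level bin survives,
# the strongly lateralized bin (20 dB) is suppressed.
g = iid_mask([1.0, 2.0], [1.0, 0.2], target_iid_db=0.0, tol_db=3.0)
```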

Summary

Spatial sound
- sampling at more than one point gives information on the direction of origin
Binaural perception
- time & intensity cues used between/within ears
Sound rendering
- conventional stereo
- HRTF-based
Spatial analysis
- optimal linear techniques
- elusive auditory models

References

B.C.J. Moore, An Introduction to the Psychology of Hearing (4th ed.), Academic Press, 1997.
J. Blauert, Spatial Hearing (revised ed.), MIT Press, 1996.
R.O. Duda, Sound Localization Research, http://www.engr.sjsu.edu/~duda/duda.research.frameset.html