Lecture 8: Spatial sound

Size: px

Start display at page:

Download "Lecture 8: Spatial sound"

Griffin Jefferson
6 years ago
Views:

1 EE E6820: Speech & Audio Processing & Recognition Lecture 8: Spatial sound Spatial acoustics Binaural perception Synthesizing spatial audio Extracting spatial sounds Dan Ellis <dpwe@ee.columbia.edu> E6820 SAPR - Dan Ellis L08 - Spatilal sound

1 Spatial acoustics Received sound = source + channel - so far, only considered ideal source waveform Sound carries information on its spatial origi

2 1 Spatial acoustics Received sound = source + channel - so far, only considered ideal source waveform Sound carries information on its spatial origin - e.g. ripples in the lake - great evolutionary significance The basis of scene analysis? - yes and no - try blocking an ear E6820 SAPR - Dan Ellis L08 - Spatilal sound

3 Ripples in the lake Listener Source Wavefront c m/s) Energy 1/r 2 Effect of relative position on sound - delay = r/c - energy decay ~ 1/r 2 - absorption ~ G(f) r - direct energy plus reflections Give cues for recovering source position Describe wavefront by its normal E6820 SAPR - Dan Ellis L08 - Spatilal sound

4 Recovering spatial information Source direction as wavefront normal - moving plane found from timing at 3 points B pressure t/c = s = AB cosθ θ A C wavefront time - need to solve correspondence elevation φ range r Space: need 3 parameters - e.g. 2 angles and range azimuth θ E6820 SAPR - Dan Ellis L08 - Spatilal sound

$The effect of the environment Reflection causes additional wavefronts reflection diffraction & shadowing$ causal smearing of signal energy Dry speech airvib16 freq / Hz 8000 6000 4000 2000 + reverb from hlwy16

causal smearing of signal energy Dry speech airvib16 freq / Hz 8000 6000 4000 2000 + reverb from hlwy16

5 The effect of the environment Reflection causes additional wavefronts reflection diffraction & shadowing freq / Hz scattering, absorption - many paths many echoes Reverberant effect - causal smearing of signal energy Dry speech airvib16 freq / Hz reverb from hlwy time / sec E6820 SAPR - Dan Ellis L08 - Spatilal sound time / sec

6 Reverberation impulse response Exponential decay of reflections: h room (t) ~e -t /T freq / Hz hlwy16-128pt window t time / s Frequency-dependent - greater absorption at high frequencies faster decay Size-dependent - larger rooms longer delays slower decay Sabine s equation: 0.049V RT 60 = Sα Time constant as size, absorption E6820 SAPR - Dan Ellis L08 - Spatilal sound

7 Outline Spatial acoustics Binaural perception - The sound at the two ears - Available cues - Perceptual phenomena Synthesizing spatial audio Extracting spatial sounds E6820 SAPR - Dan Ellis L08 - Spatilal sound

8 2 Binaural perception R L head shadow (high freq) source path length difference What is the information in the 2 ear signals? - the sound of the source(s) (L+R) - the position of the source(s) (L-R) Example waveforms (ShATR database) shatr78m3 waveform Left Right E6820 SAPR - Dan Ellis L08 - Spatilal sound time / s

9 Main cues to spatial hearing Interaural time difference (ITD) - from different path lengths around head - dominates in low frequency (< 1.5 khz) - max ~ 750 µs ambiguous for freqs > 600 Hz Interaural intensity difference (IID) - from head shadowing of far ear - negligable for LF; increases with frequency Spectral detail (from pinna relfections) useful for elevation & range Direct-to-reverberant useful for range E6820 SAPR - Dan Ellis L08 - Spatilal sound

Head-Related Transfer Fns (HRTFs) Capture source coupling as impulse responses { ()} l θφr,, ()r t, θφr,, t Collection: (http://phosphor.cipic.ucdavis.

10 Head-Related Transfer Fns (HRTFs) Capture source coupling as impulse responses { ()} l θφr,, ()r t, θφr,, t Collection: ( RIGHT HRIR_021 0 el HRIR_021 0 el 1 HRIR_021 0 el 0 az 45 0 Azimuth / deg 0 1 HRIR_021 0 el 0 az LEFT time / ms time / ms Highly individual! E6820 SAPR - Dan Ellis L08 - Spatilal sound

11 Cone of confusion azimuth θ Cone of confusion (approx equal ITD) Interaural timing cue dominates (below 1kHz) - from differing path lengths to two ears But: only resolves to a cone - Up/down? Front/back? E6820 SAPR - Dan Ellis L08 - Spatilal sound

12 Further cues Pinna causes elevation-dependent coloration Monaural perception - separate coloration from source spectrum? Head motion - synchronized spectral changes - also for ITD (front/back) etc. E6820 SAPR - Dan Ellis L08 - Spatilal sound

13 Combining multiple cues Both ITD and ILD influence azimuth; What happens when they disagree? Identical signals to both ears image is centered l(t) r(t) t 1 ms t l(t) Delaying right channel moves image to left r(t) t t Attenuating left channel returns image to center l(t) r(t) t t - around 0.1 ms / db E6820 SAPR - Dan Ellis L08 - Spatilal sound

14 Binaural position estimation Imperfect results: (Arruda, Kistler & Wightman 1992) 180 Judged Azimuth (Deg) Target Azimuth (Deg) - listening to wrong hrtfs errors - front/back reversals stay on cone of confusion E6820 SAPR - Dan Ellis L08 - Spatilal sound

15 The Precedence Effect Reflections give misleading spatial cues R l(t) r(t) direct reflected t R /c t But: Spatial impression based on 1st wavefront then switches off for ~50 ms -.. even if reflections are louder -.. leads to impression of room E6820 SAPR - Dan Ellis L08 - Spatilal sound

16 Binaural Masking Release Adding noise to reveal target Tone + noise to one ear: tone is masked + t t Identical noise to other ear: tone is audible + t t t - why does this make sense? Binaural Masking Level Difference up to 12dB - greatest for noise in phase, tone anti-phase E6820 SAPR - Dan Ellis L08 - Spatilal sound

17 Outline Spatial acoustics Binaural perception Synthesizing spatial audio - Position - Environment Extracting spatial sounds E6820 SAPR - Dan Ellis L08 - Spatilal sound

18 3 Synthesizing spatial audio Goal: recreate realistic soundfield - hi-fi experience - synthetic environments (VR) Constraints - resources - information (individual HRTFs) - delivery mechanism (headphones) Source material types - live recordings (actual soundfields) - synthetic (studio mixing, virtual environments) E6820 SAPR - Dan Ellis L08 - Spatilal sound

19 Classic stereo L R Intensity panning : no timing modifications, just vary level ±20 db - works as long as listener is equidistant Surround sound: extra channels in center, sides,... - same basic effect - pan between pairs E6820 SAPR - Dan Ellis L08 - Spatilal sound

20 Simulating reverberation Can characterize reverb by impulse response - spatial cues are important - record in stereo - IRs of ~ 1 sec very long convolution Image model: reflections as duplicate sources virtual (image) sources reflected path source listener Early echos in room impulse response: direct path early echos h room (t) Actual reflection may be h ref (t), not δ(t) E6820 SAPR - Dan Ellis L08 - Spatilal sound t

21 Artificial reverberation Reproduce perceptually salient aspects - early echo pattern ( room size impression) - overall decay tail ( wall materials...) - interaural coherence ( spaciousness) Nested allpass filters (Gardner 92) Allpass x[n] -g + z -k + y[n] H(z) = h[n] z -k - g 1 - g z -k 1-g 2 g(1-g 2 ) g 2 (1-g 2 ) g,k g -g k 2k 3k n Nested+Cascade Allpass 50,0.5 30,0.7 20,0.3 Synthetic Reverb + AP 0 AP 1 AP 2 g LPF + + a 0 a 1 a 2 E6820 SAPR - Dan Ellis L08 - Spatilal sound

22 Synthetic binaural audio Source convolved with {L,R} HRTFs gives precise positioning -...for headphone presentation - can combine multiple sources (by adding) Where to get HRTFs? - measured set, but: specific to individual, discrete - interpolate by linear crossfade, PCA basis set - or: parametric model - delay, shadow, pinna Delay Shadow Pinna z -t DL (θ) 1 - b L (θ)z az t Σ p kl (θ,φ) z -t PkL(θ,φ) + Source Room echo K E z -t E z -t DR (θ) 1 - b R (θ)z az t Σ p kr (θ,φ) z -t PkR(θ,φ) Head motion cues? - head tracking + fast updates + (after Brown & Duda '97) E6820 SAPR - Dan Ellis L08 - Spatilal sound

23 Transaural sound Binaural signals without headphones? Can cross-cancel wrap-around signals - speakers S L,R, ears E L,R, binaural signals B L,R. 1 S L = H LL ( B L H RL S R ) B L M B R 1 S R = H RR ( B R H LR S L ) S L S R H LR H RL H LL H RR E L E R Narrow sweet spot - head motion? E6820 SAPR - Dan Ellis L08 - Spatilal sound

24 Soundfield reconstruction Stop thinking about ears just reconstruct pressure + spatial derivatives p(t) / z p(t) / y p(t) / x p(x,y,z,t) - ears in reconstructed field receive same sounds Complex reconstruction setup (ambisonics) - able to preserve head motion cues? E6820 SAPR - Dan Ellis L08 - Spatilal sound

25 Outline Spatial acoustics Binaural perception Synthesizing spatial audio Extracting spatial sounds - Microphone arrays - Modeling binaural processing E6820 SAPR - Dan Ellis L08 - Spatilal sound

26 4 Extracting spatial sounds Given access to soundfield, can we recover separate components? - degrees of freedom: >N signals from N sensors is hard - but: people can do it (somewhat) Information-theoretic approach - use only very general constraints - rely on precision measurements Anthropic approach - examine human perception - attempt to use same information E6820 SAPR - Dan Ellis L08 - Spatilal sound

27 Microphone arrays Signals from multiple microphones can be combined to enhance/cancel certain sources Coincident mics with diff. directional gains m 1 s 1 a a a 22 a 21 m 2 s 2 m 1 a 11 a 12 s = 1 m 2 a 21 a 22 s 2 s 1ˆ = A 1 s 2ˆ m Microphone arrays (endfire) 0 λ = 4D -20 λ = 2D -40 λ = D + D + D + D E6820 SAPR - Dan Ellis L08 - Spatilal sound

28 Adaptive Beamforming & Independent Component Analysis (ICA) Formulate mathematical criteria to optimize Beamforming: Drive interference to zero - cancel energy during nontarget intervals ICA: maximize mutual independence of outputs - from higher-order moments during overlap m 1 a 11 a s x 12 1 m 2 a 21 a 22 s 2 δ MutInfo δa Limited by separation model parameter space - only NxN? E6820 SAPR - Dan Ellis L08 - Spatilal sound

29 Binaural models Human listeners do better? - certainly given only 2 channels Extract ITD and IID cues? Target azimuth Center freq / Hz Interaural cross-correlation lag / ms - cross-correlation finds timing differences - consume counter-moving pulses - how to achieve IID, trading - vertical cues... E6820 SAPR - Dan Ellis L08 - Spatilal sound

30 Nonlinear filtering How to separate sounds based on direction? - estimate direction locally - choose target direction - remove energy from other directions E.g. Kollmeier, Peissig & Hohman 93 frequency FFT Modulus l analysis L w L w 2 Smooth (1-a) (1-az -1 ) S LL OLA- FFT synthesis l' Crosscorrelation L w R * w Smooth (1-a) (1-az -1 ) S LR Gain factor calc g time FFT Modulus r analysis R w R w 2 Smooth (1-a) (1-az -1 ) S RR OLA- FFT synthesis r' X w (mh,2 πk/nt) - IID from L w / R w ; ITD (IPD) from arg{l w R * w } - match to IID/IPD template for desired direction - also reverberation? E6820 SAPR - Dan Ellis L08 - Spatilal sound

31 Summary Spatial sound - sampling at more than one point gives information on origin direction Binaural perception - time & intensity cues used between/within ears Sound rendering - conventional stereo - HRTF-based Spatial analysis - optimal linear techniques - elusive auditory models E6820 SAPR - Dan Ellis L08 - Spatilal sound

32 References B.C.J. Moore, An introduction to the psychology of hearing (4th ed.) Academic, J. Blauert, Spatial Hearing (revised ed.), MIT Press, R.O. Duda, Sound Localization Research, E6820 SAPR - Dan Ellis L08 - Spatilal sound

Binaural Hearing. Why two ears? Definitions

Binaural Hearing. Why two ears? Definitions Binaural Hearing Why two ears? Locating sounds in space: acuity is poorer than in vision by up to two orders of magnitude, but extends in all directions. Role in alerting and orienting? Separating sound