ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 3: Perception

1. Ear Physiology
2. Auditory Psychophysics
3. Pitch Perception
4. Music Perception

Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
http://www.ee.columbia.edu/~dpwe/e4896/

E4896 Music Signal Processing (Dan Ellis) 2013-02-04

1. Ear Physiology

- The ear is a very sensitive transducer of air pressure variations into nerve firings
  - sensitivity is just above the level of Brownian motion!?
- The cochlea is largely understood
- ... the brain is more difficult
[Figure: auditory pathway, labeled outer ear, middle ear, inner ear (cochlea), auditory nerve, midbrain, cortex]

The Ear

- Impedance matching & transduction
  - pinna acts as a horn
  - eardrum + middle ear bones match impedance
  - cochlea transduces to nerve firings
[Figure: ear anatomy, labeled pinna, ear canal, eardrum (tympanum), middle ear bones, cochlea (inner ear)]

The Cochlea

- Complex resonant structure
- Active feedback to maintain a near-ringing state
  - efferent fibers?
[Figure: travelling wave on the basilar membrane (BM), driven at the oval window (from ME bones); resonant frequency against position along the 35 mm cochlea. Source: http://www.wadalab.mech.tohoku.ac.jp/fem_bm-e.html]

Hair Cells

- Transduce mechanical motion to nerve firings
- 3, IHCs driving 2, nerves
- Easily damaged
[Figure: cross-section of the cochlea, labeled tectorial membrane, inner hair cell (IHC), outer hair cell (OHC), basilar membrane, auditory nerve]

Auditory Nerve

- IHC fires near maximum displacement
  - cannot fire every cycle
  - some noise
[Figure: local BM displacement and typical nerve signal (mV) against time / ms; histogram of firing count against cycle angle]

Nerve Responses

- Onset enhancement
- Frequency selectivity
- Dynamic range
  - one fiber: ~25 dB dynamic range
  - hearing dynamic range > 100 dB
[Figure: spike count against time for a tone burst; spikes/sec against (log) frequency; spikes/sec against intensity / dB SPL]

Auditory Nerve Ensemble

- The ensemble of nerves provides full information (Secker-Walker & Searle 90)
  - similar to a constant-Q log-intensity spectrogram
[Figure: "PatSla rectsmoo on bbctmp2"; frequency in octaves against time / ms]

Auditory Models

- Filterbank + nonlinearity
  - varying (but broad) bandwidth
[Block diagram: sound -> outer/middle ear filtering -> cochlea filterbank -> one IHC stage per channel]
[Figure: "SlaneyPatterson 12 chans/oct from 18 Hz, BBC1tmp"; channel against time / s]

2. Auditory Psychophysics

- Extensive study of the relationship between physical (Φ) and psychological (Ψ) values
  - perception is not direct!
- Common across all perceptual modalities
  - proprioceptive (force, body positioning)
  - vision
  - hearing
- The Φ-Ψ distinction is important!

Loudness perception

- Perception of a physical parameter
  - just noticeable difference (jnd)
  - magnitude scaling
- Weber's law: $\Delta I \propto I$
- Loudness: $L \propto I^{0.3}$, so $\log_{10}(L) = 0.3 \log_{10}(I) + \text{const}$, or equivalently $\mathrm{dB}(I) = 33.3 \log_{10}(L) + \text{const}$
- Hartmann (1993) classroom loudness scaling data (tracks 19+2)
  - textbook figure: $L \propto I^{0.3}$
  - power law fit: $L \propto I^{0.22}$
[Figure: log(loudness rating) against sound level]

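The power-law relation between loudness and intensity can be checked numerically. The following is a minimal sketch, assuming the slide's exponents (0.3 for the textbook value, 0.22 for the classroom fit); the function names are illustrative and not part of the course material.

```python
import math


def intensity_to_db(intensity, ref=1.0):
    """Level in dB of `intensity` relative to `ref` (same units)."""
    return 10.0 * math.log10(intensity / ref)


def loudness_ratio(db_change, exponent=0.3):
    """Loudness ratio predicted for a level change of `db_change` dB,
    assuming the power law L proportional to I**exponent."""
    intensity_ratio = 10.0 ** (db_change / 10.0)
    return intensity_ratio ** exponent


def db_for_loudness_ratio(ratio, exponent=0.3):
    """Level change in dB needed to scale loudness by `ratio`."""
    return 10.0 * math.log10(ratio) / exponent


if __name__ == "__main__":
    # Compare the textbook exponent with the classroom power-law fit.
    for exponent in (0.3, 0.22):
        growth = loudness_ratio(10.0, exponent)
        step = db_for_loudness_ratio(2.0, exponent)
        print(f"exponent {exponent}: +10 dB -> x{growth:.2f}, "
              f"doubling needs {step:.1f} dB")
```

With the exponent 0.3, a 10 dB increase comes out as very nearly a doubling of loudness, and a tenfold loudness increase needs about 33.3 dB. The smaller fitted exponent implies slower loudness growth.
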
Equal Loudness

- Fletcher-Munson curves (1937)
  - match intensity to a specific 1 kHz tone
  - loudness growth
[Figure: equal-loudness contours; intensity / dB SPL against freq / Hz]

Masking

- Limited dynamic range in cochlea
  - effect within frequency critical bands
  - basis of MPEG Audio
- Forward/backward temporal effects
- tracks 23-25
[Figure: intensity / dB against log freq, showing absolute threshold, masking tone, and masked threshold]
[Figure: level / dB against time / ms and freq / Bark, showing the masking tone and the elevated masking threshold skirt]

Limits of Hearing

- Test what listeners can discriminate
  - A B X in time, "two-interval forced-choice": X = A or B?
- Roughly...
  - timing: 2 ms difference, 2 ms ordering
  - tuning: ~1%
  - spectral profile: single components ~ 2 dB
  - phase? tones vs. noise...

3. Pitch Perception

- Complex (non-sinusoidal) tones give a single, fused percept
  - despite harmonics being resolved by the cochlea
  - percept is of a single pitch...
- ... but pitch does NOT rely on the fundamental
- track 37
[Figure: freq. chan. against time / s]

Place models of pitch

- Hypothesis: pitch results from the activation pattern
  - low channels: resolved harmonics
  - broader HF channels cannot resolve harmonics
- Correlate with a "harmonic sieve" (Duifhuis et al. 82)
- support: low harmonics are important
- but: pitch of noisy signals
[Figure: AN excitation against frequency channel; pitch strength against frequency channel]

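The sieve idea can be illustrated with a toy scorer. This is only a sketch in the spirit of a harmonic sieve, not the procedure of Duifhuis et al.: the tolerance, the candidate range, the harmonic limit, and the tie-break toward the higher candidate are all assumptions made for the example.

```python
def sieve_score(components, f0, tol=0.03, max_harmonic=10):
    """Score how well resolved component frequencies (Hz) fit the
    harmonic series of candidate `f0`. Each component within the
    relative tolerance `tol` of a harmonic adds up to 1.0."""
    score = 0.0
    for f in components:
        h = max(1, round(f / f0))      # nearest harmonic number
        if h > max_harmonic:
            continue                   # beyond the sieve's range
        dev = abs(f / f0 - h) / h      # relative mistuning
        if dev <= tol:
            score += 1.0 - dev / tol   # exact match scores 1.0
    return score


def estimate_pitch(components, candidates=range(50, 501)):
    """Return the candidate f0 (Hz) with the best sieve score.
    Ties go to the higher f0, which rejects sub-octave candidates
    that pass the same components through a denser sieve."""
    return max(
        candidates,
        key=lambda f0: (round(sieve_score(components, f0), 6), f0),
    )


if __name__ == "__main__":
    # Harmonics 2-4 of 200 Hz: there is no energy at 200 Hz itself.
    print(estimate_pitch([400.0, 600.0, 800.0]))
```

The missing-fundamental input shows the point made on the pitch slides: the sieve settles on 200 Hz from the pattern of resolved harmonics alone.
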
Time models of pitch

- Autocorrelation neatly unifies pitch phenomena (Meddis & Hewitt 91)
  - per-channel autocorrelation
  - summary autocorrelation: peak at the common period (pitch)
- but: high-frequency modulation evokes weak pitch
[Figure: per-channel autocorrelation (freq against lag) and summary autocorrelation against lag / ms]

Competing Cues

- Perhaps brains use both place & time cues
  - common perceptual strategy: opportunistic combination of information
- e.g. probabilistic combination, with $x = (x_1, x_2)$:
  $\arg\max_\theta \Pr(\theta \mid x) = \arg\max_\theta \dfrac{\Pr(\theta \mid x_1)\,\Pr(\theta \mid x_2)}{\Pr(\theta)}$
  if $x_1$, $x_2$ are independent given $\theta$

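The combination rule can be sketched over a small discrete set of pitch candidates. By Bayes' rule, conditional independence gives a posterior proportional to Pr(x1 | θ) Pr(x2 | θ) Pr(θ), which has the same maximizer. The candidate pitches and likelihood values are invented for illustration and do not come from any experiment.

```python
def combine_cues(likelihoods, prior):
    """Posterior over candidates theta, assuming the cues are
    independent given theta:
        Pr(theta | x1, x2) ∝ Pr(x1 | theta) Pr(x2 | theta) Pr(theta)
    `likelihoods` is a list of dicts {theta: Pr(x_i | theta)}."""
    unnormalized = {}
    for theta, p in prior.items():
        for lik in likelihoods:
            p *= lik[theta]
        unnormalized[theta] = p
    total = sum(unnormalized.values())
    return {theta: p / total for theta, p in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical likelihoods over candidate pitches in Hz.
    place_cue = {100: 0.15, 200: 0.40, 400: 0.45}  # leans to the octave above
    time_cue = {100: 0.45, 200: 0.40, 400: 0.15}   # leans to the octave below
    prior = {100: 1 / 3, 200: 1 / 3, 400: 1 / 3}   # flat prior

    posterior = combine_cues([place_cue, time_cue], prior)
    best = max(posterior, key=posterior.get)
    print(best, round(posterior[best], 3))
```

In this made-up case each cue alone picks a different octave error, while the product favors the candidate that both cues find plausible.
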
4. Music Perception

- Hearing music involves
  - instruments
  - notes
  - rhythm
- Can study with subjective experiments (Grey 75)

Scene Analysis

- Detect separate events
  - common onset
  - common harmonicity
- Instruments & timbre
[Figure (Pierce 83): freq / Hz against time / s]

Musical intervals relate to harmonic proximity

- Consonance
- Pitch helix (Warren et al. 2003)

Rhythm

- Sensitive to periodicity
  - speech? breathing? brain?
- Onsets + autocorrelation?
  - variations in tapping
  - 4/4 vs 3/4

Sequences

- Perceptual effects of sequences
  - e.g. streaming (track 36)
- Music is built of sequences
  - different sensitivities
[Figure: alternating tone sequence, frequency against time; labels: 1 kHz, Δf: 2 octaves, TRT: 6-15 ms]

Summary

- Ear converts air pressure to nerve firings
  - spectral analysis
- Brain does a lot with scarce information
  - dealing with the real world
- Music is a complex signal
  - multiple sources
  - harmonic structures
  - temporal patterns

References

H. Duifhuis, L. F. Willems, & R. J. Sluyter, "Measurement of pitch in speech: An implementation of Goldstein's theory of pitch perception," J. Acoust. Soc. Am., 71(6):1568-1580, 1982.
H. Fletcher & W. A. Munson, "Relation between loudness and masking," J. Acoust. Soc. Am., 9:1-10, 1937.
B. Gold, N. Morgan, & D. Ellis, Speech and Audio Signal Processing (2nd ed.), Wiley, 2011.
J. M. Grey, An Exploration of Musical Timbre, Ph.D. Dissertation, Stanford CCRMA, No. STAN-M-2, 1975.
W. M. Hartmann, "Auditory demonstrations on compact disk for large N," J. Acoust. Soc. Am., 93(1):1-16, 1993.
R. Meddis & M. J. Hewitt, "Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification," J. Acoust. Soc. Am., 89(6):2866-2882, 1991.
B. C. J. Moore, An Introduction to the Psychology of Hearing (4th ed.), Academic Press, 1997.
J. R. Pierce, The Science of Musical Sound, Scientific American Press, 1983.
H. E. Secker-Walker & C. L. Searle, "Time-domain analysis of auditory-nerve-fiber firing rates," J. Acoust. Soc. Am., 88(3):1427-1436, 1990.
J. D. Warren, S. Uppenkamp, R. D. Patterson, & T. D. Griffiths, "Separating pitch chroma and pitch height in the human brain," PNAS 100(17):10038-42, 2003.