Lecture 4: Auditory Perception. Why study perception?


EE E682: Speech & Audio Processing & Recognition
Lecture 4: Auditory Perception

1. Motivation: Why & how
2. Auditory physiology
3. Psychophysics: Detection & discrimination
4. Pitch perception
5. Speech perception
6. Auditory organization & scene analysis

Dan Ellis <dpwe@ee.columbia.edu>
http://www.ee.columbia.edu/~dpwe/e682/
Columbia University Dept. of Electrical Engineering, Spring 2006

1 Why study perception?

Perception is messy: can we avoid it? No!
Audition provides the ground truth in audio
- what is relevant and irrelevant
- subjective importance of distortion (coding etc.)
- (there could be other information in sound...)
Some sounds are designed for audition
- co-evolution of speech and hearing
The auditory system is very successful
- we would do extremely well to duplicate it
We are now able to model complex systems
- faster computers, bigger memories

How to study perception?

Three different approaches:
Analyze the example: physiology
- dissection & nerve recordings
Black box input/output: psychophysics
- fit simple models of simple functions
Information processing models
- investigate and model complex functions
- e.g. scene analysis, speech perception

Outline

1. Motivation
2. Physiology
   - Outer, middle & inner ear
   - The auditory nerve and beyond
   - Models
3. Psychophysics
4. Pitch perception
5. Speech perception
6. Scene analysis

2 Physiology

Processing chain from air to brain:
outer ear -> middle ear -> inner ear (cochlea) -> auditory nerve -> midbrain -> cortex
Study via:
- anatomy
- nerve recordings
Signals flow in both directions

Outer & middle ear

[Figure: pinna, ear canal, eardrum (tympanum), middle-ear bones, cochlea (inner ear)]
Pinna "horn"
- complex reflections give spatial (elevation) cues
Ear canal
- acoustic tube
Middle ear
- bones provide impedance matching

Inner ear: Cochlea

[Figure: oval window (from middle-ear bones) launches a travelling wave on the Basilar Membrane (BM); resonant frequency runs from ~16 kHz at the base down to ~50 Hz at the apex over the ~35 mm length]
Mechanical input from the middle ear starts a traveling wave moving down the Basilar Membrane
Varying stiffness and mass of the BM give a continuous variation of resonant frequency
At resonance, traveling-wave energy is dissipated in BM vibration
-> frequency (Fourier) analysis

Cochlea hair cells

Ear converts sound to BM motion; each point on the BM corresponds to a frequency
[Figure: cross-section showing tectorial membrane, basilar membrane, Inner Hair Cell (IHC), Outer Hair Cell (OHC), auditory nerve]
Hair cells on the BM convert motion into nerve impulses (firings)
- Inner Hair Cells detect motion
- Outer Hair Cells? Variable damping?
[BM animation]
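The position-to-frequency mapping sketched above can be made concrete with Greenwood's place-frequency function; this is a minimal sketch, and the constants are the standard human values from Greenwood (1990), not figures taken from these slides:

```python
import numpy as np

def greenwood_cf(x):
    """Characteristic frequency (Hz) at relative BM position x
    (0 = apex, 1 = base), using Greenwood's (1990) human constants."""
    return 165.4 * (10 ** (2.1 * x) - 0.88)

# Sample 10 equally spaced places along the ~35 mm membrane
positions = np.linspace(0.0, 1.0, 10)
for x, f in zip(positions, greenwood_cf(positions)):
    print(f"x = {x:.2f}  ->  CF = {f:7.1f} Hz")
# Runs from ~20 Hz at the apex to ~20 kHz at the base,
# consistent with the low-Hz-to-16-kHz range sketched on the slide.
```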

Inner Hair Cells

IHCs convert BM vibration into nerve firings
The human ear has ~3500 IHCs; each IHC has ~7 connections to the Auditory Nerve
Each nerve fires (sometimes) near peak displacement
[Figure: local BM displacement and a typical nerve signal (mV) over time]
Histogram to get firing probability
[Figure: firing count vs. cycle angle]

Auditory nerve (AN) signals

Single nerve measurements:
[Figure: tone-burst response histogram (spike count vs. time); frequency-threshold tuning curve (dB SPL vs. log frequency, approx. constant-Q); rate vs. intensity curve (spikes/sec vs. dB SPL)]
One fiber: ~25 dB dynamic range
Hearing dynamic range > 100 dB
Hard to measure: probe living ANs?
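As a toy illustration of the "fires near peak displacement" point above (my own sketch, not the slide's data): treat the per-sample firing probability as a half-wave rectified, compressed version of BM displacement, draw spikes, and histogram them against cycle angle to get the period-histogram picture.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, f0, dur = 20000, 500, 2.0          # sample rate, tone frequency, duration (s)
t = np.arange(int(fs * dur)) / fs
bm = np.sin(2 * np.pi * f0 * t)        # toy BM displacement for a 500 Hz tone

# Firing probability per sample: half-wave rectified & compressed displacement
p_fire = 0.05 * np.maximum(bm, 0.0) ** 0.7
spikes = rng.random(len(t)) < p_fire   # Bernoulli spike draw per sample

# Period histogram: spike count vs. phase angle within the stimulus cycle
phase = (2 * np.pi * f0 * t[spikes]) % (2 * np.pi)
counts, edges = np.histogram(phase, bins=16)
for lo, c in zip(edges[:-1], counts):
    print(f"{np.degrees(lo):5.0f} deg  {'#' * int(c // 20)}")
# Spikes pile up around the positive peak of the cycle (phase locking).
```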

AN population response

All the information the brain has about sound:
- average rate & spike timings on ~30,000 fibers
[Figure: population firing probability vs. time (ms), channels spaced in octaves re 100 Hz]
Not unlike a (constant-Q) spectrogram?

Beyond the auditory nerve

Ascending and descending pathways
Tonotopic?
- modulation
- position
- source??
(figure from lloydwatts.com)

Periphery models

[Block diagram: Sound -> outer/middle ear filtering -> cochlea filterbank -> IHC]
Modeled aspects:
- outer/middle ear
- cochlea filtering
- hair cell transduction
- efferent feedback?
Result: neurogram / cochleagram
[Figure: example cochleagram (Slaney/Patterson filterbank, 12 chans/oct) of a short sound clip; a toy code sketch follows the outline below]

Outline

1. Motivation
2. Physiology
3. Psychophysics
   - Detection theory modeling
   - Intensity perception
   - Masking
4. Pitch perception
5. Speech perception
6. Scene analysis
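Here is a minimal cochleagram sketch in the spirit of the periphery-model slide above: a gammatone filterbank (standard Glasberg & Moore ERB bandwidths), half-wave rectification and frame averaging standing in for IHC transduction, and a cube-root compression. Channel count, frequency range and frame size are my own choices, not the Slaney/Patterson settings shown in the figure.

```python
import numpy as np
from scipy.signal import fftconvolve

def erb(f):
    """Glasberg & Moore equivalent rectangular bandwidth (Hz)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, dur=0.05, order=4):
    """Impulse response of a gammatone filter centred on fc."""
    t = np.arange(int(dur * fs)) / fs
    b = 1.019 * erb(fc)                      # standard bandwidth scaling
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def cochleagram(x, fs, n_chan=32, fmin=100.0, fmax=8000.0, frame=0.01):
    """Rectified, smoothed, cube-root-compressed gammatone filterbank output."""
    cfs = np.geomspace(fmin, fmax, n_chan)   # log-spaced centre frequencies
    hop = int(frame * fs)
    rows = []
    for fc in cfs:
        y = fftconvolve(x, gammatone_ir(fc, fs), mode="same")
        y = np.maximum(y, 0.0)               # half-wave rectify (IHC-like)
        env = [y[i:i + hop].mean() for i in range(0, len(y) - hop, hop)]
        rows.append(np.array(env) ** (1.0 / 3.0))   # compressive nonlinearity
    return cfs, np.array(rows)               # (n_chan, n_frames)

# Example: cochleagram of a 1 kHz tone in noise
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.random.randn(fs)
cfs, cg = cochleagram(x, fs)
print(cg.shape, "peak channel CF =", cfs[cg.mean(axis=1).argmax()], "Hz")
```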

3 Psychophysics

Physiology looks at the implementation; psychology looks at the function/behavior
Analyze audition as signal detection p(ω|O):
- psychological tests reflect internal decisions
- assume optimal decision process
- infer nature of internal representations, noise, ...
-> lower bounds on more complex functions
Different aspects to measure
- time, frequency, intensity
- tones, complexes, noise
- binaural
- pitch, detuning

Basic psychophysics

Relate physical and perceptual variables
- e.g. intensity -> loudness, frequency -> pitch
Methodology: subject tests
- just noticeable difference (jnd)
- magnitude scaling, e.g. adjust to "twice as loud"
Results for intensity vs. loudness:
- Weber's law: ΔI ∝ I, i.e. log(L) = k·log(I)
- Textbook figure: L ∝ I^0.3
- Power-law fit to Hartmann (1993) classroom loudness-scaling data: L ∝ I^0.22
[Figure: log(loudness rating) vs. sound level / dB]
With L ∝ I^0.3, log₂(L) = 0.3·log₂(I), so doubling loudness needs
Δ(10·log₁₀ I) = (10/0.3)·log₁₀ 2 ≈ 10 dB.
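A quick numeric check of the power-law reading above, using the slide's textbook exponent of 0.3 and the 0.22 classroom fit:

```python
import numpy as np

def db_per_doubling(exponent):
    """Intensity change (dB) needed to double loudness if L ∝ I**exponent."""
    return 10.0 * np.log10(2.0 ** (1.0 / exponent))

for k in (0.3, 0.22):
    print(f"L ∝ I^{k}:  +{db_per_doubling(k):.1f} dB = twice as loud")
# L ∝ I^0.3  -> +10.0 dB per doubling (the usual "10 dB = twice as loud" rule)
# L ∝ I^0.22 -> +13.7 dB per doubling (the classroom scaling data)
```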

Loudness as a function of frequency

Fletcher-Munson equal-loudness curves:
[Figure: equal-loudness contours, intensity (dB SPL) vs. frequency (Hz), with equivalent loudness referenced to 1 kHz; the contours bunch together at low frequencies (rapid loudness growth)]
Hearing impairment exaggerates this

Loudness as a function of bandwidth

Same total energy, different spectral distribution:
[Figure: narrowband vs. wider-band spectra of equal total energy]
... but the wider band is perceived as louder once bandwidth exceeds the Critical Bandwidth
- e.g. 2 chans at -6 dB (not -10 dB)
Critical bands: independent frequency channels
- ~25 total (4-6 per octave)
[sound example]

Simultaneous masking

A louder tone can mask the perception of a second tone nearby in frequency:
[Figure: masked threshold raised around the masking tone, compared with the absolute threshold; intensity (dB) vs. log frequency]
Suggests an internal noise model:
- decision variable x with internal noise σ_n
- compare p(x | I) with p(x | I + ΔI)

Sequential masking

Backward/forward in time:
[Figure: masked threshold follows the masker envelope; backward masking extends ~5 ms before the masker, forward masking ~100 ms after]
- suggests a temporal envelope of the decision variable
Time-frequency masking "skirt":
[Figure: masked-threshold surface around the masking tone in time and frequency]
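A minimal sketch of the internal-noise picture above (assuming Gaussian noise of standard deviation σ on the decision variable, in arbitrary units): an increment is detectable when d′ = Δ/σ is large enough, and the masked threshold is the increment where percent correct reaches some criterion.

```python
from math import erf, sqrt

def percent_correct_2afc(delta, sigma):
    """Two-interval forced choice: P(correct) for a decision-variable shift
    'delta' with Gaussian internal noise of std 'sigma' (d' = delta/sigma)."""
    d_prime = delta / sigma
    # P(correct) = Phi(d'/sqrt(2)) for 2AFC with equal-variance Gaussian noise
    return 0.5 * (1.0 + erf(d_prime / (sqrt(2) * sqrt(2))))

sigma = 1.0                              # internal noise (arbitrary units)
for delta in (0.25, 0.5, 1.0, 2.0):
    pc = 100 * percent_correct_2afc(delta, sigma)
    print(f"increment {delta:4.2f} -> {pc:5.1f}% correct")
# Small increments stay near 50% (chance); the masked threshold is the
# increment where performance reaches a criterion, e.g. 75% correct.
```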

What we do and don't hear

"A B X" two-interval forced choice: is X = A or B?
Timing: ~2 ms attack resolution, ~2 ms discrimination
- but: spectral splatter
Tuning: ~1% discrimination
- but: beats
Spectrum: profile changes, formants
- variable time-frequency resolution
Harmonic phase?
Noisy signals & texture
(Trace vs. categorical memory)

Outline

1. Motivation
2. Physiology
3. Psychophysics
4. Pitch perception
   - Place models
   - Time models
   - Multiple cues & competition
5. Speech perception
6. Scene analysis

4 Pitch perception: A classic argument in psychophysics

Harmonic complexes are a pattern on the AN
[Figure: AN channel responses to a harmonic complex over time]
.. but give a fused percept (ecological)
What determines the pitch percept?
- not the fundamental component itself (missing-fundamental complexes still have pitch)
How is it computed?
Two competing models: place and time

Place model of pitch

AN excitation pattern shows individual peaks for resolved (low) harmonics; broader HF channels cannot resolve harmonics
Pattern-matching method to find pitch:
- correlate the excitation pattern with a "harmonic sieve" (a toy sieve sketch follows below)
[Figure: excitation vs. frequency channel; pitch strength vs. sieve position]
Support: low harmonics are very important
But: flat-spectrum noise can carry pitch
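A toy harmonic-sieve scorer in the spirit of the place model above (my own scoring rule, not Goldstein's or Terhardt's actual models): reward resolved peaks that sit near a harmonic of a candidate f0, and penalize harmonics of that f0 with no supporting peak.

```python
import numpy as np

def sieve_score(peak_freqs, f0, tol=0.03):
    """Goodness of fit of candidate f0 to a set of resolved spectral peaks:
    reward peaks close to a harmonic of f0, penalize harmonics with no peak."""
    score = 0.0
    for f in peak_freqs:
        n = max(1, round(f / f0))                    # nearest harmonic number
        score += max(0.0, 1.0 - abs(f - n * f0) / (tol * n * f0))
    # penalize "holes": harmonics of f0 below the highest peak with no peak nearby
    n_max = int(max(peak_freqs) // f0)
    missed = sum(1 for n in range(1, n_max + 1)
                 if not any(abs(f - n * f0) <= tol * n * f0 for f in peak_freqs))
    return score - 0.5 * missed

# Resolved peaks of a 200 Hz complex with a missing fundamental
peaks = [400.0, 600.0, 800.0, 1000.0, 1200.0]
candidates = np.arange(80.0, 400.0, 1.0)
best = candidates[int(np.argmax([sieve_score(peaks, f0) for f0 in candidates]))]
print("best-fitting f0 =", best, "Hz")   # 200 Hz despite the missing fundamental
```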

Time model of pitch

Timing information is preserved in the AN down to the ~1 ms scale
Extract periodicity by e.g. autocorrelation & combine across frequency channels:
[Figure: per-channel autocorrelations and their sum; the summary autocorrelation peaks at the common period (pitch) lag]
But: HF gives weak pitch (in practice)

Alternate & competing cues

Pitch perception could rely on various cues
- average excitation pattern
- summary autocorrelation
- more complex pattern matching
Relying on just one cue is brittle
- e.g. missing fundamental
Perceptual system appears to use a flexible, opportunistic combination
Optimal detector justification?
  argmax_ω p(ω|o) = argmax_ω p(o|ω)·p(ω) / p(o)
                  = argmax_ω p(o₁|ω)·p(o₂|ω)·p(ω)
  if o₁ and o₂ are conditionally independent given ω
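A bare-bones summary-autocorrelation sketch along the lines of the time model above. The band edges, filter order and lag range are my own choices; a real model would use a gammatone filterbank and hair-cell stage like the periphery sketch earlier.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 16000
t = np.arange(int(0.1 * fs)) / fs
# Test signal: harmonics 2-10 of 200 Hz (missing fundamental)
x = sum(np.sin(2 * np.pi * k * 200 * t) for k in range(2, 11))

bands = [(100, 400), (400, 1000), (1000, 2500), (2500, 6000)]  # crude "channels"
max_lag = int(fs / 50)                       # search pitches down to 50 Hz
summary = np.zeros(max_lag)

for lo, hi in bands:
    b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    y = np.maximum(lfilter(b, a, x), 0.0)    # half-wave rectify (IHC-like)
    y = y - y.mean()
    ac = np.correlate(y, y, mode="full")[len(y) - 1:len(y) - 1 + max_lag]
    summary += ac / (ac[0] + 1e-12)          # normalize each channel, then sum

lag = np.argmax(summary[int(fs / 400):]) + int(fs / 400)   # ignore very short lags
print("pitch estimate =", fs / lag, "Hz")    # 200 Hz from periodicity alone
```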

Outline

1. Motivation
2. Physiology
3. Psychophysics
4. Pitch perception
5. Speech perception
   - The sounds of speech
   - Phoneme perception
   - Context and top-down influences
6. Scene analysis

5 Speech perception

Highly specialized function
- subsequent to source organization?
- .. but also can interact
Kinds of speech sounds:
[Figure: spectrogram of "has a watch thin as a dime" with glide, nasal, vowel, fricative and stop-burst regions marked]
- vowels
- glides
- nasals
- stops
- fricatives ...

Cues to phoneme perception

Linguists describe speech with phonemes:
[Figure: phonemic transcription aligned under "has a watch thin as a dime"]
- phonemes define minimal word contrasts
Acoustic-phoneticians describe phonemes by:
- formants & transitions
- bursts & onset times
[Figure: schematic of a stop burst, formant transition, vowel formants and voicing onset]

Categorical perception

(Some) speech sounds are perceived categorically rather than analogically
- e.g. stop-burst frequency & timing:
[Figure: perceived stop category (P / T / K) as a function of burst frequency f_b and following vowel (i, e, ε, a, ɔ, o, u)]
- tokens within a category are hard to distinguish
- category boundaries are very sharp
Categories are learned for the native tongue
- merry / mary / marry

Where is the information in speech?

Articulation of high-pass/low-pass filtered speech:
[Figure: articulation (%) vs. cutoff frequency for high-pass and low-pass filtered speech]
- the two curves sum to more than 100% ...
Speech message is highly redundant
- e.g. constraints of language, context
-> listeners can understand with very few cues

Top-down influences: Phonemic restoration (Warren 1970)

What if a noise burst obscures speech?
[Figure: spectrogram with one segment replaced by a noise burst]
- auditory system restores the missing phoneme
... based on semantic context
... even in retrospect!
Subjects are typically unaware of which sounds are restored

A predisposition for speech: Sinewave replicas (Remez et al. 1994)

Replace each formant with a single sinusoid:
[Figure: spectrograms of the original speech and its sinewave replica]
- speech is (somewhat) intelligible
- people hear both whistles and speech ("duplex")
- processed as speech despite being un-speech-like
What does it take to be speech?

Simultaneous vowels

Mix synthetic vowels with different f0s:
/iy/ @ 100 Hz + /ah/ @ 125 Hz
Pitch difference helps (though not necessary):
[Figure: % both vowels correct vs. Δf0 in semitones; performance improves as Δf0 grows from 0 to a few semitones]
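A toy illustration of why a Δf0 helps (my own sketch, not the experiment plotted above): mix two harmonic complexes at 100 Hz and 125 Hz, then recover each one by averaging the mixture over shifts of its own period, a crude period-synchronous comb filter.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                        # 1 second

def harmonic_complex(f0, n_harm=10):
    return sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harm + 1))

mix = harmonic_complex(100.0) + harmonic_complex(125.0)

def comb_average(x, f0, fs, n_periods=20):
    """Average x over n_periods shifts of one period of f0: harmonics of f0
    add coherently, components at other periods tend to cancel."""
    period = int(round(fs / f0))
    shifted = [np.roll(x, -k * period) for k in range(n_periods)]
    return np.mean(shifted, axis=0)

for f0 in (100.0, 125.0):
    est = comb_average(mix, f0, fs)
    target = harmonic_complex(f0)
    corr = np.corrcoef(est[:fs // 2], target[:fs // 2])[0, 1]
    print(f"f0 = {f0:5.1f} Hz: correlation with its own source = {corr:.2f}")
# Each period-synchronous average recovers its own complex well:
# a difference in f0 makes the two "vowels" separable.
```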

Computational models of speech perception

Various theoretical / practical models of speech comprehension, e.g.:
Speech -> Phoneme recognition -> Lexical access -> Words (+ Grammar constraints)
Open questions:
- mechanism of phoneme classification
- mechanism of lexical recall
- mechanism of grammar constraints
ASR is a practical implementation (?)
(a toy sketch of these stages follows the outline below)

Outline

1. Motivation
2. Physiology
3. Psychophysics
4. Pitch perception
5. Speech perception
6. Scene analysis
   - Events and sources
   - Fusion and streaming
   - Continuity & restoration
   - Simultaneous vowels
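The toy sketch of the pipeline boxes referenced above; everything here is invented for illustration (the three-word lexicon, the phoneme posteriors and the bigram "grammar"). Lexical access scores phoneme evidence against pronunciations, and the grammar constraint can override a small acoustic preference.

```python
# Toy sketch: phoneme recognition -> lexical access -> grammar constraints.
import itertools, math

LEXICON = {"as": ["ae", "z"], "is": ["ih", "z"], "a": ["ax"]}
BIGRAM  = {("as", "a"): 0.30, ("is", "a"): 0.05}   # P(second word | first word)

def word_score(word, frames):
    """Acoustic log-score of a word: one posterior frame per phoneme
    (a real recognizer would align frames to phones with an HMM)."""
    phones = LEXICON[word]
    if len(phones) != len(frames):
        return float("-inf")
    return sum(math.log(frame.get(p, 1e-6)) for p, frame in zip(phones, frames))

# Phoneme-recognition output: "is" is slightly favoured acoustically in chunk 1
chunk1 = [{"ae": 0.45, "ih": 0.55}, {"z": 0.95}]
chunk2 = [{"ax": 0.90}]

best, best_score = None, float("-inf")
for w1, w2 in itertools.product(LEXICON, LEXICON):
    score = (word_score(w1, chunk1) + word_score(w2, chunk2)
             + math.log(BIGRAM.get((w1, w2), 1e-6)))        # grammar constraint
    if score > best_score:
        best, best_score = (w1, w2), score

print(best)   # ('as', 'a'): the grammar overrides the small acoustic preference for "is"
```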

6 Auditory organization

The detection model is a huge simplification
Real role of hearing is much more general: recover useful information from the outside world
Sound organization into events and sources:
[Figure: spectrogram of a scene with events labelled "Voice", "Rumble", "Stab"]
Research questions:
- what determines perception of sources?
- how do humans separate mixtures?
- how much can we tell about a source?

Auditory scene analysis: Simultaneous fusion

Harmonics are distinct on the AN, but perceived as one sound ("fused"):
- depends on common onset
- depends on harmonicity (common period)
Methodologies:
- ask subject how many objects
- match attributes, e.g. object pitch
- manipulate higher level, e.g. vowel identity

Sequential grouping: streaming

Pattern / rhythm: a property of a set of objects
- subsequent to fusion: employs fused events?
[Figure: alternating high/low tone sequence; tone repetition time (TRT) ~60-150 ms, frequency separation Δf up to ~2 octaves around 1 kHz]
Measure by relative timing judgments
- cannot compare between streams
Separate coherence and fusion boundaries
Can interact and compete with fusion
[sound example]

Continuity & restoration

Tone is interrupted by a noise burst: what happened?
[Figure: tone + noise burst; possible interpretations: the tone stops and restarts, or it continues through the noise]
- masking makes the tone undetectable during the noise
Need to infer the most probable real-world events
- observation equally likely for either explanation
- prior on a continuous tone is much higher -> choose it
Top-down influence on perceived events
... "pulsation threshold"
[sound example]

Models of auditory organization

Psychological accounts suggest a bottom-up chain:
input mixture -> Front end (signal feature maps: freq, onset, period, freq. modulation) -> Object formation (discrete objects) -> Grouping rules -> Source groups
- (Brown 1991)
Complications in practice:
- formation of separate elements
- contradictory cues
- influence of top-down constraints (context, expectations) ...
(a toy sketch of onset-based grouping follows the summary below)

Summary

Auditory perception provides the ground truth underlying audio processing
Physiology specifies the information available
Psychophysics measures basic sensitivities
Sound sources require further organization
Strong contextual effects in speech perception
[Diagram: Sound -> Transduction -> Multiple representations -> Scene analysis -> High-level recognition]
Parting thought: Is pitch central to communication? Why?
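Finally, the onset-based grouping sketch referred to above, a heavily simplified toy: the "front end" is just a channel-by-frame energy matrix like the cochleagram sketch earlier, the only feature map is onset time, and the only grouping rule is common onset within a tolerance.

```python
import numpy as np

def onset_frames(energy, jump=6.0):
    """Per-channel onset map: first frame where energy (dB) rises by more than
    `jump` dB relative to the previous frame; -1 if a channel never onsets."""
    db = 10 * np.log10(energy + 1e-12)
    rises = np.diff(db, axis=1) > jump
    return np.array([int(np.argmax(r)) + 1 if r.any() else -1 for r in rises])

def group_by_onset(onsets, tol=1):
    """Grouping rule: channels whose onsets fall within `tol` frames of each
    other are assigned to the same source group."""
    groups = {}
    for chan, t in enumerate(onsets):
        if t < 0:
            continue
        key = next((k for k in groups if abs(k - t) <= tol), int(t))
        groups.setdefault(key, []).append(chan)
    return groups

# Toy "cochleagram": channels 0-2 switch on at frame 3, channels 3-4 at frame 10
energy = np.full((5, 20), 1e-4)
energy[0:3, 3:] = 1.0
energy[3:5, 10:] = 1.0

onsets = onset_frames(energy)
print(onsets)                  # per-channel onsets: [ 3  3  3 10 10]
print(group_by_onset(onsets))  # two source groups: channels [0, 1, 2] and [3, 4]
```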