Digital Speech and Audio Processing, Spring 2008


Ear Anatomy
1. Outer ear: funnels and amplifies sounds.
2. Middle ear: mechanical transformer for acoustic impedance matching.
3. Inner ear: transduces acoustic/mechanical vibration into nerve impulses.

Ear Anatomy - Outer ear
1. Pinna (external ear): channels sounds into the ear and aids in localization.
2. Ear canal: ~2.5 cm long; resonates around 3000 Hz.
3. Eardrum (tympanic membrane).
Outer-ear functions:
1. Sound amplification: resonance amplifies sounds in the range of 3-5 kHz, with an increase of up to 15 dB.
2. Localization of sounds: left/right from intensity and timing differences between the ears; front/behind/below/above from different resonance filtering due to interference of waves. We are more sensitive to sounds coming from the front than from the back.

Ear Anatomy - Middle ear
Eardrum: a stiff membrane about 0.1 mm thick that flexes at its edges and vibrates under sound waves.
The ossicles (malleus/hammer, incus/anvil, stapes/stirrup) act as an acoustic impedance-matching transformer; otherwise most of the energy would be reflected back.
Muscles protect the inner ear from violent vibrations.
Eustachian tube: connects the middle ear to the vocal tract (throat) and removes static pressure differences between the middle and outer ear.
(Denes & Pinson, 1993)

Outer/middle ear amplification/filtering
Allows better reception of sounds in front of the listener than behind (~15 dB).
Together, the outer and middle ear give a peak increase of about 20 dB near 1 kHz.
The middle ear also acts as a lowpass filter, with an attenuation of about 15 dB/octave above 1 kHz.
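
As a rough illustration of the numbers quoted above (a peak boost of about 20 dB near 1 kHz and roughly 15 dB/octave attenuation above it), the sketch below builds a crude piecewise gain curve in Python. The shape of the rise and the 62.5 Hz lower corner are illustrative assumptions, not measured outer/middle-ear data.

```python
import math

def outer_middle_ear_gain_db(f_hz: float) -> float:
    """Crude piecewise sketch of the combined outer/middle-ear gain:
    rises to ~ +20 dB at 1 kHz, then falls at ~15 dB/octave above 1 kHz.
    Illustrative only; the 62.5 Hz lower corner is an arbitrary assumption."""
    if f_hz <= 1000.0:
        f = max(f_hz, 62.5)
        # interpolate from 0 dB at 62.5 Hz up to +20 dB at 1 kHz (log-frequency scale)
        return 20.0 * math.log2(f / 62.5) / math.log2(1000.0 / 62.5)
    # above 1 kHz: lowpass behaviour of the middle ear, 15 dB per octave
    return 20.0 - 15.0 * math.log2(f_hz / 1000.0)

if __name__ == "__main__":
    for f in (125, 250, 500, 1000, 2000, 4000, 8000):
        print(f"{f:5d} Hz -> {outer_middle_ear_gain_db(f):+6.1f} dB")
```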

Loud sounds
1. The stapedius muscle draws the stapes away from the oval window.
2. The tensor tympani muscle tightens, pulling the eardrum into the middle ear.
Both increase the stiffness of the ossicular chain, reducing the force transmitted to the inner ear.
This mainly reduces transmission of low frequencies (< 2 kHz), with a loss of up to 20 dB.
1. Not effective against high frequencies.
2. Less effective in older ears.
3. ~10 ms time delay, so it offers no protection against impulsive sounds such as a gunshot.

Ear Anatomy - Inner ear
The inner ear can be divided into three parts: the semicircular canals, the vestibule, and the cochlea.
The semicircular canals and the vestibule serve the sense of balance and are not concerned with hearing.
The cochlea, and what goes on inside it, provides the key to understanding many aspects of auditory perception.

Inner Ear: the Cochlea
The cochlea is fluid-filled and spiral in shape (like a snail shell).
It contains the most critical component of hearing, the organ of Corti, which sits on the hair-lined basilar membrane.
Whenever the basilar membrane vibrates, the mechanical signal is converted into neural firing: small sensory hair cells inside the organ of Corti are bent, which triggers the transmission of electrical impulses to the brain.

Hair cells
OHC (outer hair cells): tips embedded in the tectorial membrane; they move with the basilar membrane.
IHC (inner hair cells): tips just touch the tectorial membrane; they respond to the velocity of the BM.
About 30,000 hair cells are arranged in rows along the length of the cochlea. Each group of roughly 40-140 terminates on an auditory nerve fiber.
[Figure: cochlear cross-section showing the scala vestibuli and scala media]

Inner Ear: the Cochlea
Base: narrowest, stiffest, responds to high frequencies.
Apex: widest, least stiff, responds to low frequencies.

Inner Ear: the Cochlea
The response of the BM to a sinusoidal input is a traveling wave, moving from the base to the apex.
There is no reflection at the apex (no standing-wave phenomena).
The wave decreases in speed as it travels.
At any point on the BM, the motion is periodic, with a period equal to that of the sound excitation.
The traveling wave reaches its maximum amplitude at the point on the BM whose characteristic (resonant) frequency matches the input sound frequency.
Thus each location vibrates sinusoidally in time, with a given phase delay.

Inner Ear: the Cochlea
Different frequencies excite different portions of the BM.
Nerve fibers attached to the BM have tuning curves, reflecting each nerve's characteristic frequency (CF).
The CF is the frequency at which that BM location vibrates maximally for a given input sound, and thus at which the nerve fires the most.

Basilar membrane
The basilar membrane can thus be thought of as a bank of bandpass filters.
Each location has its characteristic frequency and a roughly constant Q (center frequency / bandwidth).
Frequency resolution is therefore best at low frequencies.
[Figure: BM responses for tones at 25, 50, 100, 200, 400, 800, and 1600 Hz]
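
To make the "bank of constant-Q bandpass filters" picture concrete, here is a small Python sketch that lists center frequencies and bandwidths for such a bank. The Q value, frequency range, and spacing are illustrative assumptions, not values from the slides; the point is simply that with constant Q the absolute bandwidth grows with center frequency, so frequency resolution is best at low frequencies.

```python
import math

def constant_q_bank(f_low=100.0, f_high=8000.0, filters_per_octave=3, q=4.0):
    """Toy constant-Q filter bank: log-spaced center frequencies, bandwidth = fc / Q.
    All parameter values here are illustrative assumptions."""
    n_filters = int(math.log2(f_high / f_low) * filters_per_octave) + 1
    bank = []
    for k in range(n_filters):
        fc = f_low * 2.0 ** (k / filters_per_octave)
        bank.append((fc, fc / q))  # (center frequency, bandwidth) in Hz
    return bank

if __name__ == "__main__":
    for fc, bw in constant_q_bank():
        print(f"fc = {fc:7.1f} Hz   bandwidth = {bw:6.1f} Hz")
```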

Basilar membrane
The tectorial membrane runs parallel to the basilar membrane, so when the basilar membrane vibrates up and down in response to motion at the stapes, so does the tectorial membrane.
The hair cell transforms the shearing (mechanical) forces into an electrical (neural) response, through a series of electro-chemical reactions.


Inner Ear: Basilar membrane
Nerve fibers attached to the BM have tuning curves, which plot, as a function of tone frequency, the sound intensity necessary to raise the fiber's firing rate above its low spontaneous rate.
The V-shape indicates a characteristic frequency at which a tone of minimal amplitude will raise the probability that this neuron fires.
The tuning curves resemble inverted forms of the bandpass filters, but they are sharper, due to a non-linear amplification effect. [From Evans]

Firing Rates
For a typical fiber, the plot of firing spikes per second vs. frequency looks like an inverted tuning curve, up to moderate sound amplitudes.
At high intensity the curves become trapezoidal, and maximum firing occurs at frequencies above and below the fiber's CF (recruitment effect: adjacent fibers start to fire as well; O'Shaughnessy, p. 117).
Eventually, the firing rate saturates (no longer increases) when the sound intensity reaches the upper limit of the neuron.
The range of intensities over which the firing rate varies with intensity is 20-40 dB; a small portion of fibers has a range of up to 60 dB.

Neuron Firing
Timing of neural firings: in general, firings tend to be synchronized with the displacement of the BM.
When the BM vibrates sinusoidally with sufficient magnitude, a nerve fiber tends to fire on synchronous half-cycles of the movement at the point where the fiber is attached.
Neurons have a latency period of 1-3 ms on average, during which, once fired, a neuron cannot fire again, no matter how intense the stimulus.
» At low frequencies, a neuron can fire on every half-cycle of the sinusoidal vibration produced by a tonal input.
» At higher frequencies, the latency period becomes longer than the period of the wave, although some adjacent neurons can still fire synchronously.
» Phase locking disappears above about 4-5 kHz.

Neuron Firing
Timing of neural firings: firing rates can reach 1000/s for short periods, and then decay roughly exponentially with time; most of the decay occurs in the first 15-20 ms after onset.
When the sound is removed, the firing rate drops to near zero, and then exponentially recovers to the so-called spontaneous rate, which is characteristic of that particular neuron.
» Thus, our brain might interpret sounds based on changes in firing rates as well as on a steady-state rate.
» There are possibly two classes of neurons: those with a steady-state response and those responding to change.

Sound Perception
The ear's frequency range is roughly 20 Hz to 20 kHz (with some individual variation).
Sounds below 1 kHz and above 5 kHz require more energy to be heard.
Threshold of hearing (auditory threshold, or threshold in quiet): the minimum intensity at which sounds are perceived.
The auditory threshold across the speech frequencies (700-7000 Hz) is constant to within about +/- 3 dB. [From Winckel, Manual of Phonetics]

Threshold of Hearing (dB)
The threshold of hearing is defined as the level at which a pure tone is just detected/heard by a listener in a noiseless environment.
It can be approximated by the following function (typically used in audio coding):
T_q(f) = 3.64 (f/1000)^(-0.8) - 6.5 exp(-0.6 (f/1000 - 3.3)^2) + 10^(-3) (f/1000)^4   (dB)
[Figure: threshold-in-quiet curve, dB vs. frequency from 10 Hz to 10 kHz]
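
The approximation above is easy to evaluate directly; a minimal Python sketch (the constants come from the formula on this slide):

```python
import math

def threshold_in_quiet_db(f_hz: float) -> float:
    """Threshold in quiet (dB SPL) for a pure tone at f_hz,
    using the approximation given above."""
    k = f_hz / 1000.0
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

if __name__ == "__main__":
    # The lowest thresholds fall around 3-4 kHz; thresholds rise steeply at both ends.
    for f in (50, 100, 500, 1000, 3300, 10000, 15000):
        print(f"{f:6d} Hz : {threshold_in_quiet_db(f):8.2f} dB SPL")
```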

Sound Perception
The threshold is elevated by about 10 dB in the F1 formant region (<= 500 Hz) compared to the F2/F3 region.
For vowels, all harmonics are audible up through F4, but not at equal levels.
As speech amplitude is reduced gradually, F0 and the first few harmonics are lost perceptually, though speech is still understood (they are not crucial to intelligibility).
Speech energy at frequencies above 7 kHz is present only for fricatives and has very little effect on intelligibility.
The hearing threshold above is typical for steady tones:
- for glides, the threshold is about 5 dB higher;
- for wideband noise and short durations (< 0.3 s), the threshold is typically about 3 dB higher.

Sound Perception - Definitions
Amplitude - Sound Pressure Level (SPL):
L_SPL = 20 log10(p / p0) dB, where p is the sound pressure of the stimulus in pascals and p0 = 20 μPa.
Human hearing has a dynamic range of approximately 110 dB.
Acoustic intensity: the average flow of energy through a unit area, in W/m^2. Audible range: roughly 10^-12 to 10 W/m^2.
Intensity level: IL = 10 log10(I / I0) dB, with I0 = 10^-12 W/m^2.
For a traveling wave, SPL and IL are equivalent, because the minimum threshold I0 corresponds to an average pressure variation of p0.
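
A minimal pair of helpers implementing these two definitions (reference values p0 = 20 μPa and I0 = 10^-12 W/m^2 as above); the usage lines check a couple of entries from the example table that follows:

```python
import math

P0 = 20e-6   # reference sound pressure, 20 micropascals
I0 = 1e-12   # reference intensity, 10^-12 W/m^2

def spl_db(p_pascals: float) -> float:
    """Sound pressure level: 20 log10(p / p0) dB."""
    return 20.0 * math.log10(p_pascals / P0)

def intensity_level_db(i_w_per_m2: float) -> float:
    """Intensity level: 10 log10(I / I0) dB."""
    return 10.0 * math.log10(i_w_per_m2 / I0)

if __name__ == "__main__":
    print(round(intensity_level_db(1e-12)))   # 0 dB  (threshold of hearing)
    print(round(intensity_level_db(1e-6)))    # 60 dB (normal conversation)
    print(round(intensity_level_db(6.3e-3)))  # 98 dB (large orchestra)
    print(round(spl_db(20e-6)))               # 0 dB SPL at the reference pressure
```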

Loudness Examples
Source                              Intensity (W/m^2)   Level
Threshold of hearing (TOH)          1x10^-12              0 dB
Rustling leaves                     1x10^-10             20 dB
Whisper                             1x10^-9              30 dB
Normal conversation (1 foot)        1x10^-6              60 dB
Busy street traffic                 1x10^-5              70 dB
Vacuum cleaner                      1x10^-4              80 dB
Large orchestra                     6.3x10^-3            98 dB
Walkman at maximum level            1x10^-2             100 dB
Front rows of rock concert          1x10^-1             110 dB
Threshold of pain                   1x10^0              120 dB
Military jet takeoff                1x10^2              140 dB
Instant perforation of eardrum      1x10^4              160 dB


How does human hearing fare?
Frequency range (Hz):
Human    20 - 23,000
Dog      67 - 45,000
Cat      45 - 64,000
Mouse    1,000 - 91,000
Bat      2,000 - 110,000
Whale    1,000 - 123,000
[Figure: effect of old age on the human hearing range]

Loudness and Intensity
All else being equal, the higher the intensity, the greater the loudness.
[Figure: higher intensity, higher loudness; lower intensity, lower loudness]

Loudness
One sound may be perceived as being louder than another even though they have the same intensity: the perceived loudness of a sound depends on frequency (e.g., compare tones at 125 Hz, 3000 Hz, and 8000 Hz).
A sound's loudness level is defined by how intense the sound must be, relative to a 1 kHz tone, to be heard as equally loud.
Equal-loudness contours: dB SPL plotted as a function of frequency for tones the listener perceives as equally loud.

Perception: Loudness
Points along one contour are perceived as having the same loudness.

Measuring Loudness
The listener is asked to adjust the level of a test pure tone (of variable frequency) to match the known level of a 1 kHz pure tone (at a given level).
- The loudness level (in phons) of the 1 kHz tone is equal to its sound pressure level in dB SPL.
Repeating this for different frequencies of the test tone gives an equal-loudness contour, labelled in units of phons.
If a given sound is perceived to be as loud as a 60 dB tone at 1000 Hz, then it is said to have a loudness level of 60 phons; 60 phons means "as loud as a 60 dB, 1000 Hz tone".
E.g., the 40-phon contour: find the level at each frequency that sounds as loud as a 1 kHz tone at 40 dB SPL.

Measuring perceived Loudness
Loudness as a function of frequency: the rate of growth of loudness differs for tones of different frequencies.
The contours are flatter at high loudness levels (up to about 1 kHz): there is less of a difference across frequencies.
The curves are constructed for single-frequency (pure-tone) sounds.

Fletcher-Munson diagram

Same SPL, different loudness
Same amplitude, with frequency increasing: 50, 100, 400, 1000, 5000, 10000, 20000 Hz.
[Figure: sound pressure (dB SPL) vs. frequency (Hz)]

Loudness - the 10 dB "rule"
A sound must be increased in intensity by a factor of ten for it to be perceived as roughly twice as loud.
E.g., it takes 10 violins to sound twice as loud as one violin.
(Note: this is an approximate general statement; see the graph.)
It is likely due to saturation effects of the nerve cells.
[Demo: two 500 Hz sinusoids differing by 10 dB]
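
A one-line sketch of this rule of thumb: if loudness roughly doubles for every 10 dB increase, the loudness ratio between two levels is about 2^(ΔL/10). This is only the approximation stated above, not an exact loudness model.

```python
def loudness_ratio(delta_db: float) -> float:
    """Rule-of-thumb loudness ratio for a level change of delta_db:
    loudness roughly doubles for every +10 dB (a factor of 10 in intensity)."""
    return 2.0 ** (delta_db / 10.0)

if __name__ == "__main__":
    print(loudness_ratio(10.0))   # ~2.0 : twice as loud (e.g., 10 violins vs. 1)
    print(loudness_ratio(20.0))   # ~4.0
    print(loudness_ratio(-10.0))  # ~0.5 : about half as loud
```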


Perception: Just Noticeable Differences (JND)
Ask a group of listeners whether two consecutive sounds are perceptually similar. Gradually change one of them until 75% of the responses are "different"; this difference is the JND.
JNDs exist along various dimensions or attributes: frequency, amplitude, bandwidth, etc.
Just noticeable differences measure the resolving power of the ear and the limits of audition.
Frequency JND: below 1 kHz, two equally intense tones must differ by 1-3 Hz to be distinguished in frequency. At higher frequencies the JND is progressively larger: at 8 kHz it is about 100 Hz.
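
The measurement procedure above can be illustrated with a small simulation: a hypothetical listener whose probability of answering "different" grows with the frequency difference, and a simple 2-down/1-up adaptive staircase that converges near the threshold region. The listener model, step size, and all parameters are invented for illustration; they are not from the slides.

```python
import math
import random

TRUE_JND_HZ = 2.0  # assumed 'internal' JND of the simulated listener (illustrative)

def responds_different(delta_hz: float) -> bool:
    """Hypothetical listener: chance of hearing a difference grows smoothly
    from 50% (guessing) toward 100% as delta_hz increases."""
    p = 1.0 - 0.5 * math.exp(-(delta_hz / TRUE_JND_HZ) ** 2)
    return random.random() < p

def staircase_estimate(start_delta=20.0, step=0.5, trials=400):
    """2-down/1-up staircase: shrink the difference after two consecutive
    'different' answers, grow it after a 'same' answer; average the last half."""
    delta, streak, track = start_delta, 0, []
    for _ in range(trials):
        if responds_different(delta):
            streak += 1
            if streak == 2:
                delta = max(delta - step, step)
                streak = 0
        else:
            delta += step
            streak = 0
        track.append(delta)
    tail = track[len(track) // 2:]
    return sum(tail) / len(tail)

if __name__ == "__main__":
    random.seed(0)
    print(f"Estimated frequency JND: ~{staircase_estimate():.1f} Hz")
```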

JND in sound intensity
JND for noise bursts: 0.5-1 dB.
JND for tones: varies with frequency and intensity; ~0.3 dB in optimal conditions, > 1 dB at low intensities.

Pitch Perception - harmonic complex sounds
Pitch is a unitary percept: you hear one complex tone, not six separate harmonics.
If a listener is asked to match the pitch of the complex sound to the pitch of a pure tone, they will choose a pure tone near the fundamental frequency.

Pitch Perception - two theories
1. Place theory: pitch is determined by where on the basilar membrane the hair cells are responding to the sound (i.e., firing the most).
2. Timing (or volley) theory: frequency discrimination is not based on basilar-membrane resonance, but on the time-synchronous neural firing information.

Pitch of the missing fundamental
If you present the harmonics alone, you still hear the pitch of the fundamental, even if there is no energy at that frequency: virtual pitch, or residue pitch.

Pitch: place theory
1. Place theory: pitch is determined by where on the basilar membrane the hair cells are responding to the sound. Some post-processing is done to extrapolate the pitch.

Pitch: place theory
However, the system isn't just taking the difference between harmonic frequencies: shifting the harmonics while keeping the spacing the same changes the pitch.

Pitch perception: Timing theory
Timing theory: pitch discrimination is based on the firing times of the neurons.
E.g., when the BM vibrates sinusoidally with sufficient magnitude, a nerve fiber tends to fire on synchronous half-cycles of the movement at the point where the fiber is attached.
[Figure: level (dB) vs. frequency (Hz), spectrum with components around 400-900 Hz]

Pitch perception: Timing theory
The pitch is perceived to be around 100 Hz, even though there is no energy at 100 Hz.
The complex stimulus is created by adding together 700, 800, 900, and 1000 Hz sine waves, with the same starting phase.

A1 cos(2π f1 t) + A2 cos(2π f2 t) + A3 cos(2π f3 t)
But what if we change the phases?
A1 cos(2π f1 t + θ1) + A2 cos(2π f2 t + θ2) + A3 cos(2π f3 t + θ3)
[Figure: the two resulting waveforms; changing the phases changes the waveform shape]
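
A short Python sketch of this point, using the 700 + 800 + 900 + 1000 Hz complex from the previous slide: whether the starting phases are equal or randomized, the waveform shape changes, but the signal still repeats every 10 ms, i.e. with the period of the missing 100 Hz fundamental. (Sample rate and duration are arbitrary choices for the example.)

```python
import numpy as np

FS = 16000                      # sample rate in Hz (arbitrary for this example)
FREQS = [700, 800, 900, 1000]   # harmonics of a missing 100 Hz fundamental

def harmonic_complex(phases, dur=0.05, fs=FS):
    """Sum of equal-amplitude cosines at FREQS with the given starting phases."""
    t = np.arange(int(dur * fs)) / fs
    return sum(np.cos(2 * np.pi * f * t + ph) for f, ph in zip(FREQS, phases))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signals = {
        "same phases  ": harmonic_complex([0.0] * 4),
        "random phases": harmonic_complex(rng.uniform(0, 2 * np.pi, 4)),
    }
    period = FS // 100  # 10 ms = period of the missing fundamental
    for name, x in signals.items():
        err = float(np.max(np.abs(x[:period] - x[period:2 * period])))
        print(f"{name}: waveform differs, but repeats every 10 ms "
              f"(max deviation between periods = {err:.1e})")
```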

Pitch perception: Timing theory
Caveat: timing theory holds only up to a given frequency. At higher frequencies, the neural latency period becomes longer than the period of the wave, although some adjacent neurons can still fire synchronously.
Caveat: changes in the phases of the harmonics affect the resulting timing synchronization of the firings, but do not seem to affect the perceived pitch.
Place theory caveat: it cannot explain the high pitch resolution at low frequencies.

Pitch perception - conflicting cues
Short clicks of alternating polarity every 5 ms: a pulse rate of 200 Hz, but the true F0 is 100 Hz (square-wave-like pattern).
At the apex of the BM, the vibration is at 100 Hz. The high-frequency end vibrates at the higher harmonics of the square wave and likely translates into a pitch (fundamental) of 200 Hz.
In general, if there are sufficient synchronous firings at the base, they dominate perception.
At lower rates, the apical end resolves the pulses in time: the perceived pitch is then likely the pulse rate, not F0.
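
The click-train stimulus described above is easy to construct and inspect: clicks every 5 ms with alternating polarity give a pulse rate of 200/s but a repetition period of 10 ms, and the spectrum contains only the odd harmonics of 100 Hz. The short Python sketch below (sample rate chosen arbitrarily) just verifies that spectral structure:

```python
import numpy as np

FS = 16000  # sample rate in Hz (arbitrary for this example)

def alternating_click_train(dur=1.0, click_period=0.005, fs=FS):
    """Clicks every 5 ms with alternating polarity: 200 pulses per second,
    but the pattern repeats every 10 ms, i.e. the true F0 is 100 Hz."""
    x = np.zeros(int(dur * fs))
    step = int(click_period * fs)
    x[0::2 * step] = 1.0       # positive clicks at 0, 10, 20, ... ms
    x[step::2 * step] = -1.0   # negative clicks at 5, 15, 25, ... ms
    return x

if __name__ == "__main__":
    x = alternating_click_train()
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / FS)
    strong = freqs[spectrum > 0.5 * spectrum.max()]
    print(strong[:8])  # 100, 300, 500, ... Hz: only odd harmonics of 100 Hz
```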

Masking
Masking refers to a process where the perception of one sound is obscured by the presence of another:
Simultaneous masking (frequency masking): generally a sound masks another at a higher frequency.
Non-simultaneous masking (temporal masking): a sound can mask another that occurs before or after it in time.
Masker: the primary (masking) tone. Masked: the tone or narrowband noise being masked.
Spread of masking: the effect can extend to both higher and lower frequency bands.
Masking is due to the non-linearity of the auditory system: the response to a sum of sounds is not equal to the sum of the individual responses.
Masking can be exploited (to our advantage) in applications such as speech and audio coding and noise reduction.

Masking [From Dolby]
In frequency masking, the presence of one sound (say at 500 Hz) raises the threshold of hearing for another (say at 700 Hz).

Frequency masking
Experiment:
» Play a 500 Hz masking tone at a fixed level (60 dB). Play a test tone at a different frequency (e.g., 800 Hz), and raise its level until it is just audible.
» Vary the frequency of the test tone and plot the threshold at which it becomes audible.
» Repeat for various frequencies of the masking tone.
The result is a collection of curves showing the frequency-masking effect.

Example of masking
Pure-tone masker and masked tone: a tone masking another at a higher frequency, and a tone masking another at a lower frequency.
[Demo: masker alone, then masker + masked tone, repeated over time; the masked tone is gradually reduced in intensity each time, until it is no longer heard.]

Frequency masking
Noise-masking-tone (NMT): SMR = 4 dB.
Tone-masking-noise (TMN): SMR = 24 dB.
Noise-masking-noise (NMN): SMR = 26 dB.
The NMT/TMN asymmetry is due to the way we perceive tones: we are more sensitive to disturbances of a tone's features.

Tone Masking Noise
[Figure: signal and signal + noise (SNR = 24 dB); sound pressure level (dB SPL) vs. frequency (Hz), showing the masking threshold raised above the threshold of hearing around the masker.]

Noise Masking Tone [From Egan and Hake]
Masking thresholds produced by a narrow band of noise (365-455 Hz): the elevation in the hearing threshold for a tone is shown as a function of frequency.
Note the changing shape (spread of masking) with level.
Upward spread of masking: due to the asymmetry of the neurons' tuning curves, whose skirts are less steep on the low-frequency side, so a neuron is more influenced by sounds below its CF.

Critical Bands
The human auditory system has a limited resolution, which is frequency dependent.
The cochlea can be viewed as a bank of overlapping bandpass filters.
The frequency response of these filters is related to the tuning curves of the neurons.
A perceptually uniform measure of frequency can be expressed in terms of the width of the critical bands: the bandwidth is ~100 Hz below 500 Hz and increases roughly logarithmically with frequency above 1 kHz.
About 25 bandpass filters can model the basilar membrane, and a perceptual frequency scale can thus be developed.

Critical Bands
1 Bark = the width of one critical band.
For frequencies below 500 Hz, the Bark value is approximately f/100 (e.g., 400 Hz = 4 Bark).
For frequencies above 500 Hz, it is approximately 9 + 4 log2(f/1000).
Mapping frequency f (Hz) into z (Bark):
z(f) = 13 arctan(0.00076 f) + 3.5 arctan((f/7500)^2)   (Bark)
[Figure: Bark scale (0-25 Bark) as a function of frequency (kHz)]
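
Both mappings above are straightforward to compute; a minimal Python version of the rule of thumb and the arctan formula from this slide:

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate in Bark: z = 13 arctan(0.00076 f) + 3.5 arctan((f/7500)^2)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def hz_to_bark_approx(f_hz: float) -> float:
    """Rule of thumb: f/100 below 500 Hz, 9 + 4 log2(f/1000) above 500 Hz."""
    if f_hz <= 500.0:
        return f_hz / 100.0
    return 9.0 + 4.0 * math.log2(f_hz / 1000.0)

if __name__ == "__main__":
    for f in (100, 400, 500, 1000, 2000, 4000, 8000):
        print(f"{f:5d} Hz : {hz_to_bark(f):5.2f} Bark  (approx {hz_to_bark_approx(f):5.2f})")
```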


Critical Bands - measuring experiments
A band of noise kept at a constant overall level while its bandwidth is increased is heard at a constant loudness until the critical bandwidth is reached; beyond that, the loudness increases as neurons from adjacent bands start firing.
The shapes of the critical-band filters are determined in experiments using noise to mask tones.
About 24 bandpass filters result.

Frequency masking - modelling
The human auditory system is modeled as a bank of overlapping bandpass filters corresponding to the critical bands.
[Figure: the energy of a 1 kHz sinewave and the amount captured by the various auditory filters. From Moore]

Temporal Masking
The masker occurs before (forward masking) or after (backward masking) the masked signal.
Forward masking: if we hear a loud sound and then it stops, it takes a little while before we can hear a soft tone nearby in frequency.
It is significant for up to about 200 ms, and is likely due to neuron fatigue.
The amount of masking decays roughly as a logarithmic function of the time delay.

Temporal masking
[Figure: backward masking and forward masking regions around the masker.]

Temporal masking - backward masking
A short tone is not heard when it is followed immediately by sufficient noise (in the same band as the tone).
It falls off faster in time than forward masking (only up to about 20 ms).
It is less well understood; it is attributed to system overload and occurs at a higher level of auditory processing.
It has little effect on trained listeners.
It occurs contralaterally: signal and masker can be in different ears.

Temporal masking - examples
Forward masking:
i. Play a masking tone and then a tone a semitone lower, with a 100 ms delay in between. We can hear both tones even as the second tone is decreased in 3 dB increments.
ii. Play the same two tones with a time delay of 10 ms. Masking occurs in this case.
Backward masking (the initial tone masked by the tone that follows):
1. The time delay is set to 100 ms. We hear the first tone throughout.
2. The time delay is then decreased, but remains above the 10 ms range. This is a grey area: does masking occur?
3. The time delay is below 10 ms. Masking should occur.

Masking due to a complex stimulus
Upward spread of frequency masking: energy in the lower band(s) can mask the harmonics or the fricative energy in the higher bands.
» Strong F1 energy can hinder the perception of place of articulation, which is sometimes identified from the high-frequency regions.
Forward masking: for a weak fricative following a vowel (VC), forward masking can attenuate the initial portion of the consonant.
Backward masking: in a CV context, stop bursts are ~30 dB weaker than the following vowels and occur within the range of backward masking (short VOT in voiced stops); perception of the place of articulation may be compromised.

Time-Frequency Masking

Timing - temporal resolution
Two brief clicks need to be separated by more than about 2 ms to be heard as two.
To identify the order of two signals, about 17 ms is needed.
Detecting a temporal gap in a narrowband sound (like a tone) typically requires about 22.5 ms at 200 Hz, and more time as the frequency decreases.
Perception:
To note the order of a set of sequential short sounds, each sound must be 125-200 ms long.
Short sounds near the threshold of hearing must exceed a certain intensity-time product to be perceived; the value is roughly constant (at a given frequency) for durations up to about 200 ms.
The hearing system tends to tune its focus to the frequency range of maximum energy; temporal perception seems to be influenced by whether that focus must change.
» Successive short sounds are well resolved in time when they have energy in identical bands.

JND in speech
Formant frequency JND:
Steady-state vowel sounds: the JND for F1 is 2-5% (about 14 Hz).
Normal speech vowels:
» The JND is 9-14% for formant trajectories.
» The JND is 4-7% for simultaneous parallel movement of F1 and F2.
Formant amplitude JND: F1: 1.5 dB; F2: 3 dB.
Individual harmonics in a vowel:
» 2 dB for those at F1 or F2.
» 13 dB in spectral valleys; this large value is likely due to masking effects.
Formant bandwidth: generally poor, 20-40% of the bandwidth for F1 and F2.
F0 frequency:
Steady-state vowels: 0.5%, or less than 1 Hz.
Larger for high vowels (IY, UW) than for low vowels (AE, AA), due to masking of the lower harmonics by the low F1.

JND in speech - phase sensitivity
In general, we are insensitive to phase changes.
» Demo: a sound with varying phases of its frequency components.
However, this auditory insensitivity holds only for small amounts of phase distortion:
» Randomizing the phases of the harmonics of voiced speech has a large perceptual effect (harsh-sounding speech).