Linguistic Phonetics. Basic Audition. Diagram of the inner ear removed due to copyright restrictions.


24.963 Linguistic Phonetics. Basic Audition.
[Diagram of the inner ear removed due to copyright restrictions.]

Reading: Keating 1985; also read Flemming 2001. Assignment 1 - basic acoustics, due 9/22.

Audition: anatomy of the ear.
[Diagram labels: Outer Ear (Ear Flap, Ear Canal), Middle Ear (Eardrum, Hammer, Anvil, Stirrup, Eustachian Tube), Inner Ear (Cochlea, Auditory Nerve).]

Auditory spectrograms. The auditory system performs a running frequency analysis of acoustic signals - cf. the spectrogram. But an auditory spectrogram differs from a regular spectrogram in significant ways, e.g.:
- Frequency: a regular spectrogram analyzes frequency bands of equal widths, but the peripheral auditory system analyzes frequency bands that are wider at higher frequencies.
- Loudness is non-linearly related to intensity.
- Masking effects (simultaneous and non-simultaneous).
It is useful to bear these differences in mind when looking at acoustic spectrograms. It is possible to generate auditory spectrograms based on models of the auditory system.

Loudness. The perceived loudness of a sound depends on the amplitude of the pressure fluctuations in the sound wave. Amplitude is usually measured as root-mean-square (rms) amplitude: the square root of the mean of the squared amplitude over some time window.

rms amplitude:
- Square each sample in the analysis window.
- Calculate the mean value of the squared waveform: sum the values of the samples and divide by the number of samples.
- Take the square root of the mean.
[Figure: pressure, squared pressure, and rms amplitude plotted against time over 0-0.2 s.]
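The three steps above can be sketched directly in Python (the helper name `rms_amplitude` is illustrative; numpy is assumed):

```python
import numpy as np

def rms_amplitude(samples):
    """Root-mean-square amplitude: square each sample in the window,
    take the mean of the squares, then the square root of the mean."""
    x = np.asarray(samples, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))

# A sinusoid with peak amplitude 1.0 has rms amplitude 1/sqrt(2), ~0.707.
t = np.linspace(0.0, 0.2, 2000, endpoint=False)  # 0.2 s window
wave = np.sin(2 * np.pi * 100 * t)               # 100 Hz tone
print(rms_amplitude(wave))                       # ~0.707
```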

[Figure: rms amplitude of a waveform plotted against time (0-0.02 s). Adapted from Johnson, Keith. Acoustic and Auditory Phonetics. Malden, MA: Blackwell Publishers, 1997. ISBN: 9780631188483.]

Intensity. Perceived loudness is more closely related to intensity (power per unit area), which is proportional to the square of the amplitude.
relative intensity in bels = log10(x^2/r^2)
relative intensity in dB = 10 log10(x^2/r^2) = 20 log10(x/r)
In absolute intensity measurements, the comparison amplitude r is usually 20 µPa, the lowest audible pressure fluctuation of a 1000 Hz tone (dB SPL).
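The dB formula can be checked numerically, using the 20 µPa reference from the slide (the helper name is illustrative):

```python
import math

P0 = 20e-6  # reference pressure: 20 µPa (threshold of a 1000 Hz tone), in Pa

def db_spl(pressure_rms):
    """Relative intensity in dB SPL: 20 log10(x / r)."""
    return 20 * math.log10(pressure_rms / P0)

# Doubling the pressure amplitude adds ~6 dB; a tenfold increase adds 20 dB.
print(db_spl(20e-6))   # 0 dB SPL (at threshold)
print(db_spl(40e-6))   # ~6.02 dB
print(db_spl(200e-6))  # ~20 dB
```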

Logarithmic scales: log xy = log x + log y.
[Figure: plot of log(x) for x from 0 to 50.]

Loudness. The relationship between intensity and perceived loudness of a pure tone is not exactly logarithmic. Loudness of a pure tone (> 40 dB) in sones: N = 2^((dB - 40)/10). Loudness is defined to be 1 sone for a 1000 Hz tone at 40 dB SPL.
[Figure: loudness in sones plotted against pressure (µPa), with corresponding dB SPL values. Adapted from Johnson, Keith. Acoustic and Auditory Phonetics. Malden, MA: Blackwell Publishers, 1997. ISBN: 9780631188483.]
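The sone formula can be checked directly (a minimal sketch; the function name is illustrative):

```python
def sones(level_db_spl):
    """Loudness in sones of a pure tone above 40 dB SPL:
    N = 2 ** ((dB - 40) / 10)."""
    return 2 ** ((level_db_spl - 40) / 10)

# Each 10 dB increase doubles perceived loudness.
print(sones(40))  # 1.0 (a 1000 Hz tone at 40 dB SPL is defined as 1 sone)
print(sones(50))  # 2.0
print(sones(60))  # 4.0
```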

Loudness. Pure tones: 131-2092 Hz. Which sounds loudest?

Loudness. Loudness also depends on frequency. Equal loudness contours for pure tones:
[© Springer. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.]

Loudness. At short durations, loudness also depends on duration. Temporal integration: loudness depends on the energy in the signal, integrated over a time window. The duration of integration is often said to be about 200 ms, i.e. long enough to be relevant to the perceived loudness of vowels.
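A minimal numpy sketch of temporal integration: sum the energy that falls within one ~200 ms window. The 200 ms window comes from the slide; the tone frequency and durations are illustrative assumptions.

```python
import numpy as np

def integrated_energy(samples, fs, window_s=0.2):
    """Energy (sum of squared samples) within one integration window of
    window_s seconds; sounds shorter than the window are zero-padded."""
    n = int(window_s * fs)
    x = np.zeros(n)
    m = min(n, len(samples))
    x[:m] = np.asarray(samples[:m], dtype=float)
    return float(np.sum(x ** 2))

# At equal amplitude, a 50 ms tone burst carries less energy into the
# window than a 200 ms burst, so it should be perceived as quieter.
fs = 16000
tone = lambda dur: np.sin(2 * np.pi * 1000 * np.arange(int(dur * fs)) / fs)
print(integrated_energy(tone(0.05), fs))  # ~400
print(integrated_energy(tone(0.20), fs))  # ~1600
```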

Pitch. Perceived pitch is approximately linear with respect to frequency from 100-1000 Hz; between 1000-10,000 Hz the relationship is approximately logarithmic.
[Figure: frequency (Bark) plotted against frequency (kHz). Adapted from Johnson, Keith. Acoustic and Auditory Phonetics. Malden, MA: Blackwell Publishers, 1997. ISBN: 9780631188483.]
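For illustration, one common closed-form approximation of the Hz-to-Bark mapping is Traunmüller's (1990) formula. This is an assumption on my part; the figure above may be based on a slightly different function.

```python
def hz_to_bark(f):
    """Approximate Bark value for frequency f in Hz, using Traunmueller's
    (1990) formula: z = 26.81 f / (1960 + f) - 0.53."""
    return 26.81 * f / (1960.0 + f) - 0.53

# Roughly linear below ~1000 Hz, compressive (log-like) above.
print(hz_to_bark(500))    # ~4.9
print(hz_to_bark(1000))   # ~8.5
print(hz_to_bark(10000))  # ~21.9
```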

Pitch. The non-linear frequency response of the auditory system is related to the physical structure of the basilar membrane. The basilar membrane "uncoiled":
[Diagram of the inner ear removed due to copyright restrictions. Labels mark characteristic frequencies along the basilar membrane: 15,400; 11,500; 8,700; 6,500; 4,900; 3,700; 2,700; 2,000; 1,500; 1,100; 750; 500; 300; 100 Hz.]

Frequency resolution. In the same way, the structure of the basilar membrane affects the frequency resolution of the auditory spectrogram. An acoustic spectrogram represents the variation in intensity over time in a set of frequency bands. E.g. a standard broad-band spectrogram might use frequency bands of 0-200 Hz, 200-400 Hz, 400-600 Hz, etc. The ear represents loudness in frequency bands that are narrower at low frequencies and wider at high frequencies: ?-100 Hz, 100-200 Hz, ..., 400-510 Hz, ..., 1080-1270 Hz, ..., 12000-15500 Hz. These critical bands correspond to a length of about 1.3 mm along the basilar membrane (Fastl & Zwicker 2007: 162).
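The widening of auditory filter bands with frequency can be sketched with the Glasberg & Moore (1990) equivalent-rectangular-bandwidth (ERB) formula. Note this is a stand-in I am assuming for illustration; the band edges on the slide come from the Zwicker critical-band scale, which differs in detail.

```python
def erb_width(f):
    """Equivalent rectangular bandwidth in Hz at centre frequency f (Hz),
    per Glasberg & Moore (1990): ERB = 24.7 (4.37 f/1000 + 1)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

# Bands are narrow at low frequencies and wide at high frequencies.
print(erb_width(100))    # ~35 Hz
print(erb_width(1000))   # ~133 Hz
print(erb_width(10000))  # ~1104 Hz
```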

Auditory spectra.
[Figure: The spectrum of a synthetic vowel /I/ (top) plotted on a linear frequency scale, and the excitation patterns for that vowel (bottom) for two overall levels, 50 and 80 dB. The excitation patterns are plotted on an ERB scale. Adapted from Moore, Brian. The Handbook of Phonetic Science. Edited by William J. Hardcastle and John Laver. Malden, MA: Blackwell, 1997. ISBN: 9780631188483.]

Italian vowels.
[Figure: the vowels /i e a o u/ plotted in F2 (Hz) against F1 (Hz), and the same vowels plotted on ERB scales, F2 (E) against F1 (E).]

[© Wiley-Blackwell. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/. Source: Johnson, K. (2011) "Acoustic and Auditory Phonetics," 3rd ed. Wiley-Blackwell, Oxford.]

Masking - simultaneous. Energy at one frequency can reduce the audibility of simultaneous energy at another frequency (masking). A single tone, followed by the same tone plus a higher-frequency tone, repeated with progressive reduction in the intensity of the higher tone. For how many steps can you hear two tones? The same pattern, with same-amplitude tones but a greater difference between the frequencies of the tones. For how many steps can you hear two tones?

Masking - simultaneous. Energy at one frequency can reduce the audibility of simultaneous energy at another frequency (masking).
[Figure: Example of masking of a tone by a tone. The frequency of the masking tone is 1200 Hz. Each curve corresponds to a different masker level, and gives the amount by which the threshold intensity of the masked tone is multiplied in the presence of the masker, relative to its threshold in quiet. The dashed lines near 1200 Hz and its harmonics are estimates of the masking functions in the absence of the effect of beats. Adapted from Stevens, Kenneth N. Acoustic Phonetics. Cambridge, MA: MIT Press, 1999. ISBN: 9780262194044.]

Time course of auditory nerve response. Response to a noise burst: strong initial response; rapid adaptation (~5 ms); slow adaptation (>100 ms). After tone offset, the firing rate only gradually returns to its spontaneous level.
[Figure: firing-rate histograms over 0-128 ms. Adapted from Kiang et al. (1965).]

Interactions between sequential sounds. A preceding sound can affect the auditory nerve response to a following tone (Delgutte 1980).
[Figure: discharge rate (spikes/s) over time for a test tone (TT) at 27 dB SPL, with no adapting tone (AT) and with adapting tones at 13 and 37 dB SPL. Adapted from Stevens, Kenneth N. Acoustic Phonetics. Cambridge, MA: MIT Press, 1999. ISBN: 9780262194044, after Delgutte, B. "Representation of speech-like sounds in the discharge patterns of auditory-nerve fibers." Journal of the Acoustical Society of America 68, no. 3 (1980): 843-857.]

[Courtesy of the MIT Press. Source: Schnupp, Jan, Israel Nelken, and Andrew King. Auditory Neuroscience: Making Sense of Sound. MIT Press, 2011. Used with permission.]

24.963 Linguistic Phonetics. Analog-to-digital conversion of speech signals.
[Figure: air pressure waveform plotted against time (0-0.03 s).]

Analog-to-digital conversion. Almost all acoustic analysis is now computer-based. Sound waves are analog (or continuous) signals, but digital computers require a digital representation - i.e. a series of numbers, each with a finite number of digits. There are two continuous scales that must be divided into discrete steps in analog-to-digital conversion of speech: time and pressure (or voltage). Dividing time into discrete chunks is called sampling. Dividing the amplitude scale into discrete steps is called quantization.
[Figure: air pressure waveform plotted against time (0-0.03 s).]

Sampling. The amplitude of the analog signal is sampled at regular intervals. The sampling rate is measured in Hz (samples per second). The higher the sampling rate, the more accurate the digital representation will be.
[Figure: (a) a wave with a fundamental frequency of 100 Hz and a major component at 300 Hz, sampled at 1500 Hz; (b) the same wave sampled at 600 Hz; (c) the same wave sampled at 500 Hz. Adapted from Ladefoged, Peter. L104/204 Phonetic Theory lecture notes, University of California, Los Angeles.]

Sampling. In order to represent a wave component of a given frequency, it is necessary to sample the signal at at least twice that frequency (the Nyquist theorem). The highest frequency that can be represented at a given sampling rate is called the Nyquist frequency.
[Figure: a wave with a fundamental frequency of 100 Hz and a significant harmonic at 300 Hz, sampled at (a) 1500 Hz, (b) 600 Hz, and (c) 500 Hz. Adapted from Ladefoged, Peter. L104/204 Phonetic Theory lecture notes, University of California, Los Angeles.]
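The Nyquist relationship can be stated as a one-line check (helper name is illustrative), applied to the 300 Hz component in the example above:

```python
def nyquist_frequency(fs):
    """Highest frequency representable when sampling at fs Hz: fs / 2."""
    return fs / 2.0

# The 300 Hz component is representable at sampling rates of 1500 Hz
# and (just barely) 600 Hz, but not at 500 Hz.
for fs in (1500, 600, 500):
    print(fs, nyquist_frequency(fs) >= 300)
```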

What sampling rate should you use? The highest frequency that (young, undamaged) ears can perceive is about 20 kHz, so to ensure that all audible frequencies are represented we must sample at 2 × 20 kHz = 40 kHz. The ear is relatively insensitive to frequencies above 10 kHz, and almost all of the information relevant to speech sounds is below 10 kHz, so high-quality sound is still obtained at a sampling rate of 20 kHz. There is a practical trade-off between fidelity of the signal and memory, but memory is getting cheaper all the time.

What sampling rate should you use? For some purposes (e.g. measuring vowel formants), a high sampling rate can be a liability, but it is always possible to downsample before performing an analysis. Audio CD uses a sampling rate of 44.1 kHz. Many A-to-D systems only operate at this rate or fractions of it (44100 Hz, 22050 Hz, 11025 Hz). For most purposes, use a sampling rate of 44.1 kHz.

Aliasing. Components of a signal which are above the Nyquist frequency are misrepresented as lower-frequency components (aliasing). To avoid aliasing, a signal must be filtered to eliminate frequencies above the Nyquist frequency. Since practical filters are not infinitely sharp, this will also attenuate energy just below the Nyquist frequency.
[Figure: a high-frequency wave and its lower-frequency alias, amplitude against time (ms). Adapted from Johnson, Keith. Acoustic and Auditory Phonetics. Malden, MA: Blackwell Publishers, 1997. ISBN: 9780631188483.]
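Assuming no anti-alias filter, the apparent frequency of an aliased component can be computed by folding it into the representable range. This is a sketch using the standard folding rule for real signals; the helper name is illustrative.

```python
def alias_frequency(f, fs):
    """Apparent frequency of a component at f Hz when sampled at fs Hz
    without anti-alias filtering: fold f into the range [0, fs/2]."""
    f = f % fs             # sampled spectra repeat every fs Hz...
    return min(f, fs - f)  # ...and fold about the Nyquist frequency

# Components at or below the Nyquist frequency are unchanged; components
# above it are misrepresented as lower frequencies.
print(alias_frequency(200, 500))     # 200 (below Nyquist: unchanged)
print(alias_frequency(300, 500))     # 200 (the 300 Hz component, aliased)
print(alias_frequency(9000, 11025))  # 2025
```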

Quantization. The amplitude of the signal at each sampling point must be specified digitally - quantization. Divide the continuous amplitude scale into a finite number of steps. The more levels we use, the more accurately we approximate the analog signal.
[Figure: a waveform quantized with 20 steps and with 200 steps. Adapted from Johnson, Keith. Acoustic and Auditory Phonetics. Malden, MA: Blackwell Publishers, 1997. ISBN: 9780631188483.]

Quantization. The number of levels is specified in terms of the number of bits used to encode the amplitude at each sample. Using n bits we can distinguish 2^n levels of amplitude, e.g. 8 bits, 256 levels; 16 bits, 65,536 levels. Now that memory is cheap, speech is almost always digitized at 16 bits (the CD standard).
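A quick sketch of the bits-to-levels relationship, with a hypothetical `quantize` helper that rounds a sample in [-1, 1) to the nearest of the available levels:

```python
def quantization_levels(bits):
    """Number of distinct amplitude levels encodable with n bits: 2 ** n."""
    return 2 ** bits

def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0) to the nearest of 2**bits levels
    (illustrative helper; real converters differ in rounding details)."""
    step = 2.0 / quantization_levels(bits)
    return round(sample / step) * step

print(quantization_levels(8))   # 256
print(quantization_levels(16))  # 65536
# At 16 bits the quantization error is at most half a step, ~1.5e-5.
print(abs(quantize(0.333, 16) - 0.333) < 2 ** -16)  # True
```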

Quantization. Quantizing an analog signal necessarily introduces quantization errors. The quantization noise floor is fixed, so the lower the signal level, the worse the signal-to-noise ratio introduced by quantization. Therefore digitize recordings at as high a level as possible without exceeding the maximum amplitude that can be represented (clipping). On the other hand, it is essential to avoid clipping.
[Figure: amplitude against time (ms). Adapted from Johnson, Keith. Acoustic and Auditory Phonetics. Malden, MA: Blackwell Publishers, 1997. ISBN: 9780631188483.]
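The effect of recording level on quantization noise can be sketched numerically. This assumes numpy, a full-scale range of [-1, 1], and simple rounding; the helper names are illustrative. The textbook rule of thumb is that a full-scale sine at n bits yields roughly 6n + 1.76 dB of SNR, and that halving the level costs about 6 dB.

```python
import numpy as np

def quantization_snr_db(signal, bits):
    """Signal-to-quantization-noise ratio (dB) after rounding the signal
    to 2**bits levels spanning [-1, 1]."""
    x = np.asarray(signal, dtype=float)
    step = 2.0 / (2 ** bits)
    quantized = np.round(x / step) * step
    noise = x - quantized
    return float(10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2)))

# Quantization noise is roughly constant, so halving the recording level
# costs about 6 dB of signal-to-noise ratio.
t = np.arange(16000) / 16000.0
full = 0.99 * np.sin(2 * np.pi * 440 * t)   # near full-scale 440 Hz tone
quiet = 0.5 * full                          # same tone at half level
print(quantization_snr_db(full, 16))   # ~98 dB
print(quantization_snr_db(quiet, 16))  # ~92 dB
```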

MIT OpenCourseWare https://ocw.mit.edu 24.915 / 24.963 Linguistic Phonetics Fall 2015 For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.