Auditory scene analysis in humans: Implications for computational implementations.


Auditory scene analysis in humans: Implications for computational implementations. Albert S. Bregman, McGill University.

Outline: Introduction. The scene analysis problem. Two dimensions of grouping. Recognition errors if auditory scene analysis is done wrong. Speech perception suggests a strong role for schemas in auditory scene analysis.

Sequential organization: cues that favor sequential grouping. The idea is that there is a competitive grouping of sounds according to the "distance" (D) between sounds, where D is some combination of distances along different dimensions. Frequency and temporal proximity. Demo 3 from the Bregman & Ahad disk: galloping rhythm. Which feature of the temporal interval is important? The interval from the end of one sound to the beginning of the next sound with which it might be grouped; this acts as the temporal-separation component of D. Rapid sequences enhance streaming because the contribution of temporal distance to D is reduced relative to the contributions of other distances. Demo: tone duration. Segregation increases as alternating high and low tones get longer (increased duration) while the stimulus onset interval (tempo) is held fixed.
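The point about which temporal interval matters can be sketched as a small calculation. The function name and the numbers are illustrative, not taken from the talk:

```python
def offset_to_onset_gap(onset_interval_ms, duration_ms):
    """Perceptually relevant silence between successive candidate tones.

    The interval that matters runs from the END of one tone to the START
    of the next tone it might group with, not from onset to onset.
    """
    return onset_interval_ms - duration_ms

# Fixed tempo (onset-to-onset interval of 125 ms), two tone durations:
gap_short = offset_to_onset_gap(125, 40)    # 85 ms of silence
gap_long = offset_to_onset_gap(125, 100)    # 25 ms of silence
# Longer tones shrink the gap, so the temporal component of D shrinks
# relative to frequency separation and segregation increases, as in the
# tone-duration demo.
```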

Fundamental frequency and spectral frequency are independent features of complex tones, and each contributes independently to D. Spatial separation, whether real or signaled over headphones by time delays or amplitude differences, contributes to D. Differences among sounds in their spectral peaks contribute to D. Loudness differences make a weak contribution to D. Abruptness of property change in the acoustic sequence makes an independent contribution to segregation; smoothly changing properties are marks of a single, changing sound.
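The idea that D is "some combination of distances along different dimensions" could be sketched as a weighted sum. Everything here is an assumption for illustration: the cue names, the linear combination, and the weights (loudness gets a small weight to reflect its weak contribution):

```python
def grouping_distance(cue_diffs, weights=None):
    """Combine per-cue differences into a single grouping distance D.

    cue_diffs: dict mapping cue name -> difference between two sounds
    along that dimension. Larger D -> stronger tendency to segregate.
    """
    if weights is None:
        # Illustrative weights, not measured values.
        weights = {"f0": 1.0, "spectral_peak": 1.0, "spatial": 1.0,
                   "temporal_gap": 1.0, "loudness": 0.2}
    return sum(weights.get(cue, 0.0) * abs(d) for cue, d in cue_diffs.items())

# In a rapid sequence the temporal gap is small, so D is dominated by
# the frequency difference:
d_fast = grouping_distance({"f0": 1.0, "temporal_gap": 0.2})  # 1.2
```

A model of this shape makes the "cues add up" claim explicit: each mechanism contributes a term, and removing one term leaves the others untouched.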

Effects of sequential organization on perception:
- Perception of melody: the melody is formed within an auditory stream.
- Rhythm emerges primarily within segregated streams.
- Judgments of timing are more precise within a perceived stream than across streams.
- Continuity of synthetic speech is lost if you suddenly change the pitch of the voice (Darwin & Bethell-Fox, 1977).
- Perceived spatial location can be affected by sequential grouping. A sound occurring at one ear can be treated as a continuation of a previous monaural sound and prevented from combining with a contralateral sound of the same frequency and phase to create a spatially centered image (research in progress with Y. Tagami and M. Rapaport).
- Perceived loudness can be affected if a set of components from a current spectrum is treated as merely a continuation of a previous sound (Richard Warren's "homophonic continuity" paradigm).

Simultaneous grouping (grouping of components that overlap in time). Again we can conceive of a variable D (difference), but this time it applies to subsets of spectral bands (or spectral features), which can be segregated from one another. Factors that contribute to D (increase segregation): different subsets of spectral features (e.g., spectral components) that
- belong to different harmonic series;
- come from different places in space;
- occupy distinct frequency regions;
- show independent (unsynchronized) changes in loudness;
- are asynchronous in onset (and, to a lesser degree, in offset);
- undergo asynchronous changes in frequency (maybe; there is still disagreement about this).
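One of the listed factors, membership in different harmonic series, lends itself to a simple sketch. This is a hypothetical illustration, not the talk's model; the 3% mistuning tolerance is an assumed parameter:

```python
def harmonic_partition(freqs, f0, tol=0.03):
    """Split spectral components into those consistent with the harmonic
    series of f0 (grouped as one source) and the leftovers (candidates
    for a separate source). tol is a fractional mistuning tolerance."""
    in_series, rest = [], []
    for f in freqs:
        n = max(1, round(f / f0))           # nearest harmonic number
        if abs(f - n * f0) <= tol * n * f0:
            in_series.append(f)
        else:
            rest.append(f)
    return in_series, rest

# A 310 Hz component does not fit the 200 Hz series and is segregated:
harmonic_partition([200.0, 400.0, 600.0, 310.0], 200.0)
# → ([200.0, 400.0, 600.0], [310.0])
```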

Effects of having grouped parts of the spectrum into separate sounds: (a) each global sound has its own properties, including pitch, timbre, loudness, and perceived location in space; (b) organizing parts of the spectrum into separate groups reduces their perceptual interaction.

Competition between sequential and simultaneous grouping: the old-plus-new strategy. "When a spectrum gets more complex, try to interpret it as a continuing (old) spectrum to which a new spectrum has added some additional components." This is the most powerful cue for segregating concurrent sounds. Chris Darwin at Sussex University has shown that it can even segregate a harmonic from a vowel, changing the identity of the vowel. Demo 34 from the Bregman & Ahad disk: capturing a narrower band of noise out of a wider frequency band by alternating the narrower and wider bands. The old-plus-new phenomenon can be viewed as a more pronounced case of onset asynchrony.
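The old-plus-new strategy has a natural computational reading as per-band subtraction of the continuing spectrum. The sketch below is an assumed, highly simplified magnitude-domain version, not Darwin's or Bregman's implementation:

```python
def old_plus_new(old_spec, mix_spec):
    """Interpret a more complex mixture as old spectrum + added sound.

    old_spec, mix_spec: per-band magnitudes of the continuing sound and
    of the current mixture. Returns the estimated spectrum of the newly
    added components (clipped at zero)."""
    return [max(m - o, 0.0) for o, m in zip(old_spec, mix_spec)]

# Bands 0-1 are fully explained by the continuing old sound; the energy
# in band 2 is attributed to a new sound:
old_plus_new([1.0, 2.0, 0.0], [1.0, 2.0, 5.0])   # → [0.0, 0.0, 5.0]
```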

Achievement of stability in the face of fluctuating acoustic evidence: by weighing different types of evidence against one another (as if the cues voted); by hanging on to an existing interpretation as long as possible.
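These two stabilizing mechanisms (cue voting, and holding on to the current interpretation) can be sketched together. The class and its margin parameter are made-up stand-ins for "as long as possible":

```python
class StableInterpreter:
    """Pick the interpretation with the most summed evidence (voting),
    but keep the current one unless a rival wins by a clear margin."""

    def __init__(self, margin=0.2):
        self.margin = margin
        self.current = None

    def update(self, votes):
        """votes: dict mapping interpretation -> summed cue evidence."""
        best = max(votes, key=votes.get)
        if self.current is None:
            self.current = best
        elif best != self.current:
            # Switch only when the challenger wins decisively.
            if votes[best] > votes.get(self.current, 0.0) + self.margin:
                self.current = best
        return self.current

s = StableInterpreter(margin=0.2)
s.update({"one stream": 1.0, "two streams": 0.6})   # → "one stream"
s.update({"one stream": 0.9, "two streams": 1.0})   # still "one stream"
s.update({"one stream": 0.4, "two streams": 1.0})   # → "two streams"
```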

Issues in the implementation of a system for computational auditory scene analysis (CASA). Lessons from research on speech perception: formants presented to opposite ears can be perceptually integrated to yield a speech sound, despite also being heard as separate sounds. The same holds true for "sine-wave speech" (SWS) tones. Demo 23 from the Bregman & Ahad disk: SWS. Vowels synthesized on the same fundamental and then mixed can in some cases be identified even though there is no acoustic cue to put the right combinations of formants together into two vowels. These facts seem to negate the importance of auditory scene analysis in speech perception.

I would like to propose a set of principles for auditory scene analysis to handle these facts and to make predictions for non-speech signals.
1. The default status of an input auditory array is the integration of all of it into a single sound. Unless there is evidence to partition the sound, it remains integrated. This explains why SWS is moderately intelligible even when the component tones are divided between the ears: there is no competition, and grouping all of them can give the correct phonetic percept. We are currently doing research on mixed SWS signals, where grouping everything can't be effective and auditory scene analysis principles are very important.
2. Schema-based processes can sometimes select from the signal the components they need to satisfy their data requirements. The action of such schemas explains the perception of SWS, and also how certain pairs of vowels, such as /ah/ and /ee/, that have been synthesized on the same fundamental and then mixed can nonetheless be perceived separately.
3. Bottom-up ASA principles do not partition the signal into airtight packages and then hand them on to speech recognizers. Instead, they establish constraints that interact with constraints from top-down processes to converge on a description of sources that maximally satisfies both.
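Principle 1, integration as the default, can be rendered as a trivial decision rule. The threshold is an assumed free parameter, and the function is only an illustration of the principle:

```python
def default_interpretation(pairwise_distances, threshold=1.0):
    """Principle 1 sketch: the input array stays integrated as a single
    sound unless some pairwise grouping distance D is large enough to
    count as evidence for partitioning."""
    if all(d < threshold for d in pairwise_distances):
        return "integrated"
    return "partitioned"

default_interpretation([0.2, 0.5])   # → "integrated" (no competition)
default_interpretation([0.2, 1.8])   # → "partitioned"
```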

This description calls into question the current goal of most CASA research: to separate the acoustic data into distinct packages and hand each package separately to a speech-recognition program. The fact that concurrent perception of both speech and non-speech sounds can occur, as when formants are divided between the ears or presented as tones in SWS, suggests that the allocation of input components to perceived sounds is not all-or-nothing: the same acoustic component or feature can contribute to the perception of both a speech and a non-speech sound. This seems to be true of sounds other than speech. For example, in an unpublished experiment by Martine Turgeon using the "rhythmic masking release" paradigm, a tone had its own rhythm and at the same time was integrated with others to create another rhythm. The intelligibility of SWS suggests that one perceptual option, when bottom-up processes favor segregation but not very strongly, is to allow integration to occur while some perception of the components remains available (duplex perception).

Lessons from the fact that cues seem to add up in determining perceptual grouping. Models should be designed so that the systems dealing with different cues operate in parallel. In natural environments, a particular cue is sometimes useful and sometimes not; the cues that are most useful in a given environment should be able to take over the burden of segregation and grouping seamlessly. In a computational model, each cue-using mechanism should be able to be shut off without affecting the activity of the remaining ones (the system should degrade gracefully). Typically a CASA model attempts to test the use of some particular cue for grouping. The modeler should be able to compare success rates with and without the mechanism that uses this cue, so as to determine its incremental utility, and this comparison should be made under different types of signal degradation. It may turn out that the cue in question has no incremental utility, or that it degrades as quickly as or more quickly than other cues in the same environments. Perhaps the most powerful cue to the segregation of simultaneous events is the old-plus-new heuristic; models should make more use of it.
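The design requirement here (parallel cue mechanisms, each of which can be shut off for an ablation comparison without affecting the rest) might look like the following hypothetical sketch; the class, cue functions, and scaling constants are all invented for illustration:

```python
class CueEnsemble:
    """Parallel cue mechanisms whose distance contributions add up.

    Any cue can be disabled independently, so the system degrades
    gracefully, and the incremental utility of each cue can be measured
    by comparing performance with it on and off."""

    def __init__(self):
        self._cues = {}   # name -> [distance_fn, enabled]

    def add(self, name, distance_fn):
        self._cues[name] = [distance_fn, True]

    def disable(self, name):
        self._cues[name][1] = False

    def distance(self, a, b):
        return sum(fn(a, b) for fn, on in self._cues.values() if on)

ens = CueEnsemble()
ens.add("f0", lambda a, b: abs(a["f0"] - b["f0"]) / 100.0)
ens.add("onset", lambda a, b: abs(a["onset"] - b["onset"]) / 50.0)
x = {"f0": 100.0, "onset": 0.0}
y = {"f0": 300.0, "onset": 50.0}
ens.distance(x, y)      # → 3.0 (both cues contribute)
ens.disable("onset")    # ablation: probe this cue's incremental utility
ens.distance(x, y)      # → 2.0 (the remaining cue is unaffected)
```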