Auditory Scene Analysis in Humans and Machines

1 Auditory Scene Analysis in Humans and Machines
Dan Ellis, Laboratory for Recognition and Organization of Speech and Audio, Dept. of Electrical Engineering, Columbia University, NY, USA
1. The ASA Problem
2. Human ASA
3. Machine Source Separation
4. Systems & Examples
5. Concluding Remarks

2 1. Auditory Scene Analysis
Sounds rarely occur in isolation, but recognizing sources in mixtures is a problem for humans and machines.
[Figure: multi-channel spectrograms (frequency/kHz vs. time/sec) of a recorded mixture]

3 Sound Mixture Organization
Goal: recover individual sources from scenes, duplicating the perceptual effect.
[Figure: spectrogram analyzed into labeled sources: voice (evil), rumble, stab, voice (pleasant), strings, choir]
Problems: competing sources, channel effects, and dimensionality loss; we need additional constraints.

4 The Problem of Mixtures
"Imagine two narrow channels dug up from the edge of a lake, with handkerchiefs stretched across each one. Looking only at the motion of the handkerchiefs, you are to answer questions such as: How many boats are there on the lake and where are they?" (after Bregman 90)
The received waveform is a mixture: 2 sensors, N sources - underconstrained.
Undoing mixtures: hearing's primary goal? ...by any means available.

5 Source Separation Scenarios
Interactive voice systems: human-level understanding is expected.
Speech prostheses: crowds are the #1 complaint of hearing aid users.
Archive analysis: identifying and isolating sound events.
Unmixing/remixing/enhancement.
[Figure: spectrogram of an example recording]

6 How Can We Separate?
By between-sensor differences (spatial cues): steer a null onto a compact interfering source (a toy sketch follows below).
By finding a separable representation: spectral? (sources are broadband but sparse); periodicity? (maybe for pitched sounds); something more signal-specific...
By inference (based on knowledge/models): acoustic sources are redundant, so use part to guess the remainder.
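To make the spatial-cues idea concrete, here is a minimal toy sketch (not from the talk) of delay-and-subtract null steering with two sensors, assuming a single compact interferer at a known integer-sample inter-mic delay; all signal names are hypothetical:

```python
import numpy as np

# Toy null steering: if the interferer reaches the right mic d samples
# after the left mic, re-aligning the channels and subtracting cancels it,
# while a broadside target survives (comb-filtered).
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
interf = rng.standard_normal(16000)
d = 3                                  # interferer inter-mic delay (samples)

left = target + interf
right = target + np.roll(interf, d)    # interferer arrives d samples later

out = left - np.roll(right, -d)        # align the interferer copies; subtract
# out == target - np.roll(target, -d): the interferer cancels exactly here
print("target power after nulling:", np.mean(out ** 2))
```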

7 Outline
1. The ASA Problem
2. Human ASA: scene analysis; separation by location; separation by source characteristics
3. Machine Source Separation
4. Systems & Examples
5. Concluding Remarks

8 Auditory Scene Analysis
Listeners organize sound mixtures into discrete perceived sources based on within-signal cues (audio +...): common onset + continuity, harmonicity, spatial location, modulation, ..., learned schema.
[Figure: spectrogram of the Reynolds-McAdams oboe; sound example reynolds-mcadams-dpwe.wav]

9 Perceiving Sources
Harmonics are distinct in the ear, but perceived as one source ("fused"); fusion depends on common onset and on harmonicity.
Experimental techniques: ask subjects how many sources they hear; match attributes, e.g. pitch, vowel identity; brain recordings (EEG "mismatch negativity").

10 Auditory Scene Analysis
How do people analyze sound mixtures? Break the mixture into small elements (in time-frequency); elements are grouped into sources using cues; sources have aggregate attributes.
Grouping rules (Darwin, Carlyon, ...): common onset/offset/modulation, harmonicity, spatial location, ... (Bregman 90; Darwin & Carlyon 95)
[Diagram, after Darwin 1996: sound -> frequency analysis -> onset, harmonicity, and spatial maps -> grouping mechanism -> elements -> sources -> source properties]

11 Streaming
Sound event sequences are organized into streams, i.e. distinct perceived sources; it is difficult to make comparisons between streams.
Two-tone streaming experiments vary the tone repetition time (TRT, in ms) and the frequency separation Δf (up to 2 octaves, around 1 kHz). [Sound example asacd-36-streaming-50s]
Ecological relevance? (A stimulus-generation sketch follows below.)
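As a concrete illustration, this sketch (not from the talk) generates a two-tone streaming stimulus; the parameter names and the half-slot tone duration are assumptions:

```python
import numpy as np

def streaming_stimulus(f_a=1000.0, df_octaves=1.0, trt_ms=100.0,
                       n_pairs=20, sr=16000):
    """Alternating A-B-A-B tone sequence for a two-tone streaming demo.
    trt_ms is the onset-to-onset tone repetition time; each tone fills
    half its slot. Small df / slow rate tends to be heard as one stream,
    large df / fast rate as two."""
    f_b = f_a * 2.0 ** df_octaves
    slot = int(sr * trt_ms / 1000.0)
    t = np.arange(slot // 2) / sr
    win = np.hanning(slot // 2)                 # taper to avoid clicks
    a = np.zeros(slot); a[:slot // 2] = win * np.sin(2 * np.pi * f_a * t)
    b = np.zeros(slot); b[:slot // 2] = win * np.sin(2 * np.pi * f_b * t)
    return np.tile(np.concatenate([a, b]), n_pairs)

sig = streaming_stimulus(df_octaves=2.0)        # strong two-stream case
```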

12 Illusions & Restoration
Illusion = hearing more than is there, e.g. the pulsation threshold example: the tone is masked. [Sound example asacd-26-pulsation]
Old-plus-new heuristic: existing sources continue.
We need to infer the most likely real-world events: the observation is an equally good match to either case, but the prior likelihood of continuity is much higher. [Sound example bregnoise]

13 Human Performance: Spatial Separation
Task: Coordinate Response Measure - "Ready Baron go to green eight now" (256 variants, 16 speakers); correct = color and number for "Baron".
Accuracy as a function of spatial separation (Brungart et al. 02): a range effect; A, B same speaker.

14 Separation by Vocal Differences
CRM, varying the level and voice character (same spatial location) - Brungart et al. 01.
Energetic vs. informational masking.

15 Varying the Number of Voices
Two voices OK; more than two voices is harder (same spatial origin) - Brungart et al. 01.
A mix of N voices tends to speech-shaped noise...

16 Outline
1. The ASA Problem
2. Human ASA
3. Machine Source Separation: Independent Component Analysis; Computational Auditory Scene Analysis; Model-Based Separation
4. Systems & Examples
5. Concluding Remarks

17 Scene Analysis Systems
Scene analysis is not necessarily separation, recognition, ...; a scene = overlapping objects, ambiguity.
General framework: distinguish input and output representations; distinguish the engine (algorithm) and the control (constraints, "computational model").

18 Human and Machine Scene Analysis
CASA (e.g. Brown 92):
Input: periodicity, continuity, onset maps
Output: waveform (or mask)
Engine: time-frequency masking
Control: grouping cues from input - or spatial features (Roman, ...)

19 Human and Machine Scene Analysis
ICA (Bell & Sejnowski et seq.):
Input: waveform (or STFT)
Output: waveform (or STFT)
Engine: cancellation
Control: statistical independence of outputs - or energy minimization for beamforming

20 Human and Machine Scene Analysis
Human listeners:
Input: excitation patterns ...
Output: percepts ...
Engine: ?
Control: find a plausible explanation

21 Machine Separation
Problem: features of combinations are not combinations of features.
A voice is easy to characterize when in isolation; redundancy is needed for real-world communication.
[Figure: spectrograms of the original voice, a resynthesis, and both in noise; CRM sound examples crm-*.wav]

22 Separation Approaches
ICA: multi-channel, fixed filtering, perfect separation - maybe!
CASA / model-based: single-channel, time-varying filtering, approximate separation.
ICA pipeline: mix x+n -> processor -> estimate n^ and subtract -> x^. CASA pipeline: mix x+n -> STFT -> spectrogram -> mask -> iSTFT -> x^.
Very different approaches!

23 Independent Component Analysis
Central idea: search the unmixing space to maximize the independence of the outputs (Bell & Sejnowski 95; Smaragdis 98).
Sources s1, s2 pass through mixing weights a11, a12, a21, a22 to give observations x1, x2; unmixing weights w11, w12, w21, w22 are adapted by the gradient of mutual information (Δw_ij ∝ ∂MutInfo/∂w_ij) to recover estimates s^1, s^2.
With simple (instantaneous) mixing, a good solution (usually) exists. (A sketch using an off-the-shelf FastICA follows below.)
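A minimal sketch of the idea using scikit-learn's off-the-shelf FastICA (not the Bell & Sejnowski algorithm itself); the sources and mixing matrix here are synthetic stand-ins:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n = 20000
s1 = np.sign(np.sin(2 * np.pi * np.arange(n) / 200))  # square wave (non-Gaussian)
s2 = rng.laplace(size=n)                               # heavy-tailed noise
S = np.c_[s1, s2]                                      # true sources

A = np.array([[1.0, 0.6],                              # mixing weights a_ij
              [0.4, 1.0]])
X = S @ A.T                                            # two-sensor observations

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)    # adapt unmixing W to maximize independence
W = ica.components_             # learned unmixing matrix (up to scale/order)
```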

24 Mixtures, Scatters, Kurtosis
Mixtures of sources become more Gaussian; this can be measured e.g. via kurtosis (the 4th moment).
[Figure: scatter plots of s1 vs. s2 and x1 vs. x2, with marginal densities p(s1), p(s2), p(x1), p(x2) and their kurtosis values]
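The Gaussianization of mixtures is easy to check numerically; a small sketch (synthetic data, arbitrary mixing weights):

```python
import numpy as np
from scipy.stats import kurtosis   # excess kurtosis: 0 for a Gaussian

rng = np.random.default_rng(0)
s1 = rng.laplace(size=100_000)           # super-Gaussian source (kurtosis > 0)
s2 = rng.uniform(-1, 1, size=100_000)    # sub-Gaussian source (kurtosis < 0)
x1 = 0.7 * s1 + 0.3 * s2                 # mixtures move toward Gaussian,
x2 = 0.3 * s1 + 0.7 * s2                 # so their kurtosis shrinks toward 0

for name, v in [("s1", s1), ("s2", s2), ("x1", x1), ("x2", x2)]:
    print(f"{name}: excess kurtosis = {kurtosis(v):+.2f}")
```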

25 ICA Limitations
Cancellation is very finicky: it is hard to get more than ~10 dB rejection.
[Figure, from Parra & Spence 00: mixture scatter and kurtosis vs. unmixing angle θ; sound examples lee_ss_in_1.wav, lee_ss_para_1.wav]
The world is not instantaneous, fixed, or linear: subband models for reverberation; continuous adaptation.
Needs spatially-compact interfering sources.

26 Computational Auditory Scene Analysis
Central idea: segment time-frequency into sources based on perceptual grouping cues (Brown & Cooke 94; Okuno et al. 99; Hu & Wang).
[Diagram: input mixture -> front end -> signal features (maps: frequency, onset, time, period, freq. modulation) -> object formation -> discrete objects -> grouping rules -> source groups]
The principal cue is harmonicity.

27 CASA Preprocessing
Correlogram: a third, periodicity axis; the envelope of wideband channels follows pitch (Slaney & Lyon 90).
Pipeline: sound -> cochlea filterbank -> frequency channels -> envelope follower -> delay line -> short-time autocorrelation, giving a (frequency, lag, time) volume. (A rough sketch of one correlogram frame follows below.)
Compare modulation filtering [Schimmel & Atlas 05].
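Here is a rough sketch of one correlogram frame, with a crude Butterworth filterbank standing in for the cochlear model and half-wave rectification as the envelope follower; the filter shapes, channel centers, and test signal are all assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def correlogram_slice(x, sr, centers, frame=0.04, max_lag=0.0125):
    """One correlogram frame: short-time autocorrelation in each channel
    of a crude Butterworth filterbank (standing in for a cochlear model)."""
    n_lag, seg_len = int(max_lag * sr), int(frame * sr)
    out = np.zeros((len(centers), n_lag))
    for i, fc in enumerate(centers):
        sos = butter(2, [0.7 * fc, 1.3 * fc], btype="band", fs=sr, output="sos")
        env = np.maximum(sosfilt(sos, x), 0.0)   # half-wave rectified envelope
        seg = env[-seg_len:]
        for lag in range(n_lag):
            out[i, lag] = np.dot(seg[:seg_len - lag], seg[lag:])
    return out                                   # rows: freq channels, cols: lag

sr = 16000
t = np.arange(sr) / sr
x = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 11))  # 200 Hz complex
C = correlogram_slice(x, sr, centers=[200, 400, 800, 1600])
summary = C.sum(axis=0)                          # pool periodicity over channels
min_lag = int(sr / 500)                          # ignore pitches above 500 Hz
period = summary[min_lag:].argmax() + min_lag
print("common period -> pitch %.1f Hz" % (sr / period))         # ~200 Hz
```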

28 Weft Periodic Elements
Can we represent harmonics without grouping? Wefts (Ellis 96) extract a common-period feature from correlogram slices: a period track plus a smooth spectrum over time.
It is hard to separate multiple pitch tracks.

29 Time-Frequency (T-F) Masking
Local dominance assumption: each T-F cell belongs mostly to one source.
[Figure: original mix with oracle male/female labels, and oracle-based resynthesis; sound examples cooke-v3n7.wav, cooke-v3msk-ideal.wav, cooke-n7msk-ideal.wav]
Oracle masks are remarkably effective: the mix is within 3 dB of max(male, female) for ~80% of cells. (An oracle-mask sketch follows below.)
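A minimal sketch of oracle (ideal binary) T-F masking, assuming access to the pre-mix sources just as the slide's oracle labels do; scipy's stft/istft handle analysis and resynthesis, and the noise signals are stand-ins:

```python
import numpy as np
from scipy.signal import stft, istft

def oracle_mask_resynth(target, interf, sr=16000, nperseg=512):
    """Oracle (ideal binary) T-F mask: keep each spectrogram cell where the
    target is locally dominant, zero the rest, then invert to a waveform."""
    _, _, T = stft(target, sr, nperseg=nperseg)
    _, _, I = stft(interf, sr, nperseg=nperseg)
    mask = (np.abs(T) > np.abs(I)).astype(float)   # local-dominance assumption
    _, _, M = stft(target + interf, sr, nperseg=nperseg)
    _, y = istft(mask * M, sr, nperseg=nperseg)
    return y, mask

rng = np.random.default_rng(0)
voice_like = rng.standard_normal(32000)            # stand-ins for two sources
noise_like = rng.standard_normal(32000)
y, mask = oracle_mask_resynth(voice_like, noise_like)
print("fraction of cells given to target: %.2f" % mask.mean())
```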

30 Combining Spatial + T-F Masking
T-F masks based on inter-channel properties [Roman et al. 02; Yilmaz & Rickard 04]: multiple channels make CASA-like masks better. [Sound examples lee_ss_in_1.wav, lee_ss_duet_1.wav]
T-F masking after ICA [Blin et al. 04]: cancellation can remove energy within T-F cells.

31 CASA Limitations
Driven by local features: problems with masking, aperiodic sources, ...
Limitations of T-F masking: need to identify single-source regions; cannot undo overlaps; leaves gaps (from Hu & Wang 04; sound example huwang-v3n7.wav).

32 Auditory Illusions
How do we explain illusions? The pulsation threshold, sinewave speech, phonemic restoration.
[Figure: spectrograms (frequency vs. time) of the three illusion stimuli]
Something is providing the missing (illusory) pieces... source models.

33 Adding Top-Down Constraints
Bottom-up CASA is limited to what's there; top-down predictions allow illusions (Ellis 96).
[Diagrams: bottom-up: input mixture -> front end -> signal features -> object formation -> grouping rules -> discrete objects -> source groups. Prediction-driven: input mixture -> front end -> signal features -> compare & reconcile; prediction errors -> hypothesis management -> hypotheses -> noise and periodic components -> predict & combine -> predicted features]
Match observations to a world-model...

34 Separation vs. Inference
Ideal separation is rarely possible, i.e. no projection can completely remove overlaps. Overlaps mean ambiguity, so scene analysis = find the most reasonable explanation.
Ambiguity can be expressed probabilistically, i.e. via the posteriors of sources {S_i} given observations X:
P({S_i} | X) ∝ P(X | {S_i}) · P({S_i})
where the likelihood P(X | {S_i}) is the combination physics and the prior P({S_i}) is the source models.
Better source models mean better inference... learn from examples?

35 Simple Source Separation
Given models for the sources, find the best (most likely) states for the spectra:
p(x | i1, i2) = N(x; c_i1 + c_i2, Σ)   (combination model)
{i1(t), i2(t)} = argmax_{i1,i2} p(x(t) | i1, i2)   (inference of source state)
This can include sequential constraints, and different domains for combining the c's and defining the model (Roweis 01, 03; Kristjansson 04, 06). (A brute-force sketch follows below.)
E.g. stationary noise: [Figure: mel spectrograms of original speech, in speech-shaped noise (mel magSNR = 2.41 dB), and VQ inferred states (mel magSNR = 3.6 dB)]
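A brute-force sketch of the state inference, with tiny random codebooks standing in for trained source models; under the slide's isotropic Gaussian combination model, the MAP pair simply minimizes the squared error to the summed codewords:

```python
import numpy as np

def best_state_pair(x, cb1, cb2):
    """MAP (i1, i2) for one spectral frame x under
    p(x | i1, i2) = N(x; c_i1 + c_i2, sigma^2 I):
    with isotropic covariance this is the least-squares codeword sum.
    cb1, cb2: (n_codewords, n_bins) VQ codebooks for the two sources."""
    combo = cb1[:, None, :] + cb2[None, :, :]     # all codeword-pair sums
    err = ((x - combo) ** 2).sum(axis=-1)         # neg. log-likelihood + const
    return np.unravel_index(err.argmin(), err.shape)

rng = np.random.default_rng(0)
cb1, cb2 = rng.standard_normal((8, 32)), rng.standard_normal((8, 32))
x = cb1[3] + cb2[5] + 0.01 * rng.standard_normal(32)
print(best_state_pair(x, cb1, cb2))               # -> (3, 5)
```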

36 Can Models Do CASA?
Source models can learn harmonicity, onset, ... to subsume the rules/representations of CASA.
[Figure: VQ800 codebook, linear distortion measure (mel bands vs. code index)]
They can capture spatial information too [Pearlmutter & Zador 04], and sequential structure, e.g. consonants follow vowels... like people do?
But: this needs source-specific models for every possible source - use model adaptation? [Ozerov et al. 2005]

37 Separation or Description?
Are isolated waveforms required? Clearly sufficient, but may not be necessary - and not part of perceptual source separation!
Integrate separation with the application? E.g. speech recognition:
[Diagram: (a) mix -> separation (identify target energy; T-F masking + resynthesis) -> ASR with speech models -> words; vs. (b) mix -> find best words model (identify speech models; source knowledge) -> words]
Output = an abstract description of the signal.

38 Missing Data Recognition
Speech models p(x|M) are multidimensional: we need values for all dimensions to evaluate p(·).
But we can make inferences given just a subset of dimensions x_k, marginalizing the unobserved x_u:
p(x_k | M) = ∫ p(x_k, x_u | M) dx_u
Hence missing data recognition (Cooke et al. 01): with a present-data mask over the dimension-time plane, P(x|q) = P(x1|q) · P(x2|q) · ... · P(x6|q) over the reliable dimensions; masked dimensions can still contribute a bound, comparing p(x_k | x_u < y) with p(x_k).
The hard part is finding the mask (segregation). (A diagonal-Gaussian sketch follows below.)
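For diagonal-covariance Gaussian state models the marginalization is trivial - integrating out the missing dimensions just drops them from the product. A sketch, with a hypothetical segregation mask:

```python
import numpy as np

def missing_data_loglik(x, mask, mean, var):
    """Log-likelihood of a frame under a diagonal Gaussian state model using
    only the reliable (mask==True) dimensions; with a diagonal covariance,
    the marginal over missing dims is the product over present dims."""
    k = mask.astype(bool)
    return -0.5 * np.sum(np.log(2 * np.pi * var[k])
                         + (x[k] - mean[k]) ** 2 / var[k])

rng = np.random.default_rng(0)
mean, var = rng.standard_normal(20), np.ones(20)
x = mean + 0.1 * rng.standard_normal(20)
mask = rng.random(20) > 0.3            # hypothetical segregation mask
print(missing_data_loglik(x, mask, mean, var))
```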

39 The Speech Fragment Decoder
Match the uncorrupted spectrum to ASR models using missing data (Barker et al. 05).
Joint search for the model M and segregation S to maximize:
P(M, S | Y) = P(M) · ∫ [ P(X | M) · P(X | Y, S) / P(X) ] dX · P(S | Y)
where the first factor is the isolated source model and P(S | Y) is the segregation model; Y(f) is the observation and X(f) the source.

40 Using CASA Cues
In the same decoder objective, P(M, S | Y) = P(M) · ∫ [ P(X | M) · P(X | Y, S) / P(X) ] dX · P(S | Y):
CASA can help the search - consider only segregations made from CASA chunks.
CASA can rate a segregation - construct P(S | Y) to reward CASA qualities: frequency proximity, common onset, harmonicity.

41 Outline
1. The ASA Problem
2. Human ASA
3. Machine Source Separation
4. Systems & Examples: periodicity-based; model-based; music signals
5. Concluding Remarks

42 Current CASA
State-of-the-art bottom-up separation (Hu & Wang 03): a noise-robust pitch track; label T-F cells by pitch; extensions to unvoiced transients by onset. (A pitch-labeling sketch follows below.)
[Figure: spectrograms (frequency vs. time) of the mixture and the segregated speech]
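A simplified sketch of the "label T-F cells by pitch" step, assuming a pitch track is already available (one f0 per frame, 0 for unvoiced); the tolerance and STFT settings are arbitrary choices, not the Hu & Wang system:

```python
import numpy as np
from scipy.signal import stft

def pitch_mask(x, f0_track, sr=16000, nperseg=512, tol=0.04):
    """Label T-F cells as target where the bin frequency lies within
    a relative tolerance of a harmonic of that frame's pitch estimate."""
    f, t, Z = stft(x, sr, nperseg=nperseg)
    mask = np.zeros(Z.shape)
    for j, f0 in enumerate(f0_track[:Z.shape[1]]):
        if f0 <= 0:
            continue                          # unvoiced frame: leave unlabeled
        k = np.round(f / f0)                  # nearest harmonic number per bin
        near = np.abs(f - k * f0) < tol * np.maximum(f, f0)
        mask[:, j] = near & (k > 0)
    return mask

sr = 16000
x = sum(np.sin(2 * np.pi * 220 * k * np.arange(sr) / sr) for k in range(1, 8))
m = pitch_mask(x, f0_track=np.full(200, 220.0), sr=sr)
print("fraction of cells labeled target:", round(m.mean(), 3))
```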

43 Prediction-Driven CASA
Identify objects in real-world scenes using sound elements (Ellis 96).
[Figure: city-scene analysis into elements (wefts, noise bands, clicks) and recognized objects, with listener identification rates: Horn1 (10/10), Horn2 (5/10), Horn3 (5/10), Horn4 (8/10), Horn5 (10/10), Crash (10/10), Squeal (6/10), Truck (7/10)]

44 Singing Voice Separation
Pitch tracking + harmonic separation (Avery Wang 94).
[Figure: spectrograms (frequency/kHz vs. time/sec) before and after separation]

45 Periodic/Aperiodic Separation
Harmonic structure plus repetition of drums (Virtanen 03).
[Figure: spectrograms of the mixture and the separated periodic and aperiodic components]

46 Speech Separation Challenge
A mixed and noisy speech ASR task defined by Martin Cooke and Te-Won Lee: short, grammatically-constrained utterances:
<command:4><color:4><preposition:4><letter:25><number:10><adverb:4>
e.g. "bin white at M 5 soon" [sound example t5_bwam5s_m5_bbilzp_6p1.wav]
Results to be presented at Interspeech 06; see also the Statistical And Perceptual Audition workshop.

47 IBM's Superhuman Separation
Optimal inference on mixed spectra (Kristjansson et al., Interspeech 06): model each speaker (512-mixture GMM); infer the speakers and gain; reconstruct the speech; recognize as normal; use grammar constraints.
[Figures: noisy vector, clean vector, and estimate of the clean vector (energy/dB vs. frequency/Hz); WER (%) on the Speech Separation Challenge, same-gender speakers, from 6 dB to -9 dB, for the SDL recognizer with no dynamics, acoustic dynamics, and grammar dynamics, vs. human listeners]

48 Transcription as Separation
Transcribe piano recordings by classification: train SVM detectors for every piano note - 88 separate detectors with independent smoothing - trained on player piano (Disklavier) recordings. (A per-note detector sketch follows below.)
[Figure: detected notes (pitch, A1-A6, vs. time/sec) for Bach 847]
Use the transcription to resynthesize...
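A schematic of the per-note detector idea using scikit-learn, with hypothetical spectral frames and aligned piano-roll labels; real systems add per-note temporal smoothing of the detector outputs:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_note_detectors(frames, roll):
    """One independent binary detector per piano note.
    frames: (n_frames, n_bins) spectral features;
    roll: (n_frames, 88) aligned binary piano-roll labels."""
    return [LinearSVC().fit(frames, roll[:, n])
            if 0 < roll[:, n].sum() < len(roll) else None   # need both classes
            for n in range(88)]

def transcribe(detectors, frames):
    out = np.zeros((len(frames), 88), dtype=int)
    for n, det in enumerate(detectors):
        if det is not None:
            out[:, n] = det.predict(frames)   # then smooth each note track
    return out
```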

49 Piano Transcription Results
Significant improvement from the classifier - frame-level accuracy results:

Algorithm            Errs    False Pos  False Neg  d'
SVM                  43.3%   27.9%      15.4%      3.44
Klapuri & Ryynänen   66.6%   28.1%      38.5%      2.71
Marolt               84.6%   36.5%      48.1%      2.35

[Figure: classification error % (false negatives and false positives) broken down by the number of notes present per frame]

50 Outline
1. The ASA Problem
2. Human ASA
3. Machine Source Separation
4. Systems & Examples
5. Concluding Remarks: evaluation

51 Evaluation
How do we measure separation performance? It depends on what you are trying to do.
SNR? Energy (and distortions) are not created equal; separate the different nonlinear components [Vincent et al. 06]. (A plain-SNR sketch follows below.)
Intelligibility? It is rare for nonlinear processing to improve intelligibility, and listening tests are expensive.
ASR performance? The obvious separate-then-recognize is too simplistic; ASR needs to accommodate separation.
[Figure: transmission errors vs. aggressiveness of processing - increased artefacts and reduced interference trade off to give a net effect with an optimum]
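For reference, the plain SNR the slide questions is one line; it charges all differences equally, which is exactly why [Vincent et al. 06] decompose the error into distortion, interference, and artefact terms instead. A sketch:

```python
import numpy as np

def snr_db(reference, estimate):
    """Plain output SNR of a separated estimate against the true source.
    Treats every discrepancy the same, unlike the measures of
    [Vincent et al. 06], which split the error into target distortion,
    residual interference, and processing artefacts."""
    err = estimate - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))
```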

52 Evaluating Scene Analysis
Need to establish ground truth: subjective sources in real sound mixtures?

53 More Realistic Evaluation
Real-world speech tasks: crowded environments; applications in communication, command/control, and transcription.
Metric: human intelligibility? Diarization annotation (not transcription).
[Figure: personal audio - speech + noise spectrogram, pitch track, lags, and speaker-active ground truth; sound example ks-noisyspeech.wav]

54 Summary & Conclusions
Listeners do well separating sound mixtures: using signal cues (location, periodicity); using source-property variations.
Machines do less well: it is difficult to apply enough constraints; they need to exploit signal detail.
Models capture constraints: learn from the real world; adapt to sources.
Separation is feasible in certain domains; describing source properties is easier.

55 Sources / See Also
NSF/AFOSR Montreal Workshops 03, 04: labrosa.ee.columbia.edu/montreal2004/ - as well as the resulting book...
Hanse meeting: Hanse/
DeLiang Wang's ICASSP 04 tutorial; Martin Cooke's NIPS 02 tutorial.

56 References 1/2
[Barker et al. 05] J. Barker, M. Cooke, D. Ellis, "Decoding speech in the presence of other sources," Speech Communication 45, 5-25, 2005.
[Bell & Sejnowski 95] A. Bell & T. Sejnowski, "An information maximization approach to blind separation and blind deconvolution," Neural Computation 7, 1995.
[Blin et al. 04] A. Blin, S. Araki, S. Makino, "A sparseness mixing matrix estimation (SMME) solving the underdetermined BSS for convolutive mixtures," ICASSP, IV-85-88, 2004.
[Bregman 90] A. Bregman, Auditory Scene Analysis, MIT Press, 1990.
[Brungart 01] D. Brungart, "Informational and energetic masking effects in the perception of two simultaneous talkers," JASA 109(3), March 2001.
[Brungart et al. 01] D. Brungart, B. Simpson, M. Ericson, K. Scott, "Informational and energetic masking effects in the perception of multiple simultaneous talkers," JASA 110(5), Nov. 2001.
[Brungart et al. 02] D. Brungart & B. Simpson, "The effects of spatial separation in distance on the informational and energetic masking of a nearby speech signal," JASA 112(2), Aug. 2002.
[Brown & Cooke 94] G. Brown & M. Cooke, "Computational auditory scene analysis," Computer Speech & Language 8(4), 1994.
[Cooke et al. 01] M. Cooke, P. Green, L. Josifovski, A. Vizinho, "Robust automatic speech recognition with missing and uncertain acoustic data," Speech Communication 34, 2001.
[Cooke 06] M. Cooke, "A glimpsing model of speech perception in noise," submitted to JASA.
[Darwin & Carlyon 95] C. Darwin & R. Carlyon, "Auditory grouping," Handbook of Perception & Cognition 6: Hearing, Academic Press, 1995.
[Ellis 96] D. Ellis, Prediction-Driven Computational Auditory Scene Analysis, Ph.D. thesis, MIT EECS, 1996.
[Hu & Wang 04] G. Hu & D.L. Wang, "Monaural speech segregation based on pitch tracking and amplitude modulation," IEEE Trans. Neural Networks 15(5), Sep. 2004.
[Okuno et al. 99] H. Okuno, T. Nakatani, T. Kawabata, "Listening to two simultaneous speeches," Speech Communication 27, 1999.

57 References 2/2
[Ozerov et al. 05] A. Ozerov, P. Phillippe, R. Gribonval, F. Bimbot, "One microphone singing voice separation using source-adapted models," Worksh. on Apps. of Sig. Proc. to Audio & Acous., 2005.
[Pearlmutter & Zador 04] B. Pearlmutter & A. Zador, "Monaural source separation using spectral cues," Proc. ICA, 2004.
[Parra & Spence 00] L. Parra & C. Spence, "Convolutive blind source separation of non-stationary sources," IEEE Trans. Speech & Audio, 2000.
[Reyes et al. 03] M. Reyes-Gómez, B. Raj, D. Ellis, "Multi-channel source separation by beamforming trained with factorial HMMs," Worksh. on Apps. of Sig. Proc. to Audio & Acous., 13-16, 2003.
[Roman et al. 02] N. Roman, D.-L. Wang, G. Brown, "Location-based sound segregation," ICASSP, 2002.
[Roweis 03] S. Roweis, "Factorial models and refiltering for speech separation and denoising," EuroSpeech, 2003.
[Schimmel & Atlas 05] S. Schimmel & L. Atlas, "Coherent envelope detection for modulation filtering of speech," ICASSP, 2005.
[Slaney & Lyon 90] M. Slaney & R. Lyon, "A perceptual pitch detector," ICASSP, 1990.
[Smaragdis 98] P. Smaragdis, "Blind separation of convolved mixtures in the frequency domain," Intl. Wkshp. on Independence & Artificial Neural Networks, Tenerife, Feb. 1998.
[Seltzer et al. 02] M. Seltzer, B. Raj, R. Stern, "Speech recognizer-based microphone array processing for robust hands-free speech recognition," ICASSP, 2002.
[Varga & Moore 90] A. Varga & R. Moore, "Hidden Markov Model decomposition of speech and noise," ICASSP, 1990.
[Vincent et al. 06] E. Vincent, R. Gribonval, C. Févotte, "Performance measurement in blind audio source separation," IEEE Trans. Speech & Audio, in press.
[Yilmaz & Rickard 04] O. Yilmaz & S. Rickard, "Blind separation of speech mixtures via time-frequency masking," IEEE Trans. Sig. Proc. 52(7), 2004.
