General Soundtrack Analysis

Size: px

Start display at page:

Download "General Soundtrack Analysis"

Naomi Fisher
5 years ago
Views:

1 General Soundtrack Analysis Dan Ellis oratory for Recognition and Organization of Speech and Audio () Electrical Engineering, Columbia University Outline introduction Broadcast soundtrack monitoring Example technologies Summary FBIS - Dan Ellis

2 1 : Sound Organization freq / Hz alj time / sec Analysis Voice1 Music1 Stab Music2 Music2 (quiet) Voice2 Analyzing and describing complex sounds: - continuous sound mixture distinct objects & events Human listeners as the prototype - strong subjective impression when listening -..but hard to see in signal FBIS - Dan Ellis

3 The information in sound freq / Hz Steps 1 Steps time / s Hearing confers evolutionary advantage - optimized to get useful information from sound Enormous detail is available in familiar sounds - ecological influence on our sense of hearing FBIS - Dan Ellis

4 Audio Information Extraction Text Information Retrieval IR AutomaticSpeech Recognition ASR Text Information Extraction IE Blind Source Separation BSS Spoken Document Retrieval SDR Independent Component Analysis ICA Audio Information Extraction AIE Music Information Retrieval Music IR Computational Auditory Scene Analysis CASA Audio Fingerprinting Audio Content-based Retrieval Audio CBR Unsupervised Clustering Multimedia Information Retrieval MMIR Domain - text... speech... music... general audio Operation - recognize... index/retrieve... organize FBIS - Dan Ellis

5 Summary DOMAINS Broadcast Movies Lectures Music Meetings Personal recordings Location monitoring HCI Object-based structure discovery & learning Speech recognition Speaker description Nonspeech recognition Scene Analysis Audio-visual integration Music analysis APPLICATIONS Multimedia access & search Personal media management Machine perception & awareness Prostheses / human augmentation Automatic judgments/recommendation FBIS - Dan Ellis

6 Outline introduction Broadcast soundtrack monitoring - Available information - Information filtering - Intelligent analysis Example technologies Summary FBIS - Dan Ellis

7 2 Broadcast soundtrack monitoring: What information is available? Video and soundtrack reinforce, complement - hard things in one domain can be easy in other Information at different scales: generic sound events dialog style words program genre speaker emotion story coverage activity level schedule changes specific speaker ID seconds days timescale FBIS - Dan Ellis

8 Information filtering Maximizing analyst utility: information selection Automatic support for triage : - segmentation - inferring schedules - identify repeated segments - generic categorization/labeling - task-specific flagging FBIS - Dan Ellis

9 Automatic labeling/flagging Computer as vigilant monitor Models Monitoring computer Signal Flagged events Range of tasks - generic unusual patterns - specific trigger events Issues - false alarms vs. misses - combinations of simple detectors for higher-level classes e.g. program type FBIS - Dan Ellis

10 Schedule summarization Learn the patterns of a particular broadcaster: - 24/7 monitoring - automatic segmentation into programs - automatic classification of genres 6 12 M T W Th F S Su News Talk Sports Music Other Support information extraction - program types, repetitions, summarization - schedule changes activity? FBIS - Dan Ellis

11 Outline introduction Broadcast soundtrack monitoring Example technologies - Sound enhancement - Locating repetitions - eling soundtracks Summary FBIS - Dan Ellis

3 Example technologies: Sound enhancement Time scale modification - speed up for rapid skimming - slow down for close listening Noise reduction freq / Hz Original

12 3 Example technologies: Sound enhancement Time scale modification - speed up for rapid skimming - slow down for close listening Noise reduction freq / Hz Original speech Sped up by 5% Slowed down to 5% Noise reduced at pk - 2dB time / sec FBIS - Dan Ellis

13 Repetition detection Matching signals is expensive - but matching strings is fast Represent audio as simple string - recognition into arbitrary classes - match detection becomes string matching spee cell Models perc viol anim mach soundtrack Recognizer v p a v c a String match detection target Recognizer p a c p v a s Monitor for many signatures FBIS - Dan Ellis

Pr(mus) freq / Hz Soundtrack labeling Broad class: speech-music alj-2-5 8 6 4 2 1.

14 Pr(mus) freq / Hz Soundtrack labeling Broad class: speech-music alj time / sec Specific events: alarms freq / Hz 4 2 Speech + alarm 1 Pr(alarm) time / sec FBIS - Dan Ellis

15 Speaking style Recognizing information other than words Meeting Recorder project: Locate overlaps Spkr A backchannel (signals desire to regain floor?) floor seizure mr speaker active Spkr B speake cedes f Spkr C interruptions Spkr D breath noise Spkr E crosstalk Table top time / secs level/db 4 2 C A O M JL mr Speaker emotion - depends on good baseline FBIS - Dan Ellis

16 Outline introduction Broadcast soundtrack monitoring Example technologies Summary FBIS - Dan Ellis

17 4 Summary: Soundtrack analysis Soundtrack carries information - useful and detailed - complementary to image - rapid processing Open source monitoring - Need to find the interesting bits - Short-time: specific detectors - Long-time: schedule, genre classification Current techniques - Signal enhancement - Detectors for speech, music, alarms etc. - Classification of interaction style, emotion FBIS - Dan Ellis

18 Extra slides FBIS - Dan Ellis

Automatic Speech Recognition (ASR) Standard speech recognition structure: Feature calculation sound D A T A Acoustic model parameters Word models s ah t Language model p("sat" "the","cat") p("saw"

19 Automatic Speech Recognition (ASR) Standard speech recognition structure: Feature calculation sound D A T A Acoustic model parameters Word models s ah t Language model p("sat" "the","cat") p("saw" "the","cat") Acoustic classifier HMM decoder feature vectors phone probabilities phone / word sequence Understanding/ application... State of the art word-error rates (WERs): - 2% (dictation) - 3% (telephone conversations) Can use multiple streams... FBIS - Dan Ellis

20 The Meeting Recorder project (with ICSI, UW, SRI, IBM) Microphones in conventional meetings - for summarization/retrieval/behavior analysis - informal, overlapped speech Data collection (ICSI, UW,...): - 1 hours collected, ongoing transcription - headsets + tabletop + PDA FBIS - Dan Ellis

21 Crosstalk cancellation Baseline speaker activity detection is hard: Spkr A backchannel (signals desire to regain floor?) floor seizure mr speaker active Spkr B speaker B cedes floor Spkr C interruptions Spkr D breath noise Spkr E crosstalk Table top time / secs level/db 4 2 Noisy crosstalk model: m = C s + n Estimate subband C Aa from A s peak energy -... including pure delay (1 ms frames) -... then linear inversion FBIS - Dan Ellis

22 Alarm sound detection Alarm sounds have particular structure - people know them when they hear them Isolate alarms in sound mixtures freq / Hz time / sec representation of energy in time-frequency - formation of atomic elements - grouping by common properties (onset &c.) - classify by attributes... Key: recognize despite background FBIS - Dan Ellis

23 Audio Information Retrieval (with Manuel Reyes) Searching in a database of audio - speech.. use ASR - text annotations.. search them - sound effects library? e.g. Muscle Fish SoundFisher browser - define multiple perceptual feature dimensions - search by proximity in (weighted) feature space Segment feature analysis Sound segment database Feature vectors Seach/ comparison Results Segment feature analysis Query example - features are global for each soundfile, no attempt to separate mixtures FBIS - Dan Ellis

24 Audio Retrieval: Results Musclefish corpus - most commonly reported set Features - mfcc, brightness, bandwidth, pitch... - no temporal sequence structure Results: - 28 examples, 16 classes, 84% correct - confusions: Musical instrs. 136 (14) Instr Spch Env Anim Mech Speech 17 (7) 2 Eviron. 2 6 (1) Animals () Mechanical 1 15 (2) FBIS - Dan Ellis

25 CASA for audio retrieval When audio material contains mixtures, global features are insufficient Retrieval based on element/object analysis: Generic element analysis Object formation Continuous audio archive Element representations Objects + properties Seach/ comparison Results Generic element analysis (Object formation) Query example rushing water Word-to-class mapping Symbolic query Properties alone - features are calculated over grouped subsets FBIS - Dan Ellis

Recognition & Organization of Speech & Audio

Recognition & Organization of Speech & Audio Dan Ellis http://labrosa.ee.columbia.edu/ Outline 1 2 3 Introducing Projects in speech, music & audio Summary overview - Dan Ellis 21-9-28-1 1 Sound organization