Sound Analysis Research at LabROSA


1 Sound Analysis Research at LabROSA
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
1. Speech
2. Music
3. Environmental Sound
LabROSA Overview - Dan Ellis /17

2 LabROSA Overview
[Diagram labels: Information Extraction, Machine Learning, Signal Processing; Music, Environment, Speech; Eigenrhythms, Personal audio, Meeting turns, FDLP]

3 1. Speech Analysis / Recognition
with Sambarta Bhattacharjee
- Speech recognizers work for read speech, poorly for spontaneous speech
  - e.g. % errors 30%
- Transform spontaneous speech to read?
[Figure: spontaneous-speech pole frequency plotted against read-speech pole frequency; slope < 1 indicates reduction]

4 Meeting Recordings
with Jerry Liu and ICSI
- Multi-mic recordings for speaker turns
  - every voice reaches every mic... (?)
  - ... but with differing coupling filters (delays, gains)
- Find turns with minimal assumptions
  - e.g. ad-hoc sensor setups (multiple PDAs)
  - differences to remove effect of source signal
  - no spectral models, < 1xRT

5 Speaker Turns from Timing Diffs
- Find best timing skew between mic pairs
- Find clusters in high-confidence points
- Fit Gaussians to each cluster, assign that class to all frames within radius
[Figure panels: "ICSI0: good points"; "All pts: nearest class"; "All pts: closest dimension"]
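The first two steps above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single mic pair, short frames given as plain lists of samples, and a hypothetical confidence threshold `min_score`. The skew is taken as the lag that maximizes the normalized cross-correlation; the clustering and Gaussian fitting steps are not shown.

```python
import math

def best_skew(a, b, max_lag):
    """Return (lag, score): the lag in samples at which b best matches a,
    and the normalized cross-correlation at that lag."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        return 0, 0.0  # a silent frame carries no timing information
    best_lag, best_score = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n in range(len(a)):
            m = n + lag  # compare a[n] with b[n + lag]
            if 0 <= m < len(b):
                acc += a[n] * b[m]
        score = acc / norm
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

def high_confidence_points(frame_pairs, max_lag, min_score=0.8):
    """Keep only the skews whose correlation peak is strong (assumed threshold)."""
    points = []
    for a, b in frame_pairs:
        lag, score = best_skew(a, b, max_lag)
        if score >= min_score:
            points.append(lag)
    return points

# Hypothetical frame: the same pulse arrives 3 samples later at the second mic.
mic1 = [0.0] * 20
mic1[5], mic1[6] = 1.0, 0.5
mic2 = [0.0] * 20
mic2[8], mic2[9] = 1.0, 0.5
print(best_skew(mic1, mic2, max_lag=5))
```

With several mic pairs, each frame would yield one skew per pair, and the clusters would be sought in that multi-dimensional skew space.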


6 2. Music Signal Analysis
- A lot of music data available
  - e.g. 60G of MP3 00 hr of audio/k tracks
- What can we do with it?
  - implicit definition of music
- Quality vs. quantity
  - Speech recognition lesson: x data, 1/th annotation, twice as useful
- Motivating Applications
  - music similarity / classification
  - computer (assisted) music generation
  - insight into music

7 Transcription as Classification
with Graham Poliner
- Signal models typically used for transcription
  - harmonic spectrum, superposition
- But... trade domain knowledge for data
  - transcription as pure classification problem:
    Audio -> Trained classifier -> p("c0" | Audio), p("c#0" | Audio), p("d0" | Audio), p("d#0" | Audio), p("e0" | Audio), p("f0" | Audio)
  - single N-way discrimination for melody
  - per-note classifiers for polyphonic transcription
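The difference between the two decision schemes can be sketched as follows. The posterior values are hypothetical, the classifiers that would produce them are not shown, and the 0.5 threshold is an assumption for illustration only.

```python
def melody_note(posteriors):
    """Single N-way discrimination: pick the one note with the highest posterior."""
    return max(posteriors, key=posteriors.get)

def polyphonic_notes(posteriors, threshold=0.5):
    """Per-note classifiers: each note is decided independently of the others."""
    return sorted(note for note, p in posteriors.items() if p >= threshold)

# Hypothetical outputs for one audio frame.
nway_frame = {"c0": 0.1, "c#0": 0.7, "d0": 0.2}      # competing classes, sums to 1
pernote_frame = {"c0": 0.9, "c#0": 0.1, "e0": 0.6}   # independent detectors

print(melody_note(nway_frame))
print(polyphonic_notes(pernote_frame))
```

In the N-way case exactly one note is reported per frame; in the per-note case any number of notes, including none, can be active at once.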

8 Classifier Transcription Results
- Trained on MIDI syntheses (32 songs)
  - SMO SVM (Weka)
- Tested on ISMIR MIREX 03 set
  - foreground/background separation
- Frame-level pitch concordance:

  system    jazz3    overall
  fg+bg     71.%     44.3%
  just fg   6.1%     4.4%

9 Eigenrhythms: Drum Pattern Space
with John Arroyo
- Pop songs built on repeating drum loop
  - bass drum, snare, hi-hat
  - small variations on a few basic patterns
- Eigen-analysis (PCA) to capture variations?
  - by analyzing lots of (MIDI) data
- Applications
  - music categorization
  - beat box synthesis

10 Eigenrhythms
- Need + Eigenvectors for good coverage of 0 training patterns ( dims)
- Top patterns: [figure]
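As a rough illustration of the eigen-analysis idea, the sketch below finds the leading principal component of a handful of drum patterns by power iteration. It is a minimal sketch under stated assumptions, not the method used in the slides: each pattern is assumed to be a flattened list of hit strengths, the toy patterns are invented, and only the top eigenvector is computed.

```python
import math

def mean_pattern(patterns):
    """Element-wise mean of equal-length patterns."""
    n = len(patterns)
    return [sum(col) / n for col in zip(*patterns)]

def top_eigenvector(patterns, iters=100):
    """Leading principal component of the mean-removed patterns (power iteration)."""
    mu = mean_pattern(patterns)
    centered = [[x - m for x, m in zip(p, mu)] for p in patterns]
    # Non-uniform start vector, so it is unlikely to be orthogonal to the answer.
    v = [1.0 + 0.1 * i for i in range(len(mu))]
    for _ in range(iters):
        # w = C v, where C is the (unnormalized) covariance: sum of (c . v) c
        w = [0.0] * len(mu)
        for c in centered:
            dot = sum(ci * vi for ci, vi in zip(c, v))
            for i, ci in enumerate(c):
                w[i] += dot * ci
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left in this direction
        v = [x / norm for x in w]
    return v

def project(pattern, mu, v):
    """Coordinate of one pattern along the eigenvector v."""
    return sum((x - m) * vi for x, m, vi in zip(pattern, mu, v))

# Toy 4-step patterns: the hit alternates between step 3 and step 4.
toy = [[1, 0, 1, 0], [1, 0, 0, 1], [1, 0, 1, 0], [1, 0, 0, 1]]
v1 = top_eigenvector(toy)
print(v1)
```

Further eigenvectors could be obtained by removing each found component from the data and repeating, which is one way to see how many are needed to cover the training patterns.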

11 Eigenrhythms for Classification
- Projections in Eigenspace / LDA space
[Figure: PCA(1,2) projection (16% corr) and LDA(1,2) projection (33% corr); genres plotted: blues, country, disco, hiphop, house, newwave, rock, pop, punk, rnb]
- way Genre classification (nearest nbr):
  - PCA3: % correct
  - LDA4: 36% correct


12 3. Other Sounds: Clap Detection
with Nathan Lesser from interactivemetronome.com
- Rhythmic clapping may help neural development
  - sensori-motor planning
  - focus and attention
- Interactive metronome devices give feedback on synchrony
  - sensor-based
- Classroom deployment?
  - acoustic-based?
  - for multiple simultaneous users??


13 Clap Range Discrimination
- Absolute level varies
- Decay slopes ~ same
  - reverberation (RT60 ~ 900 ms)
- Initial burst for near-field
  - direct sound
[Figure: amplitude, energy (4 ms) / dB, and spectrogram (freq / kHz) against time / s, for Near-field (327MUDD nf0:4) and Far-field (327MUDD ff0:4)]
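One way to turn these observations into a level-independent cue is sketched below: compute energy in 4 ms windows, then measure how far the peak window stands above the windows that follow it. This is an illustrative assumption, not the detector described in the slides; the function names, the `tail_windows` parameter and the synthetic signals are all hypothetical.

```python
import math

def window_energies_db(samples, sample_rate, win_ms=4.0):
    """Energy in consecutive non-overlapping windows, in dB."""
    win = max(1, int(sample_rate * win_ms / 1000.0))
    out = []
    for start in range(0, len(samples) - win + 1, win):
        e = sum(x * x for x in samples[start:start + win])
        out.append(10.0 * math.log10(e + 1e-12))  # small offset avoids log(0)
    return out

def burst_prominence_db(energies_db, tail_windows=5):
    """Peak window energy minus the mean of the windows right after it (dB).
    A strong direct-sound burst gives a large value regardless of overall level."""
    peak = max(range(len(energies_db)), key=energies_db.__getitem__)
    tail = energies_db[peak + 1:peak + 1 + tail_windows]
    if not tail:
        return 0.0
    return energies_db[peak] - sum(tail) / len(tail)

# Synthetic claps at 1 kHz sampling: same reverberant tail, different initial burst.
near = [1.0] * 4 + [0.1] * 20
far = [0.2] * 4 + [0.1] * 20
near_db = burst_prominence_db(window_energies_db(near, 1000))
far_db = burst_prominence_db(window_energies_db(far, 1000))
print(near_db, far_db)
```

Because the measure is a difference in dB, scaling the whole recording up or down leaves it essentially unchanged, which matches the note that absolute level varies.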


14 Personal Audio
with Keansub Lee
- Easy to record everything you hear
  - ~0GB / year @ 64 kbps
- Very hard to find anything
  - how to scan?
  - how to visualize?
  - how to index?
- Starting point: Collect data
  - ~ 60 hours (8 days, ~7. hr/day)
  - hand-mark 139 segments (26 min/seg avg.)
  - assign to 16 classes (8 have multiple instances)

15 Features for Long Recordings
- Feature frames = 1 min (not 2 ms!)
- Characterize variation within each frame...
- ... and structure within coarse auditory bands
[Figure panels, freq / bark against time / min: Average Linear Energy; Normalized Energy Deviation; Average Log Energy (dB); Log Energy Deviation (dB); Average Spectral Entropy (bits); Spectral Entropy Deviation (bits)]
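The statistics named in the figure panels could be computed along the following lines. The exact definitions are not given in the slides, so the ones used here are assumptions: the input is the sequence of short-time energies of one auditory band within a one-minute frame, "normalized deviation" is taken as standard deviation divided by mean, and entropy is computed over the energy distribution across the bins inside a band.

```python
import math
from statistics import mean, pstdev

def minute_features(band_energy):
    """Summarize one band over a 1-minute frame.
    band_energy: linear energy values, one per short-time frame in that minute."""
    avg_lin = mean(band_energy)
    norm_dev = pstdev(band_energy) / avg_lin if avg_lin > 0 else 0.0
    log_e = [10.0 * math.log10(e + 1e-12) for e in band_energy]
    return {
        "avg_linear_energy": avg_lin,
        "normalized_energy_deviation": norm_dev,
        "avg_log_energy_db": mean(log_e),
        "log_energy_deviation_db": pstdev(log_e),
    }

def spectral_entropy_bits(bin_energies):
    """Entropy of the energy distribution across the bins within one band."""
    total = sum(bin_energies)
    if total <= 0:
        return 0.0
    h = 0.0
    for e in bin_energies:
        if e > 0:
            p = e / total
            h -= p * math.log2(p)
    return h

# Hypothetical band: steady energy for a minute, and a flat 4-bin spectrum.
steady = minute_features([1.0, 1.0, 1.0, 1.0])
flat = spectral_entropy_bits([1.0, 1.0, 1.0, 1.0])
print(steady, flat)
```

The average and deviation of the per-frame spectral entropy over the minute would then follow the same mean and standard-deviation pattern as the energy features.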

16 Personal Audio Applications
- Visualization / browsing / diary inference
- link in other information sources
  - diary
- NoteTaker interface: what was I hearing?

17 LabROSA Summary
- LabROSA: signal processing + machine learning + information extraction
- Applications
  - Speech: Recognition, Organization
  - Music: Transcription, Recommendation
  - Environment: Detection, Description
- Also... signal separation, compression, dolphins...
