Tandem modeling investigations

Size: px

Start display at page:

Download "Tandem modeling investigations"

Anthony Bennett
5 years ago
Views:

1 Tandem modeling investigations Dan Ellis International Computer Science Institute, Berkeley CA Outline What makes Tandem successful? Can we make Tandem better? Does Tandem work with LVCSR tricks? Tandem investigations - Dan Ellis

2 +w +w 1 Combo over msg: +20% What makes Tandem work? plp (with Manuel Reyes) Input sound Pre-nonlinearity over posteriors: +12% + PCA orthog'n Gauss mix models HTK decoder s ah t Combo over plp: +20% msg KLT over direct: +8% Combo-into-HTK over Combo-into-noway: +15% Words Combo over mfcc: +25% NN over HTK: +15% Tandem over HTK: +35% Tandem over hybrid: +25% Tandem combo over HTK mfcc baseline: +53% Model diversity? - try a phone-based GMM model - try training the NN model to HTK state labels Discriminative network training? - (try posteriors from GMM & Bayes) Tandem investigations - Dan Ellis

3 Phone vs. word models plp KLT orthog'n Gauss mix models HTK decoder Input sound +w s ah t Words Trained on phoneme targets Trained to subword states Try a phone-based HTK model (instead of whole-word models) Try training NN model to subword-state labels net outputs; reduce to 40 in KLT Results (Aurora2k, HTK-baseline WER ratio): System test A: matched test B: var noise test C: var chan Tandem PLP baseline 63.5% 70l.3% 59.5% Phone-based HTK sys 63.6% 72.5% 61.5% Subword-based NN sys 63.1% 62.8% 55.1% Diversity doesn t help - subword units may be good for NN Tandem investigations - Dan Ellis

4 2 Enhancements to Tandem-Aurora More tandem-feature-domain processing: norm / deltas? KLT orthog'n norm / deltas? Gauss mix models +w Results (HTK baseline WER ratio): System test A: matched test B: var noise test C: var chan PLP: Tandem baseline 63.5% 70l.3% 59.5% PLP: norm - KLT 72.6% 71.2% 63.6% PLP: KLT - norm 57.8% 58.8% 51.3% PLP: KLT - delta 59.0% 60.2% 52.9% PLP: KLT - delta - norm 58.1% 59.9% 48.9% PLP: delta - KLT - norm 54.7% 53.6% 46.9% - delta-klt-norm: 80% Tdm baseline WER Tandem investigations - Dan Ellis

5 Best effort Tandem system Deltas & norms help PLP: try on combo (PLP+MSG) system: System test A: matched test B: var noise test C: var chan PLP+MSG: baseline 51.1% 52.0% 45.6% PLP+MSG: dlt-klt-nrm 50.9% 50.5% 43.6% PLP+MSG: KLT-nrm 48.3% 49.5% 39.4% - deltas hurt for MSG: features too sluggish? Deltas help clean, norms help noisy: baseline K-D K-N WER / % 40 4 WER / % clean SNR / db Tandem investigations - Dan Ellis

6 +w +w 3 Tandem for LVCSR: the SPINE task (with Rita Singh/CMU & Sunil Sivadas/OGI) Noisy spontaneous speech, ~5000 word vocab Recognition: PLP feature calculation 1 Tandem feature calculation SPHINX recognizer Input sound Pre-nonlinearity outputs + PCA decorrelation GMM Subword likelihoods HMM decoder s ah t Words MSG feature calculation 2 MLLR adaptation - same tandem features - NN training from Broadcast News boot + iterate - GMM-HMM has context-dependence, MLLR Tandem investigations - Dan Ellis

7 SPINE-Tandem results Evaluation WER results: Features (dimensions) CI system CD system CD + MLLR MFCC + d + dd (39) 69.5% 35.1% 33.5% Tandem features (56) 47.6% 35.7% 32.8% - much better for CI systems - differences evaporate with CD, MLLR Not quite fair: - CD senones optimized for MFCC - worth 2-3% absolute? Not unexpected: - NN confounds CD variants - Tandem space very nonlinear - bad for MLLR Any hope? - more training data / train CD classes /... Tandem investigations - Dan Ellis

Recognition & Organization of Speech and Audio

Recognition & Organization of Speech and Audio Dan Ellis Electrical Engineering, Columbia University http://www.ee.columbia.edu/~dpwe/ Outline 1 2 3 4 Introducing Robust speech recognition