Tandem acoustic modeling: Neural nets for mainstream ASR?

Size: px

Start display at page:

Download "Tandem acoustic modeling: Neural nets for mainstream ASR?"

Asher Lindsey
5 years ago
Views:

1 Tandem acoutic modeling: for maintream ASR? Dan Elli International Computer Science Intitute Berkeley CA Outline 2 3 Tandem acoutic modeling Inide Tandem ytem: What going on? Future direction Tandem modeling - Dan Elli

2 Tandem acoutic modeling ETSI Aurora noiy digit evaluation - new feature (for ditributed peech recognition) - Gauian mixture HTK back-end provided How to ue hybrid-connectionit trick? (multitream poterior combination etc.) Ue poterior output a feature for HTK... = Tandem connection of two large tatitical model: Neural Net (NN) and Gauian Mixture (GMM) Tandem modeling - Dan Elli

3 C dcl C C dcl dcl C dcl The Tandem tructure Hybrid Connectionit-HMM ASR Conventional ASR (HTK) Noway decoder Gau mix model HTK decoder ah t ah t Speech feature Phone probabilitie Speech feature Subword likelihood Tandem modeling Gau mix model HTK decoder ah t Speech feature Phone probabilitie Subword likelihood Refined ytem Pre-nonlinearity output + PCA orthog'n Gau mix model HTK decoder ah t Alternate feature Subword likelihood Better reult when poterior are made more Gauian Tandem allow poterior combination for HTK Tandem modeling - Dan Elli

4 Training a tandem model Tandem modeling ue two feature-pace - NN etimate phone poterior (dicriminant) - GMM model ubword likelihood (ditribution) Training procedure - NN trained (backprop) on bae feature to forced-alignment phone target - GMM trained on modified NN output via EM to maximie ubword model likelihood - HTK backend know nothing of phone model Decoupled (good) but equential Training et? - can ue ame for both - learning different info - could ue different - for cro-tak robue Tandem modeling - Dan Elli

5 Tandem ytem reult It work very well: WER a a function of SNR for variou Aurora99 ytem WER / % (log cale) Average WER ratio to baeline: HTK GMM: % Hybrid: 84.6% Tandem: 64.5% Tandem + PC: 47.2% HTK GMM baeline Hybrid connectionit Tandem Tandem + PC clean SNR / db (averaged over 4 noie) Sytem-feature Avg. WER 2- db Baeline WER ratio HTK-mfcc 3.7% % -mfcc 9.3% 84.5% Tandem-mfcc 7.4% 64.5% Tandem-mg+plp 6.4% 47.2% Tandem modeling - Dan Elli

6 2 Inide Tandem ytem: What going on? Viualization of the net output one eight three (MFP_83A) Spectrogram freq / khz Clean dB SNR to Hall noie db Ceptral-moothed mel pectrum freq / mel chan Phone poterior etimate phone q ax uw ow ao ah ay ey eh ih iy w r n v q ax uw ow ao ah ay ey eh ih iy w r n v 3.5 th fz th fz Hidden layer linear output phone kcl k t q ax uw ow ao ah ay ey eh ih iy w r n v th fz kcl k t time / kcl k t q ax uw ow ao ah ay ey eh ih iy w r n v th fz kcl k t time / normalize away noie Tandem modeling - Dan Elli

7 pace magnification perform a nonlinear remapping of the feature pace Smoothed pectrum 2 Linear output for /r/, /iy/, /w/ Poterior for /r/, /iy/, /w/ time / - mall change acro critical boundarie reult in large output change Tandem modeling - Dan Elli

8 dcl dcl Relative contribution Approx relative impact on baeline WER ratio for different component: Combo over mg: +2% plp C Pre-nonlinearity over poterior: +2% + PCA orthog'n Gau mix model HTK decoder ah t Combo over plp: +2% mg C KLT over direct: +8% Combo-into-HTK over Combo-into-noway: +5% Combo over mfcc: +25% NN over HTK: +5% Tandem over HTK: +35% Tandem over hybrid: +25% Tandem combo over HTK mfcc baeline: +53% Tandem modeling - Dan Elli

9 dcl Omitting the GMM Tied poterior (Rottland & Rigoll, ICASSP): Conventional ASR (HTK) Speech feature Gau mix model HMM decoder mixture weight Per-mixture likelihood - EM training of GMM and HMM mixture weight ah t ub-phone tate Rottland & Rigoll (2) HMM decoder C ah t ub-phone tate Speech feature - only mixture weight trained by EM Sytem mixture weight Phone probabilitie WSJ WER Hybrid baeline 5.8% Tied poterior 9.4% Tandem modeling - Dan Elli

10 3 Dicuion Key limitation: tak-pecific - NN i not like feature (it part of the trained ytem) Aurora999 wa a matched condition tak - ame noie added in training and tet - Aurora2 ha mimatched condition - Tandem modeling work jut a well How to relax pecificity? - train on alternative tak? - ue articulatory target Tandem modeling - Dan Elli

11 Future development How to optimize NN for thi tructure? - integrated training...previou work - HMM tate a target? Undertanding the gain - better analyi of each piece contribution - trength of different modeling approache - effect of model/training et ize variation - tied poterior? Other peech corpora - need both NN and GMM ytem... - Switchboard i next goal Tandem modeling - Dan Elli

Tandem modeling investigations

Tandem modeling investigations Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 What makes Tandem successful? Can we make Tandem better? Does Tandem