Oregon Graduate Institute of Science and Technology,


1 SPEAKER RECOGNITION AT OREGON GRADUATE INSTITUTE
June & 6, 997
Sarel van Vuuren and Narendranath Malayath
Hynek Hermansky and Pieter Vermeulen
Oregon Graduate Institute, Portland, Oregon

2 Oregon Graduate Institute
1. Speaker Recognition at OGI
- Research Group
- Goals
2. Competitive System
- Architecture
- Results
3. Initial Robust System
- Architecture
- Preliminary Results and Conclusions
- Planned Extensions

3 People
- Faculty: Hynek Hermansky, Pieter Vermeulen
- Post Doc: Nobu Kanedera, Carlos Avendano
- PhD Students: Sarel van Vuuren, Sangita Tibrewala, Narendranath Malayath
Speech processing by emulating relevant properties of speech perception
Collaboration with
- CSLU at OGI
- ICSI Berkeley
- IIT Madras
- IDIAP Martigny
- KTH Stockholm

4 Activities
- Speaker identification
- Acoustic modeling for ASR
- Enhancement of degraded speech and speech processing for handicapped
- Human speech perception

5 Speaker Recognition at OGI
Speech Signal
- linguistic message
- speaker characteristics
- environment
Task
- find out how these information sources are coded into the signal
Applications
- speaker ID
- speaker independent ASR
- voice mimic

6 Requirements of a Speaker Verification System
Invariant to channel
Invariant to session
Invariant to noise
Minimal training data
Minimal verification data
Adapt to speaker styles

7 Goals
Be familiar with state of the art
- Build an up to date competitive system following the state of the art
- Analyze and understand abilities and limitations
- Contribute to research system
- Incorporate ideas from research system

8 Goals
Research novel ideas
- Knowledge driven
- Analyze and understand
- Report results
- Contribute to state of the art system
- Incorporate knowledge from state of the art system
Address robustness
- Invariance vs modeling
- Channels and noise
Address data requirements
- Training
- Verification

9 Initial Robust System
[Block diagram: Preprocessing, two representations (Rep., carrying L+E and L+S+E), Speaker Specific Mapping, a +/- difference, Distance, Frame Integration, Likelihood Estimator]
Information Sources: L: Linguistic, S: Speaker, E: Environment
Residue invariant to extraneous information and noise
Preprocessing: segmentation
- such as silence removal, voiced segments
Representation: differing speaker information
- such as low order PLP vs high order PLP
Speaker Specific Mapping:
- such as Neural Net or Pseudo Inverse
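The slide names silence removal as a preprocessing step but does not say how it is done. As a minimal sketch, assuming a simple energy criterion (the function names, the 30 dB margin, and the toy frames are all hypothetical, not taken from the slides), silent frames could be dropped like this:

```python
import math

def frame_energy_db(frame):
    """Mean power of one frame in dB (small offset avoids log of zero)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)

def remove_silence(frames, margin_db=30.0):
    """Keep frames whose energy is within margin_db of the loudest frame.

    Assumption: an energy threshold relative to the peak frame; the
    slides do not specify the actual silence detector.
    """
    energies = [frame_energy_db(f) for f in frames]
    floor = max(energies) - margin_db
    return [f for f, e in zip(frames, energies) if e >= floor]

# Toy example: the middle frame is near-silent and gets dropped.
frames = [[0.5, -0.4, 0.6], [0.001, -0.002, 0.001], [0.3, 0.2, -0.5]]
kept = remove_silence(frames)
```

A relative threshold is used here so the sketch does not depend on the recording level; a real system would more likely use a tuned or adaptive detector.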

10 Initial Robust System
[Block diagram as on slide 9]
Speaker Specific Distance Measure:
- Euclidean, likelihood estimator, Bhattacharyya
Frame Integrator:
- average, voting
Likelihood Estimator
Adding other information (pitch, formants)
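The two frame integrators named on this slide, average and voting, can be contrasted with a small sketch. The voting rule below (each frame votes "accept" when its distance falls under a per-frame threshold) is an assumption for illustration; the slides do not define it, and the function names and numbers are hypothetical.

```python
def integrate_average(frame_distances):
    """Utterance score = mean of the per-frame distances (lower is better)."""
    return sum(frame_distances) / len(frame_distances)

def integrate_voting(frame_distances, frame_threshold):
    """Utterance score = fraction of frames voting 'accept'.

    Assumption: a frame votes 'accept' when its distance is below
    frame_threshold (higher fraction is better).
    """
    votes = [d < frame_threshold for d in frame_distances]
    return sum(votes) / len(votes)

# One outlier frame (5.0) dominates the average but costs only one vote.
frame_distances = [0.2, 0.3, 5.0, 0.25]
avg_score = integrate_average(frame_distances)
vote_score = integrate_voting(frame_distances, frame_threshold=1.0)
```

The toy numbers show the trade-off: averaging keeps magnitude information but is sensitive to a few bad frames, while voting discards magnitude and is less affected by outliers.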

11 Initial Robust Implementation
[Block diagram: PLP Representation, Remove Silence, PLP-7 and PLP-4, Speaker Specific NN, +/- difference, Euclidean, Frame Average]
Preprocessing: Silence deletion
Representation: PLP-7 vs PLP-4
Speaker Specific Mapping: Neural Net
Distance Measure: Euclidean
Frame Integrator: Average
Likelihood Estimator: None
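The pipeline on this slide can be sketched end to end under stated assumptions. The implementation described here uses a neural net for the speaker specific mapping; the sketch below swaps in the pseudo-inverse option named on slide 9, because it fits in a few lines. The feature dimensions (4 and 7), the synthetic data, and all function names are hypothetical stand-ins, not the authors' code.

```python
import numpy as np

def train_mapping(low, high):
    """Least-squares linear map W so that low @ W approximates high.

    low:  (frames, d_low)  low order features of the target speaker
    high: (frames, d_high) high order features of the same frames
    """
    return np.linalg.pinv(low) @ high

def verification_score(W, low, high):
    """Frame-averaged Euclidean norm of the residue (lower = better match)."""
    residue = low @ W - high
    return float(np.mean(np.linalg.norm(residue, axis=1)))

# Synthetic stand-in for real PLP features (assumption): each speaker is
# modeled as a different linear relation between the two representations.
rng = np.random.default_rng(0)

def synth_speaker(true_map, n_frames):
    low = rng.normal(size=(n_frames, 4))
    high = low @ true_map + 0.05 * rng.normal(size=(n_frames, 7))
    return low, high

map_a = rng.normal(size=(4, 7))
map_b = rng.normal(size=(4, 7))

W_a = train_mapping(*synth_speaker(map_a, 200))            # enroll speaker A
score_target = verification_score(W_a, *synth_speaker(map_a, 50))
score_impostor = verification_score(W_a, *synth_speaker(map_b, 50))
```

On this toy data the residue is small when the test frames come from the enrolled speaker and large otherwise, which is the behavior the distance and frame-average stages rely on. It says nothing about handset robustness.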

12 Preliminary Studies
Map from speaker independent to speaker rich representation
Evidence of discrimination
Evidence of low data requirements for verification
No handset robustness
- mapping not invariant due to training methodology

13 Results - GMM baseline
[DET curve: handset training; 3 sec test; female; training handset. Legend: eer 9.80 %; mdcf (.8,9.4); hdcf (.7,4.)]
[DET curve: handset training; 3 sec test; female; non-training handset. Legend: eer 4.86 %; mdcf (.9,4.); hdcf (.7,4.)]

14 Results - GMM baseline
[DET curve: handset training; 0 sec test; female; training handset. Legend: eer .04 %; mdcf (.3,.7); hdcf (.,7.4)]
[DET curve: handset training; 0 sec test; female; non-training handset. Legend: eer 9.60 %; mdcf (.,3.8); hdcf (.,37.8)]

15 Results - GMM baseline
[DET curve: handset training; 30 sec test; female; training handset. Legend: eer .80 %; mdcf (0.6,8.7); hdcf (0.7,9.0)]
[DET curve: handset training; 30 sec test; female; non-training handset. Legend: eer 6.9 %; mdcf (.,9.); hdcf (0.7,30.0)]

16 Results - PLP system
[DET curve: handset training; 3 sec test; female; training handset. Legend: eer 9.0 %; mdcf (.9,0.0)]
[DET curve: handset training; 3 sec test; female; non-training handset. Legend: eer 33.3 %; mdcf (0.8,0.0)]

17 Results - PLP system
[DET curve: handset training; 0 sec test; female; training handset. Legend: eer .69 %; mdcf (.4,.9)]
[DET curve: handset training; 0 sec test; female; non-training handset. Legend: eer %; mdcf (0.9,0.0)]

18 Results - PLP system
[DET curve: handset training; 30 sec test; female; training handset. Legend: eer 4.8 %; mdcf (.6,4.6)]
[DET curve: handset training; 30 sec test; female; non-training handset. Legend: eer 9.4 %; mdcf (0.8,0.0)]

19 Results - Subspace system
[DET curve: handset training; 3 sec test; female; training handset. Legend: eer .4 %; mdcf (.,0.0)]
[DET curve: handset training; 3 sec test; female; non-training handset. Legend: eer 3.8 %]

20 Results - Subspace system
[DET curve: handset training; 0 sec test; female; training handset. Legend: eer .76 %; mdcf (.,39.)]
[DET curve: handset training; 0 sec test; female; non-training handset. Legend: eer %]

21 Results - Subspace system
[DET curve: handset training; 30 sec test; female; training handset. Legend: eer 9.4 %; mdcf (.6,38.7)]
[DET curve: handset training; 30 sec test; female; non-training handset. Legend: eer 9.4 %]
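Each DET curve on the results slides is summarized by an equal error rate (eer), the operating point where the miss rate equals the false alarm rate. As a rough sketch of how such a figure can be obtained from raw trial scores, the function below sweeps a threshold over the observed scores. The function name, the convention that higher scores mean "more target-like", and the toy scores are assumptions; this is not the evaluation tool used for the slides.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping the decision threshold over all scores.

    A trial is accepted when score >= threshold. Returns the mean of the
    miss and false alarm rates at the threshold where they are closest.
    """
    best_gap, eer = None, None
    for thr in sorted(set(target_scores) | set(impostor_scores)):
        miss = sum(s < thr for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer

# Toy trial scores: one target scores below one impostor.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(targets, impostors)  # 0.25 for these scores
```

For distance-based systems such as the one on slide 11, where lower is better, the scores would be negated before calling the function.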

22 Future Work: Speaker Verification
Understand each component
- Preprocessing
- Representation
- Environment Invariant Mapping
- Distance Measure
