Mark John Francis Gales

Thesis, Book Chapters and Magazines

1. S. Furui, L. Deng, M. Gales, H. Ney, and K. Tokuda, Fundamental technologies in modern speech recognition [from the guest editors], IEEE Signal Processing Magazine, vol. 29, pp , November
2. M. J. F. Gales, S. Watanabe, and E. Fosler-Lussier, Structured discriminative models for speech recognition: An overview, IEEE Signal Processing Magazine, vol. 29, pp , November
3. S. Matsoukas, T. Ng, L. Nguyen, F. Diehl, M. J. F. Gales, X. Liu, P. C. Woodland, J.-L. Gauvain, L. Lamel, A. Messaoudi, et al., Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation, ch. Optimizing Speech-To-Text System Combination for Machine Translation. Springer,
4. M. Tomalin, W. Byrne, A. de Gispert, M. Gales, X. Liu, P. Woodland, et al., Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation, ch. Improving Speech Transcription for Machine Translation. Springer,
5. M. J. F. Gales, Robust Speech Recognition of Uncertain or Missing Data, ch. Model-Based Approaches to Handling Uncertainty. Springer,
6. M. J. F. Gales, Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods, ch. Augmented Statistical Models: Using Dynamic Kernels for Acoustic Models.
7. M. J. F. Gales and S. J. Young, The application of hidden Markov models in speech recognition, Foundations and Trends in Signal Processing, vol. 1, no. 3, pp ,
8. S. J. Young, G. Evermann, M. J. F. Gales, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (for HTK Version 3.4). Cambridge University Engineering Department, December
9. M. J. F. Gales, Model-Based Techniques for Noise Robust Speech Recognition. PhD thesis, Cambridge University Engineering Department, September

Journal Papers and Letters

1. Y. Wang, M. J. F. Gales, K. M. Knill, K. Kyriakopoulos, A. Malinin, R. C. van Dalen, and M. Rashid, Towards automatic assessment of spontaneous spoken English, Speech Communication, vol. 104, pp ,
2. G. Degottex, P. Lanchantin, and M. J. F. Gales, A log domain pulse model for parametric speech synthesis, IEEE/ACM Trans. Audio, Speech & Language Processing, vol. 26, no. 1, pp ,
3. C. Wu, M. J. F. Gales, A. Ragni, P. Karanasou, and K. C. Sim, Improving interpretability and regularization in deep learning, IEEE/ACM Trans. Audio, Speech & Language Processing, vol. 26, no. 2, pp ,
4. P. Karanasou, C. Wu, M. J. F. Gales, and P. C. Woodland, I-vectors and structured neural networks for rapid adaptation of acoustic models, IEEE/ACM Trans. Audio, Speech & Language Processing, vol. 25, no. 4, pp ,
5. X. Liu, X. Chen, Y. Wang, M. J. F. Gales, and P. C. Woodland, Two efficient lattice rescoring methods using recurrent neural network language models, IEEE/ACM Trans. Audio, Speech & Language Processing, vol. 24, no. 8, pp ,
6. X. Chen, X. Liu, Y. Wang, M. J. F. Gales, and P. C. Woodland, Efficient training and evaluation of recurrent neural network language models for automatic speech recognition, IEEE/ACM Trans. Audio, Speech & Language Processing, vol. 24, no. 11, pp ,
7. T. Yoshioka and M. J. F. Gales, Environmentally robust ASR front-end for deep neural network acoustic models, Computer Speech & Language, vol. 31, no. 1, pp ,
8. L. Chen, N. Braunschweiler, and M. J. F. Gales, Speaker and expression factorization for audiobook data: Expressiveness and transplantation, IEEE/ACM Trans. Audio, Speech & Language Processing, vol. 23, no. 4, pp ,
9. X. Liu, M. J. F. Gales, and P. C. Woodland, Paraphrastic language models, Computer Speech & Language, vol. 28, no. 6, pp ,
10. V. Wan, J. Latorre, K. Yanagisawa, N. Braunschweiler, L. Chen, M. J. F. Gales, and M. Akamine, Building HMM-TTS voices on diverse data, J. Sel. Topics Signal Processing, vol. 8, no. 2, pp ,
11. L. Chen, M. J. F. Gales, N. Braunschweiler, M. Akamine, and K. Knill, Integrated expression prediction and speech synthesis from text, J. Sel. Topics Signal Processing, vol. 8, no. 2, pp ,
12. X. Liu, M. J. F. Gales, and P. C. Woodland, Use of contexts in language model interpolation and adaptation, Computer Speech & Language, vol. 27, no. 1, pp ,
13. R. C. van Dalen and M. J. F. Gales, Importance sampling to compute likelihoods of noise-corrupted speech, Computer Speech & Language, vol. 27, no. 1, pp ,
14. X. Liu, M. J. F. Gales, and P. C. Woodland, Language model cross adaptation for LVCSR system combination, Computer Speech & Language, vol. 27, no. 4, pp ,
15. R. Maia, M. Akamine, and M. J. F. Gales, Complex cepstrum for statistical parametric speech synthesis, Speech Communication, vol. 55, no. 5, pp ,
16. S. Zhang and M. J. F. Gales, Structured SVMs for automatic speech recognition, IEEE Trans. Audio, Speech & Language Processing, vol. 21, no. 3, pp ,
17. F. Diehl, M. J. F. Gales, M. Tomalin, and P. C. Woodland, Morphological decomposition in Arabic ASR systems, Computer Speech & Language, vol. 26, no. 4, pp ,
18. H. Zen, M. J. F. Gales, Y. Nankaku, and K. Tokuda, Product of experts for statistical parametric speech synthesis, IEEE Trans. Audio, Speech & Language Processing, vol. 20, no. 3, pp ,
19. H. Zen, N. Braunschweiler, S. Buchholz, M. J. F. Gales, K. Knill, S. Krstulovic, and J. Latorre, Statistical parametric speech synthesis based on speaker and language factorization, IEEE Trans. Audio, Speech & Language Processing, vol. 20, no. 6, pp ,
20. Y. Wang and M. J. F. Gales, Speaker and noise factorization for robust speech recognition, IEEE Trans. Audio, Speech & Language Processing, vol. 20, no. 7, pp ,

21. J. Park, F. Diehl, M. J. F. Gales, M. Tomalin, and P. C. Woodland, The efficient incorporation of MLP features into automatic speech recognition systems, Computer Speech & Language, vol. 25, no. 3, pp ,
22. Z. Roupakia and M. J. F. Gales, Kernel eigenvoices (revisited) for large-vocabulary speech recognition, IEEE Signal Process. Lett., vol. 18, no. 12, pp ,
23. D. K. Kim and M. J. F. Gales, Noisy constrained maximum-likelihood linear regression for noise-robust speech recognition, IEEE Trans. Audio, Speech & Language Processing, vol. 19, no. 2, pp ,
24. R. C. van Dalen and M. J. F. Gales, Extended VTS for noise-robust speech recognition, IEEE Trans. Audio, Speech & Language Processing, vol. 19, no. 4, pp ,
25. H. Xu, M. J. F. Gales, and K. K. Chin, Joint uncertainty decoding with predictive methods for noise robust speech recognition, IEEE Trans. Audio, Speech & Language Processing, vol. 19, no. 6, pp ,
26. M. J. F. Gales and F. Flego, Discriminative classifiers with adaptive kernels for noise robust speech recognition, Computer Speech & Language, vol. 24, no. 4, pp ,
27. K. Yu, M. J. F. Gales, L. Wang, and P. C. Woodland, Unsupervised training and directed manual transcription for LVCSR, Speech Communication, vol. 52, no. 7-8, pp ,
28. S. Zhang, A. Ragni, and M. J. F. Gales, Structured log linear models for noise robust speech recognition, IEEE Signal Process. Lett., vol. 17, no. 11, pp ,
29. C. Breslin and M. J. F. Gales, Directed decision trees for generating complementary systems, Speech Communication, vol. 51, no. 3, pp ,
30. K. Yu, M. J. F. Gales, and P. C. Woodland, Unsupervised adaptation with discriminative mapping transforms, IEEE Trans. Audio, Speech & Language Processing, vol. 17, no. 4, pp ,
31. C. Longworth and M. J. F. Gales, Combining derivative and parametric kernels for speaker verification, IEEE Trans. Audio, Speech & Language Processing, vol. 17, no. 4, pp ,
32. H. Liao and M. J. F. Gales, Issues with uncertainty decoding for noise robust automatic speech recognition, Speech Communication, vol. 50, no. 4, pp ,
33. K. C. Sim and M. J. F. Gales, Discriminative semi-parametric trajectory model for speech recognition, Computer Speech & Language, vol. 21, no. 4, pp ,
34. X. Liu and M. J. F. Gales, Automatic model complexity control using marginalized discriminative growth functions, IEEE Trans. Audio, Speech & Language Processing, vol. 15, no. 4, pp ,
35. K. Yu and M. J. F. Gales, Bayesian adaptive inference and adaptive training, IEEE Trans. Audio, Speech & Language Processing, vol. 15, no. 6, pp ,
36. M. I. Layton and M. J. F. Gales, Acoustic modelling using continuous rational kernels, VLSI Signal Processing, vol. 48, no. 1-2, pp ,
37. M. J. F. Gales and S. S. Airey, Product of Gaussians for speech recognition, Computer Speech & Language, vol. 20, no. 1, pp ,
38. M. J. F. Gales and M. I. Layton, Training augmented models using SVMs, IEICE Transactions, vol. 89-D, no. 3, pp ,
39. K. C. Sim and M. J. F. Gales, Minimum phone error training of precision matrix models, IEEE Trans. Audio, Speech & Language Processing, vol. 14, no. 3, pp ,
40. M. J. F. Gales, D. Y. Kim, P. C. Woodland, H. Y. Chan, D. Mrva, R. Sinha, and S. E. Tranter, Progress in the CU-HTK broadcast news transcription system, IEEE Trans. Audio, Speech & Language Processing, vol. 14, no. 5, pp ,
41. K. Yu and M. J. F. Gales, Discriminative cluster adaptive training, IEEE Trans. Audio, Speech & Language Processing, vol. 14, no. 5, pp ,
42. T. Hain, P. C. Woodland, G. Evermann, M. J. F. Gales, X. Liu, G. L. Moore, D. Povey, and L. Wang, Automatic transcription of conversational telephone speech, IEEE Trans. Speech and Audio Processing, vol. 13, no. 6, pp ,
43. A. I. Rosti and M. J. F. Gales, Factor analysed hidden Markov models for speech recognition, Computer Speech & Language, vol. 18, no. 2, pp ,

44. M. J. F. Gales, Transformation streams and the HMM error model, Computer Speech & Language, vol. 16, no. 2, pp ,
45. S. S. Chen, E. Eide, M. J. F. Gales, R. A. Gopinath, D. Kanvesky, and P. A. Olsen, Automatic transcription of broadcast news, Speech Communication, vol. 37, no. 1-2, pp ,
46. M. J. F. Gales, Maximum likelihood multiple subspace projections for hidden Markov models, IEEE Trans. Speech and Audio Processing, vol. 10, no. 2, pp ,
47. M. J. F. Gales, Cluster adaptive training of hidden Markov models, IEEE Trans. Speech and Audio Processing, vol. 8, no. 4, pp ,
48. M. J. F. Gales, K. Knill, and S. J. Young, State-based Gaussian selection in large vocabulary continuous speech recognition using HMMs, IEEE Trans. Speech and Audio Processing, vol. 7, no. 2, pp ,
49. M. J. F. Gales, Semi-tied covariance matrices for hidden Markov models, IEEE Trans. Speech and Audio Processing, vol. 7, no. 3, pp ,
50. M. J. F. Gales, Maximum likelihood linear transformations for HMM-based speech recognition, Computer Speech & Language, vol. 12, no. 2, pp ,
51. M. J. F. Gales, Predictive model-based compensation schemes for robust speech recognition, Speech Communication, vol. 25, no. 1-3, pp ,
52. M. J. F. Gales and P. C. Woodland, Mean and variance adaptation within the MLLR framework, Computer Speech & Language, vol. 10, no. 4, pp ,
53. M. J. F. Gales and S. J. Young, Robust continuous speech recognition using parallel model combination, IEEE Trans. Speech and Audio Processing, vol. 4, no. 5, pp ,
54. M. J. F. Gales and S. J. Young, Robust speech recognition in additive and convolutional noise using parallel model combination, Computer Speech & Language, vol. 9, no. 4, pp ,
55. M. J. F. Gales and S. J. Young, Cepstral parameter compensation for HMM recognition in noise, Speech Communication, vol. 12, no. 3, pp ,

Conference and Workshop Papers

1. Y. Wang, X. Chen, M. J. F. Gales, A. Ragni, and J. H. M. Wong, Phonetic and graphemic systems for multi-genre broadcast transcription, in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018, pp ,
2. Y. Wang, C. Zhang, M. J. F. Gales, and P. C. Woodland, Speaker adaptation and adaptive training for jointly optimised tandem systems, in Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018, pp ,
3. M. Wan, G. Degottex, and M. J. F. Gales, Waveform-based speaker representations for speech synthesis, in Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018, pp ,
4. K. Kyriakopoulos, K. Knill, and M. J. F. Gales, A deep learning approach to assessing non-native pronunciation of English using phone distances, in Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018, pp ,
5. K. Knill, M. J. F. Gales, K. Kyriakopoulos, A. Malinin, A. Ragni, Y. Wang, and A. Caines, Impact of ASR performance on free speaking language assessment, in Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018, pp ,
6. A. Ragni and M. J. F. Gales, Automatic speech recognition system development in the wild, in Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018, pp ,
7. O. Chen, A. Ragni, M. J. F. Gales, and X. Chen, Active memory networks for language modeling, in Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018, pp ,
8. A. Malinin and M. J. F. Gales, Predictive uncertainty estimation via prior networks, in Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pp ,
9. A. Malinin, A. Ragni, K. Knill, and M. J. F. Gales, Incorporating uncertainty into deep learning for spoken language assessment, in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 2: Short Papers, pp ,
10. J. H. M. Wong and M. J. F. Gales, Multi-task ensembles with teacher-student training, in 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, December 16-20, 2017, pp ,
11. X. Chen, X. Liu, A. Ragni, Y. Wang, and M. J. F. Gales, Future word contexts in neural network language models, in 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, December 16-20, 2017, pp ,
12. A. Malinin, K. Knill, and M. J. F. Gales, A hierarchical attention based model for off-topic spontaneous spoken response detection, in 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, December 16-20, 2017, pp ,
13. M. Wan, G. Degottex, and M. J. F. Gales, Integrated speaker-adaptive speech synthesis, in 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, December 16-20, 2017, pp ,
14. A. Ragni, C. Wu, M. J. F. Gales, J. Vasilakes, and K. M. Knill, Stimulated training for automatic speech recognition and keyword search in limited resource conditions, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017, pp ,
15. A. Ragni, D. Saunders, P. Zahemszky, J. Vasilakes, M. J. F. Gales, and K. M. Knill, Morph-to-word transduction for accurate and efficient automatic speech recognition and keyword search, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017, pp ,

16. X. Chen, A. Ragni, J. Vasilakes, X. Liu, K. Knill, and M. J. F. Gales, Recurrent neural network language models for keyword search, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017, pp ,
17. J. H. M. Wong and M. J. F. Gales, Student-teacher training with diverse decision tree ensembles, in Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017, pp ,
18. X. Chen, A. Ragni, X. Liu, and M. J. F. Gales, Investigating bidirectional recurrent neural network language models for speech recognition, in Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017, pp ,
19. C. Wu and M. J. F. Gales, Deep activation mixture model for speech recognition, in Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017, pp ,
20. K. M. Knill, M. J. F. Gales, K. Kyriakopoulos, A. Ragni, and Y. Wang, Use of graphemic lexicons for spoken language assessment, in Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017, pp ,
21. K. Kyriakopoulos, M. J. F. Gales, and K. Knill, Automatic characterisation of the pronunciation of non-native English speakers using phone distance features, in 7th ISCA International Workshop on Speech and Language Technology in Education, SLaTE 2017, Stockholm, Sweden, August 2017, pp ,
22. A. Malinin, K. Knill, A. Ragni, Y. Wang, and M. J. F. Gales, An attention based model for off-topic spontaneous spoken response detection: An initial study, in 7th ISCA International Workshop on Speech and Language Technology in Education, SLaTE 2017, Stockholm, Sweden, August 2017, pp ,
23. M. J. F. Gales, K. M. Knill, and A. Ragni, Low-resource speech recognition and keyword-spotting, in Speech and Computer - 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings, pp. 3-19,
24. A. Malinin, R. C. van Dalen, K. Knill, Y. Wang, and M. J. F. Gales, Off-topic response detection for spontaneous spoken English assessment, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers,
25. C. Wu, P. Karanasou, and M. J. F. Gales, Combining i-vector representation and structured neural networks for rapid adaptation, in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, pp ,
26. J. Yang, C. Zhang, A. Ragni, M. J. F. Gales, and P. C. Woodland, System combination with log-linear models, in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, pp ,
27. L. Wang, C. Zhang, P. C. Woodland, M. J. F. Gales, P. Karanasou, P. Lanchantin, X. Liu, and Y. Qian, Improved DNN-based segmentation for multi-genre broadcast audio, in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, pp ,
28. X. Chen, X. Liu, Y. Qian, M. J. F. Gales, and P. C. Woodland, CUED-RNNLM - an open-source toolkit for efficient training and evaluation of recurrent neural network language models, in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, pp ,
29. C. Wu, P. Karanasou, M. J. F. Gales, and K. C. Sim, Stimulated deep neural network for speech recognition, in Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, pp ,
30. J. Yang, A. Ragni, M. J. F. Gales, and K. M. Knill, Log-linear system combination using structured support vector machines, in Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, pp ,

31. S. Kundu, K. C. Sim, and M. J. F. Gales, Incorporating a generative front-end layer to deep neural network for noise robust automatic speech recognition, in Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, pp ,
32. J. H. M. Wong and M. J. F. Gales, Sequence student-teacher training of deep neural networks, in Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, pp ,
33. A. Ragni, E. Dakin, X. Chen, M. J. F. Gales, and K. M. Knill, Multi-language neural network language models, in Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, pp ,
34. P. Lanchantin, M. J. F. Gales, P. Karanasou, X. Liu, Y. Qian, L. Wang, P. C. Woodland, and C. Zhang, Selection of multi-genre broadcast data for the training of automatic speech recognition systems, in Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, pp ,
35. D. J. Litman, S. J. Young, M. J. F. Gales, K. Knill, K. Ottewell, R. C. van Dalen, and D. Vandyke, Towards using conversations with spoken dialogue systems in the automated assessment of non-native speakers of English, in Proceedings of the SIGDIAL 2016 Conference, The 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, September 2016, Los Angeles, CA, USA, pp ,
36. G. Degottex, P. Lanchantin, and M. J. F. Gales, A pulse model in log-domain for a uniform synthesizer, in The 9th ISCA Speech Synthesis Workshop, Sunnyvale, CA, USA, September 2016, pp ,
37. R. C. van Dalen, J. Yang, H. Wang, A. Ragni, C. Zhang, and M. J. F. Gales, Structured discriminative models using deep neural-network features, in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, pp ,
38. X. Chen, X. Liu, M. J. F. Gales, and P. C. Woodland, Investigation of back-off based interpolation between recurrent neural network and n-gram language models, in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, pp ,
39. J. Cui, B. Kingsbury, B. Ramabhadran, A. Sethy, K. Audhkhasi, X. Cui, E. Kislal, L. Mangu, M. Nußbaum-Thom, M. Picheny, Z. Tüske, P. Golik, R. Schlüter, H. Ney, M. J. F. Gales, K. M. Knill, A. Ragni, H. Wang, and P. C. Woodland, Multilingual representations for low resource speech recognition and keyword search, in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, pp ,
40. S. Tan, K. C. Sim, and M. J. F. Gales, Improving the interpretability of deep neural networks with stimulated learning, in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, pp ,
41. P. C. Woodland, X. Liu, Y. Qian, C. Zhang, M. J. F. Gales, P. Karanasou, P. Lanchantin, and L. Wang, Cambridge University transcription systems for the multi-genre broadcast challenge, in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, pp ,
42. P. Lanchantin, M. J. F. Gales, P. Karanasou, X. Liu, Y. Qian, L. Wang, P. C. Woodland, and C. Zhang, The development of the Cambridge University alignment systems for the multi-genre broadcast challenge, in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, pp ,
43. P. Karanasou, M. J. F. Gales, P. Lanchantin, X. Liu, Y. Qian, L. Wang, P. C. Woodland, and C. Zhang, Speaker diarisation and longitudinal linking in multi-genre broadcast data, in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, pp ,
44. P. Bell, M. J. F. Gales, T. Hain, J. Kilgour, P. Lanchantin, X. Liu, A. McParland, S. Renals, O. Saz, M. Wester, and P. C. Woodland, The MGB challenge: Evaluating multi-genre broadcast media recognition, in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015, pp ,

45. C. Wu and M. J. F. Gales, Multi-basis adaptive neural network for rapid adaptation in speech recognition, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, pp ,
46. A. Ragni, M. J. F. Gales, and K. M. Knill, A language space representation for speech recognition, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, pp ,
47. T. Drugman, Y. Stylianou, L. Chen, X. Chen, and M. J. F. Gales, Robust excitation-based features for automatic speech recognition, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, pp ,
48. R. C. van Dalen, K. M. Knill, P. Tsiakoulis, and M. J. F. Gales, Improving multiple-crowd-sourced transcriptions using a speech recogniser, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, pp ,
49. M. J. F. Gales, K. M. Knill, and A. Ragni, Unicode-based graphemic systems for limited resource languages, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, pp ,
50. X. Chen, X. Liu, M. J. F. Gales, and P. C. Woodland, Improving the training and evaluation efficiency of recurrent neural network language models, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, pp ,
51. X. Liu, X. Chen, M. J. F. Gales, and P. C. Woodland, Paraphrastic recurrent neural network language models, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, pp ,
52. X. Chen, X. Liu, M. J. F. Gales, and P. C. Woodland, Recurrent neural network language model training with noise contrastive estimation for speech recognition, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, pp ,
53. G. Mendels, E. Cooper, V. Soto, J. Hirschberg, M. J. F. Gales, K. M. Knill, A. Ragni, and H. Wang, Improving speech recognition and keyword search for low resource languages using web data, in INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, pp ,
54. P. Lanchantin, C. Veaux, M. J. F. Gales, S. King, and J. Yamagishi, Reconstructing voices within the multiple-average-voice-model framework, in INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, pp ,
55. R. C. van Dalen and M. J. F. Gales, Annotating large lattices with the exact word error, in INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, pp ,
56. P. Karanasou, M. J. F. Gales, and P. C. Woodland, I-vector estimation using informative priors for adaptation of deep neural networks, in INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, pp ,
57. X. Liu, F. Flego, L. Wang, C. Zhang, M. J. F. Gales, and P. C. Woodland, The Cambridge University 2014 BOLT conversational telephone Mandarin Chinese LVCSR system for speech translation, in INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, pp ,
58. X. Chen, T. Tan, X. Liu, P. Lanchantin, M. Wan, M. J. F. Gales, and P. C. Woodland, Recurrent neural network language model adaptation for multi-genre broadcast speech recognition, in INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, pp ,
59. H. Wang, A. Ragni, M. J. F. Gales, K. M. Knill, P. C. Woodland, and C. Zhang, Joint decoding of tandem and hybrid systems for improved keyword spotting on low resource languages, in INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, pp ,

60. R. C. van Dalen, K. M. Knill, and M. J. F. Gales, Automatically grading learners' English using a Gaussian process, in ISCA International Workshop on Speech and Language Technology in Education, SLaTE 2015, Leipzig, Germany, September 4-5, 2015, pp. 7-12,
61. V. Wan, J. Latorre, K. Yanagisawa, M. J. F. Gales, and Y. Stylianou, Cluster adaptive training of average voice models, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, pp ,
62. P. Lanchantin, M. J. F. Gales, S. King, and J. Yamagishi, Multiple-average-voice-based speech synthesis, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, pp ,
63. L. Chen, N. Braunschweiler, and M. J. F. Gales, Speaker dependent expression predictor from text: Expressiveness and transplantation, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, pp ,
64. J. Yang, R. C. van Dalen, S. Zhang, and M. J. F. Gales, Infinite structured support vector machines for speech recognition, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, pp ,
65. X. Liu, M. J. F. Gales, and P. C. Woodland, Paraphrastic neural network language models, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, pp ,
66. X. Liu, Y. Wang, X. Chen, M. J. F. Gales, and P. C. Woodland, Efficient lattice rescoring using recurrent neural network language models, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, pp ,
67. T. Yoshioka, X. Chen, and M. J. F. Gales, Impact of single-microphone dereverberation on DNN-based meeting transcription systems, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, pp ,
68. T. Yoshioka, A. Ragni, and M. J. F. Gales, Investigation of unsupervised adaptation of DNN acoustic models with filter bank input, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014, pp ,
69. K. Knill, M. J. F. Gales, A. Ragni, and S. P. Rath, Language independent and unsupervised acoustic models for speech recognition and keyword spotting, in INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014, pp ,
70. X. Chen, Y. Wang, X. Liu, M. J. F. Gales, and P. C. Woodland, Efficient GPU-based training of recurrent neural network language models using spliced sentence bunch, in INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014, pp ,
71. A. Ragni, K. M. Knill, S. P. Rath, and M. J. F. Gales, Data augmentation for low resource languages, in INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014, pp ,
72. S. P. Rath, K. M. Knill, A. Ragni, and M. J. F. Gales, Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languages, in INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014, pp ,
73. X. Chen, M. J. F. Gales, K. M. Knill, C. Breslin, L. Chen, K. K. Chin, and V. Wan, An initial investigation of long-term adaptation for meeting transcription, in INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014, pp ,
74. B. Kolluru, V. Wan, J. Latorre, K. Yanagisawa, and M. J. F. Gales, Generating multiple-accent pronunciations for TTS using joint sequence model interpolation, in INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014, pp ,
75. K. Yanagisawa, L. Chen, and M. J. F. Gales, Noise-robust TTS speaker adaptation with statistics smoothing, in INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014, pp ,

76. P. Karanasou, Y. Wang, M. J. F. Gales, and P. C. Woodland, Adaptation of deep neural network acoustic models using factorised i-vectors, in INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014, pp .
77. J. Latorre, K. Yanagisawa, V. Wan, B. Kolluru, and M. J. F. Gales, Speech intonation for TTS: study on evaluation methodology, in INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014, pp .
78. M. J. F. Gales, K. M. Knill, A. Ragni, and S. P. Rath, Speech recognition and keyword spotting for low-resource languages: Babel project research at CUED, in 4th Workshop on Spoken Language Technologies for Under-resourced Languages, SLTU 2014, St. Petersburg, Russia, May 14-16, 2014, pp .
79. K. Knill, M. J. F. Gales, S. P. Rath, P. C. Woodland, C. Zhang, and S. Zhang, Investigation of multilingual deep neural networks for spoken term detection, in 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, December 8-12, 2013, pp .
80. J. Latorre, M. J. F. Gales, K. Knill, and M. Akamine, Training a supra-segmental parametric F0 model without interpolating F0, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013, pp .
81. S. Zhang and M. J. F. Gales, Kernelized log linear models for continuous speech recognition, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013, pp .
82. R. C. van Dalen, A. Ragni, and M. J. F. Gales, Efficient decoding with generative score-spaces using the expectation semiring, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013, pp .
83. Y. Wang and M. J. F. Gales, Tandem system adaptation using multiple linear feature transforms, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013, pp .
84. R. Maia, M. Akamine, and M. J. F. Gales, Complex cepstrum analysis based on the minimum mean squared error, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013, pp .
85. L. Chen, M. J. F. Gales, N. Braunschweiler, M. Akamine, and K. Knill, Integrated automatic expression prediction and speech synthesis from text, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013, pp .
86. J. Mamou, J. Cui, X. Cui, M. J. F. Gales, B. Kingsbury, K. Knill, L. Mangu, D. Nolden, M. Picheny, B. Ramabhadran, R. Schlüter, A. Sethy, and P. C. Woodland, System combination and score normalization for spoken term detection, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013, pp .
87. B. Kingsbury, J. Cui, X. Cui, M. J. F. Gales, K. Knill, J. Mamou, L. Mangu, D. Nolden, M. Picheny, B. Ramabhadran, R. Schlüter, A. Sethy, and P. C. Woodland, A high-performance Cantonese keyword search system, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013, pp .
88. X. Liu, M. J. F. Gales, and P. C. Woodland, Paraphrastic language models and combination with neural network language models, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013, pp .
89. M. S. Seigel, P. C. Woodland, and M. J. F. Gales, A confidence-based approach for improving keyword hypothesis scores, in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013, pp .
90. P. Lanchantin, P. Bell, M. J. F. Gales, T. Hain, X. Liu, Y. Long, J. Quinnell, S. Renals, O. Saz, M. S. Seigel, P. Swietojanski, and P. C. Woodland, Automatic transcription of multi-genre media archives, in Proceedings of the First Workshop on Speech, Language and Audio in Multimedia, Marseille, France, August 22-23, 2013, pp .

91. Y. Wang and M. J. F. Gales, An explicit independence constraint for factorised adaptation in speech recognition, in INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, pp .
92. Y. Long, M. J. F. Gales, P. Lanchantin, X. Liu, M. S. Seigel, and P. C. Woodland, Improving lightly supervised training for broadcast transcription, in INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, pp .
93. R. Maia, M. J. F. Gales, Y. Stylianou, and M. Akamine, Minimum mean squared error based warped complex cepstrum analysis for statistical parametric speech synthesis, in INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, pp .
94. V. Wan, R. Anderson, A. Blokland, N. Braunschweiler, L. Chen, B. Kolluru, J. Latorre, R. Maia, B. Stenger, K. Yanagisawa, Y. Stylianou, M. Akamine, M. J. F. Gales, and R. Cipolla, Photorealistic expressive text to talking head synthesis, in INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, pp .
95. J. Yang, R. C. van Dalen, and M. J. F. Gales, Infinite support vector machines in speech recognition, in INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, pp .
96. X. Liu, M. J. F. Gales, and P. C. Woodland, Cross-domain paraphrasing for improving language modelling using out-of-domain data, in INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013, pp .
97. K. Yanagisawa, J. Latorre, V. Wan, M. J. F. Gales, and S. King, Noise robustness in HMM-TTS speaker adaptation, in The Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, Barcelona, Spain, August 31-September 2, 2013, pp .
98. F. Eyben, S. Buchholz, N. Braunschweiler, J. Latorre, V. Wan, M. J. F. Gales, and K. Knill, Unsupervised clustering of emotion and voice styles for expressive TTS, in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012, Kyoto, Japan, March 25-30, 2012, pp .
99. A. Ragni and M. J. F. Gales, Inference algorithms for generative score-spaces, in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012, Kyoto, Japan, March 25-30, 2012, pp .
100. R. Maia, M. Akamine, and M. J. F. Gales, Complex cepstrum as phase information in statistical parametric speech synthesis, in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012, Kyoto, Japan, March 25-30, 2012, pp .
101. F. Flego and M. J. F. Gales, Factor analysis based VTS discriminative adaptive training, in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012, Kyoto, Japan, March 25-30, 2012, pp .
102. L. Chen, M. J. F. Gales, V. Wan, J. Latorre, and M. Akamine, Exploring rich expressive information from audiobook data using cluster adaptive training, in INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp .
103. J. Latorre, V. Wan, M. J. F. Gales, L. Chen, K. K. Chin, K. Knill, and M. Akamine, Speech factorization for HMM-TTS based on cluster adaptive training, in INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp .
104. V. Wan, J. Latorre, K. K. Chin, L. Chen, M. J. F. Gales, H. Zen, K. Knill, and M. Akamine, Combining multiple high quality corpora for improving HMM-TTS, in INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp .
105. Y. Wang and M. J. F. Gales, Model-based approaches to adaptive training in reverberant environments, in INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp .

106. M. J. F. Gales and F. Flego, Model-based approaches for degraded channel modelling in robust ASR, in INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp .
107. X. Liu, M. J. F. Gales, and P. C. Woodland, Paraphrastic language models, in INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp .
108. Z. Roupakia, A. Ragni, and M. J. F. Gales, Rapid nonlinear speaker adaptation for large-vocabulary continuous speech recognition, in INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012, pp .
109. M. J. F. Gales, A. Ragni, A. Zhang, and R. C. van Dalen, Structured discriminative models for speech recognition, in 2012 Symposium on Machine Learning in Speech and Language Processing, MLSLP 2012, Portland, Oregon, USA, September 14, 2012.
110. P. Bell, M. J. F. Gales, P. Lanchantin, X. Liu, Y. Long, S. Renals, P. Swietojanski, and P. C. Woodland, Transcription of multi-genre media archives using out-of-domain data, in 2012 IEEE Spoken Language Technology Workshop (SLT), Miami, FL, USA, December 2-5, 2012, pp .
111. S. Zhang and M. J. F. Gales, Extending noise robust structured support vector machines to larger vocabulary tasks, in 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU 2011, Waikoloa, HI, USA, December 11-15, 2011, pp .
112. Y. Wang and M. J. F. Gales, Improving reverberant VTS for hands-free robust speech recognition, in 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU 2011, Waikoloa, HI, USA, December 11-15, 2011, pp .
113. A. Ragni and M. J. F. Gales, Derivative kernels for noise robust ASR, in 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU 2011, Waikoloa, HI, USA, December 11-15, 2011, pp .
114. R. C. van Dalen and M. J. F. Gales, A variational perspective on noise-robust speech recognition, in 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU 2011, Waikoloa, HI, USA, December 11-15, 2011, pp .
115. H. Zen and M. J. F. Gales, Decision tree-based context clustering based on cross validation and hierarchical priors, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, May 22-27, 2011, Prague Congress Center, Prague, Czech Republic, pp .
116. Y. Wang and M. J. F. Gales, Speaker and noise factorisation on the AURORA4 task, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, May 22-27, 2011, Prague Congress Center, Prague, Czech Republic, pp .
117. J. Latorre, M. J. F. Gales, S. Buchholz, K. Knill, M. Tamura, Y. Ohtani, and M. Akamine, Continuous F0 in the source-excitation generation for HMM-based TTS: do we need voiced/unvoiced classification?, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, May 22-27, 2011, Prague Congress Center, Prague, Czech Republic, pp .
118. A. Ragni and M. J. F. Gales, Structured discriminative models for noise robust continuous speech recognition, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, May 22-27, 2011, Prague Congress Center, Prague, Czech Republic, pp .
119. F. Flego and M. J. F. Gales, Factor analysis based VTS and JUD noise estimation and compensation, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, May 22-27, 2011, Prague Congress Center, Prague, Czech Republic, pp .
120. X. Liu, M. J. F. Gales, J. L. Hieronymus, and P. C. Woodland, Investigation of acoustic units for LVCSR systems, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, May 22-27, 2011, Prague Congress Center, Prague, Czech Republic, pp .

121. L. Chen, M. J. F. Gales, and K. K. Chin, Constrained discriminative mapping transforms for unsupervised speaker adaptation, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, May 22-27, 2011, Prague Congress Center, Prague, Czech Republic, pp .
122. K. K. Chin, H. Xu, M. J. F. Gales, C. Breslin, and K. Knill, Rapid joint speaker and noise compensation for robust speech recognition, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, May 22-27, 2011, Prague Congress Center, Prague, Czech Republic, pp .
123. F. Diehl, M. J. F. Gales, X. Liu, M. Tomalin, and P. C. Woodland, Word boundary modelling and full covariance Gaussians for Arabic speech-to-text systems, in INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31, 2011, pp .
124. S. Zhang and M. J. F. Gales, Structured support vector machines for noise robust continuous speech recognition, in INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31, 2011, pp .
125. C. Breslin, K. K. Chin, M. J. F. Gales, and K. Knill, Integrated online speaker clustering and adaptation, in INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31, 2011, pp .
126. R. Maia, H. Zen, K. Knill, M. J. F. Gales, and S. Buchholz, Multipulse sequences for residual signal modeling, in INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31, 2011, pp .
127. T. Li, P. C. Woodland, F. Diehl, and M. J. F. Gales, Graphone model interpolation and Arabic pronunciation generation, in INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31, 2011, pp .
128. N. Pilkington, H. Zen, and M. J. F. Gales, Gaussian process experts for voice conversion, in INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31, 2011, pp .
129. X. Liu, M. J. F. Gales, and P. C. Woodland, Improving LVCSR system combination using neural network language model cross adaptation, in INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31, 2011, pp .
130. H. Zen, M. J. F. Gales, Y. Nankaku, and K. Tokuda, Statistical parametric speech synthesis based on product of experts, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, March 2010, Sheraton Dallas Hotel, Dallas, Texas, USA, pp .
131. M. Tomalin, F. Diehl, M. J. F. Gales, J. Park, and P. C. Woodland, Recent improvements to the Cambridge Arabic speech-to-text systems, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, March 2010, Sheraton Dallas Hotel, Dallas, Texas, USA, pp .
132. X. Liu, M. J. F. Gales, J. L. Hieronymus, and P. C. Woodland, Language model combination and adaptation using weighted finite state transducers, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, March 2010, Sheraton Dallas Hotel, Dallas, Texas, USA, pp .
133. M. J. F. Gales and K. Yu, Canonical state models for automatic speech recognition, in INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pp .
134. X. Liu, M. J. F. Gales, and P. C. Woodland, Language model cross adaptation for LVCSR system combination, in INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pp .
135. R. C. van Dalen and M. J. F. Gales, Asymptotically exact noise-corrupted speech likelihoods, in INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pp .

SRIRAM GANAPATHY. Indian Institute of Science, Phone: +91-(80) Bangalore, India, Fax: +91-(80)

SRIRAM GANAPATHY. Indian Institute of Science, Phone: +91-(80) Bangalore, India, Fax: +91-(80) SRIRAM GANAPATHY Assistant Professor, Email: sriram@ee.iisc.ernet.in Electrical Engineering, Web: http://www.leap.ee.iisc.ac.in/sriram Indian Institute of Science, Phone: +91-(80)-2293-2433 Bangalore,

More information

Noise-Robust Speech Recognition Technologies in Mobile Environments

Noise-Robust Speech Recognition Technologies in Mobile Environments Noise-Robust Speech Recognition echnologies in Mobile Environments Mobile environments are highly influenced by ambient noise, which may cause a significant deterioration of speech recognition performance.

More information

CONSTRUCTING TELEPHONE ACOUSTIC MODELS FROM A HIGH-QUALITY SPEECH CORPUS

CONSTRUCTING TELEPHONE ACOUSTIC MODELS FROM A HIGH-QUALITY SPEECH CORPUS CONSTRUCTING TELEPHONE ACOUSTIC MODELS FROM A HIGH-QUALITY SPEECH CORPUS Mitchel Weintraub and Leonardo Neumeyer SRI International Speech Research and Technology Program Menlo Park, CA, 94025 USA ABSTRACT

More information

Enhanced Feature Extraction for Speech Detection in Media Audio

Enhanced Feature Extraction for Speech Detection in Media Audio INTERSPEECH 2017 August 20 24, 2017, Stockholm, Sweden Enhanced Feature Extraction for Speech Detection in Media Audio Inseon Jang 1, ChungHyun Ahn 1, Jeongil Seo 1, Younseon Jang 2 1 Media Research Division,

More information

THE MGB-2 CHALLENGE: ARABIC MULTI-DIALECT BROADCAST MEDIA RECOGNITION

THE MGB-2 CHALLENGE: ARABIC MULTI-DIALECT BROADCAST MEDIA RECOGNITION THE MGB-2 CHALLENGE: ARABIC MULTI-DIALECT BROADCAST MEDIA RECOGNITION Ahmed Ali 1,2, Peter Bell 2, James Glass 3, Yacine Messaoui 4, Hamdy Mubarak 1, Steve Renals 2, Yifan Zhang 1 1 Qatar Computing Research

More information

Online Speaker Adaptation of an Acoustic Model using Face Recognition

Online Speaker Adaptation of an Acoustic Model using Face Recognition Online Speaker Adaptation of an Acoustic Model using Face Recognition Pavel Campr 1, Aleš Pražák 2, Josef V. Psutka 2, and Josef Psutka 2 1 Center for Machine Perception, Department of Cybernetics, Faculty

More information

VIRTUAL ASSISTANT FOR DEAF AND DUMB

VIRTUAL ASSISTANT FOR DEAF AND DUMB VIRTUAL ASSISTANT FOR DEAF AND DUMB Sumanti Saini, Shivani Chaudhry, Sapna, 1,2,3 Final year student, CSE, IMSEC Ghaziabad, U.P., India ABSTRACT For the Deaf and Dumb community, the use of Information

More information

Single-Channel Sound Source Localization Based on Discrimination of Acoustic Transfer Functions

Single-Channel Sound Source Localization Based on Discrimination of Acoustic Transfer Functions 3 Single-Channel Sound Source Localization Based on Discrimination of Acoustic Transfer Functions Ryoichi Takashima, Tetsuya Takiguchi and Yasuo Ariki Graduate School of System Informatics, Kobe University,

More information

HISTOGRAM EQUALIZATION BASED FRONT-END PROCESSING FOR NOISY SPEECH RECOGNITION

HISTOGRAM EQUALIZATION BASED FRONT-END PROCESSING FOR NOISY SPEECH RECOGNITION 2 th May 216. Vol.87. No.2 25-216 JATIT & LLS. All rights reserved. HISTOGRAM EQUALIZATION BASED FRONT-END PROCESSING FOR NOISY SPEECH RECOGNITION 1 IBRAHIM MISSAOUI, 2 ZIED LACHIRI National Engineering

More information

Gender Based Emotion Recognition using Speech Signals: A Review

Gender Based Emotion Recognition using Speech Signals: A Review 50 Gender Based Emotion Recognition using Speech Signals: A Review Parvinder Kaur 1, Mandeep Kaur 2 1 Department of Electronics and Communication Engineering, Punjabi University, Patiala, India 2 Department

More information

SRIRAM GANAPATHY. Indian Institute of Science, Phone: +91-(80) Bangalore, India, Fax: +91-(80)

SRIRAM GANAPATHY. Indian Institute of Science, Phone: +91-(80) Bangalore, India, Fax: +91-(80) SRIRAM GANAPATHY Assistant Professor, Email: sriramg@iisc.ac.in Electrical Engineering, Web: http://www.leap.ee.iisc.ac.in/sriram Indian Institute of Science, Phone: +91-(80)-2293-2433 Bangalore, India,

More information

The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation

The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation Martin Graciarena 1, Luciana Ferrer 2, Vikramjit Mitra 1 1 SRI International,

More information

Acoustic Signal Processing Based on Deep Neural Networks

Acoustic Signal Processing Based on Deep Neural Networks Acoustic Signal Processing Based on Deep Neural Networks Chin-Hui Lee School of ECE, Georgia Tech chl@ece.gatech.edu Joint work with Yong Xu, Yanhui Tu, Qing Wang, Tian Gao, Jun Du, LiRong Dai Outline

More information

A PUBLIC DOMAIN SPEECH-TO-TEXT SYSTEM

A PUBLIC DOMAIN SPEECH-TO-TEXT SYSTEM A PUBLIC DOMAIN SPEECH-TO-TEXT SYSTEM M. Ordowski+, N. Deshmukh*, A. Ganapathiraju*, J. Hamaker*, and J. Picone* *Institute for Signal and Information Processing Department for Electrical and Computer

More information

Full Utilization of Closed-captions in Broadcast News Recognition

Full Utilization of Closed-captions in Broadcast News Recognition Full Utilization of Closed-captions in Broadcast News Recognition Meng Meng, Wang Shijin, Liang Jiaen, Ding Peng, Xu Bo. Hi-Tech Innovation Center, Institute of Automation Chinese Academy of Sciences,

More information

Lecturer: T. J. Hazen. Handling variability in acoustic conditions. Computing and applying confidence scores

Lecturer: T. J. Hazen. Handling variability in acoustic conditions. Computing and applying confidence scores Lecture # 20 Session 2003 Noise Robustness and Confidence Scoring Lecturer: T. J. Hazen Handling variability in acoustic conditions Channel compensation Background noise compensation Foreground noises

More information

Advanced Audio Interface for Phonetic Speech. Recognition in a High Noise Environment

Advanced Audio Interface for Phonetic Speech. Recognition in a High Noise Environment DISTRIBUTION STATEMENT A Approved for Public Release Distribution Unlimited Advanced Audio Interface for Phonetic Speech Recognition in a High Noise Environment SBIR 99.1 TOPIC AF99-1Q3 PHASE I SUMMARY

More information

Adaptation of Classification Model for Improving Speech Intelligibility in Noise

Adaptation of Classification Model for Improving Speech Intelligibility in Noise 1: (Junyoung Jung et al.: Adaptation of Classification Model for Improving Speech Intelligibility in Noise) (Regular Paper) 23 4, 2018 7 (JBE Vol. 23, No. 4, July 2018) https://doi.org/10.5909/jbe.2018.23.4.511

More information

Broadband Wireless Access and Applications Center (BWAC) CUA Site Planning Workshop

Broadband Wireless Access and Applications Center (BWAC) CUA Site Planning Workshop Broadband Wireless Access and Applications Center (BWAC) CUA Site Planning Workshop Lin-Ching Chang Department of Electrical Engineering and Computer Science School of Engineering Work Experience 09/12-present,

More information

EECS 433 Statistical Pattern Recognition

EECS 433 Statistical Pattern Recognition EECS 433 Statistical Pattern Recognition Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 19 Outline What is Pattern

More information

Speech Enhancement Using Deep Neural Network

Speech Enhancement Using Deep Neural Network Speech Enhancement Using Deep Neural Network Pallavi D. Bhamre 1, Hemangi H. Kulkarni 2 1 Post-graduate Student, Department of Electronics and Telecommunication, R. H. Sapat College of Engineering, Management

More information

Speech recognition in noisy environments: A survey

Speech recognition in noisy environments: A survey T-61.182 Robustness in Language and Speech Processing Speech recognition in noisy environments: A survey Yifan Gong presented by Tapani Raiko Feb 20, 2003 About the Paper Article published in Speech Communication

More information

Speech Enhancement Based on Deep Neural Networks

Speech Enhancement Based on Deep Neural Networks Speech Enhancement Based on Deep Neural Networks Chin-Hui Lee School of ECE, Georgia Tech chl@ece.gatech.edu Joint work with Yong Xu and Jun Du at USTC 1 Outline and Talk Agenda In Signal Processing Letter,

More information

Performance of Gaussian Mixture Models as a Classifier for Pathological Voice

Performance of Gaussian Mixture Models as a Classifier for Pathological Voice PAGE 65 Performance of Gaussian Mixture Models as a Classifier for Pathological Voice Jianglin Wang, Cheolwoo Jo SASPL, School of Mechatronics Changwon ational University Changwon, Gyeongnam 64-773, Republic

More information

Full text available at: The Application of Hidden Markov Models in Speech Recognition

Full text available at:  The Application of Hidden Markov Models in Speech Recognition The Application of Hidden Markov Models in Speech Recognition The Application of Hidden Markov Models in Speech Recognition Mark Gales Cambridge University Engineering Department Cambridge CB2 1PZ UK mjfg@eng.cam.ac.uk

More information

Using Source Models in Speech Separation

Using Source Models in Speech Separation Using Source Models in Speech Separation Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/

More information

COMPARISON BETWEEN GMM-SVM SEQUENCE KERNEL AND GMM: APPLICATION TO SPEECH EMOTION RECOGNITION

COMPARISON BETWEEN GMM-SVM SEQUENCE KERNEL AND GMM: APPLICATION TO SPEECH EMOTION RECOGNITION Journal of Engineering Science and Technology Vol. 11, No. 9 (2016) 1221-1233 School of Engineering, Taylor s University COMPARISON BETWEEN GMM-SVM SEQUENCE KERNEL AND GMM: APPLICATION TO SPEECH EMOTION

More information

Hybrid HMM and HCRF model for sequence classification

Hybrid HMM and HCRF model for sequence classification Hybrid HMM and HCRF model for sequence classification Y. Soullard and T. Artières University Pierre and Marie Curie - LIP6 4 place Jussieu 75005 Paris - France Abstract. We propose a hybrid model combining

More information

The MIT Mobile Device Speaker Verification Corpus: Data Collection and Preliminary Experiments

The MIT Mobile Device Speaker Verification Corpus: Data Collection and Preliminary Experiments The MIT Mobile Device Speaker Verification Corpus: Data Collection and Preliminary Experiments Ram H. Woo, Alex Park, and Timothy J. Hazen MIT Computer Science and Artificial Intelligence Laboratory 32

More information

Recognition & Organization of Speech & Audio

Recognition & Organization of Speech & Audio Recognition & Organization of Speech & Audio Dan Ellis http://labrosa.ee.columbia.edu/ Outline 1 2 3 Introducing Projects in speech, music & audio Summary overview - Dan Ellis 21-9-28-1 1 Sound organization

More information

Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information

Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, S. Narayanan Emotion

More information

A Comparison of Deep Neural Network Training Methods for Large Vocabulary Speech Recognition

A Comparison of Deep Neural Network Training Methods for Large Vocabulary Speech Recognition A Comparison of Deep Neural Network Training Methods for Large Vocabulary Speech Recognition LászlóTóth and Tamás Grósz MTA-SZTE Research Group on Artificial Intelligence Hungarian Academy of Sciences

More information

General Soundtrack Analysis

General Soundtrack Analysis General Soundtrack Analysis Dan Ellis oratory for Recognition and Organization of Speech and Audio () Electrical Engineering, Columbia University http://labrosa.ee.columbia.edu/

More information

Speech as HCI. HCI Lecture 11. Human Communication by Speech. Speech as HCI(cont. 2) Guest lecture: Speech Interfaces

Speech as HCI. HCI Lecture 11. Human Communication by Speech. Speech as HCI(cont. 2) Guest lecture: Speech Interfaces HCI Lecture 11 Guest lecture: Speech Interfaces Hiroshi Shimodaira Institute for Communicating and Collaborative Systems (ICCS) Centre for Speech Technology Research (CSTR) http://www.cstr.ed.ac.uk Thanks

More information

EXEMPLAR SELECTION TECHNIQUES FOR SPARSE REPRESENTATIONS OF SPEECH USING MULTIPLE DICTIONARIES. Emre Yılmaz, Jort F. Gemmeke, and Hugo Van hamme

EXEMPLAR SELECTION TECHNIQUES FOR SPARSE REPRESENTATIONS OF SPEECH USING MULTIPLE DICTIONARIES. Emre Yılmaz, Jort F. Gemmeke, and Hugo Van hamme EXEMPLAR SELECTION TECHNIQUES FOR SPARSE REPRESENTATIONS OF SPEECH USING MULTIPLE DICTIONARIES Emre Yılmaz, Jort F. Gemmeke, and Hugo Van hamme Dept. ESAT, KU Leuven, Leuven, Belgium ABSTRACT This paper

More information

SPEECH TO TEXT CONVERTER USING GAUSSIAN MIXTURE MODEL(GMM)

SPEECH TO TEXT CONVERTER USING GAUSSIAN MIXTURE MODEL(GMM) SPEECH TO TEXT CONVERTER USING GAUSSIAN MIXTURE MODEL(GMM) Virendra Chauhan 1, Shobhana Dwivedi 2, Pooja Karale 3, Prof. S.M. Potdar 4 1,2,3B.E. Student 4 Assitant Professor 1,2,3,4Department of Electronics

More information

Speech Compression for Noise-Corrupted Thai Dialects

Speech Compression for Noise-Corrupted Thai Dialects American Journal of Applied Sciences 9 (3): 278-282, 2012 ISSN 1546-9239 2012 Science Publications Speech Compression for Noise-Corrupted Thai Dialects Suphattharachai Chomphan Department of Electrical
