NTNU Speech Recognition Course

Speech Recognition

Fall 2010
2:10-5:00 pm, Mondays
Instructor: Dr. Berlin Chen (陳柏琳)

Topic List and Schedule:

09/13 Course Overview & Introduction 　

09/20 Hidden Markov Models for Speech Recognition

09/27 Break (Interspeech 2010)

10/04 　 Hidden Markov Models for Speech Recognition 　

10/11 　 Spoken Language Structure 　

10/18 　 Review of Probability Axioms and Laws
Maximum Likelihood Estimation 　

10/25 　 Acoustic Modeling 　

11/01 　 Acoustic Modeling 　

11/08 　 Language Modeling 　

11/15 　 Language Modeling 　

11/22 　 Midterm 　

11/29 　 ISCSLP2010 　

12/06 　 Search Algorithms 　

12/13 　 Speech Signal Analysis 　

12/20 　 Speech Signal Analysis 　

12/27 　 Robustness 　

01/03 　 Paper Presentations (I)
林士翔: A Study of Irrelevant Variability Normalization Based Training and Unsupervised Online Adaptation for LVCSR (Interspeech 2010)
賴敏軒: Reranking with Multiple Features for Better Transliteration (ACL 2010)
朱紋儀: Evaluation of Modulation Spectrum Equalization Techniques for Large Vocabulary Robust Speech Recognition (Interspeech 2008)
陳珮寧: Novel Weighting Scheme for Unsupervised Language Model Adaptation Using Latent Dirichlet Allocation (Interspeech 2010) 　

01/10 　 Paper Presentations (II) 　

Reference Books:

§ 　 X. Huang, A. Acero, H. Hon, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001 　

§ 　 Jacob Benesty (ed.), M. Mohan Sondhi (ed.), Yiteng Huang (ed.), Springer Handbook of Speech Processing, Springer, 2007 　

§ 　 L. Rabiner, B.H. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993 　

§ 　 M.J.F. Gales and S.J. Young. The Application of Hidden Markov Models in Speech Recognition. Foundations and Trends in Signal Processing, 2008 　

§ 　 L. Rabiner and R.W. Schafer. Introduction to Digital Speech Processing. Foundations and Trends in Signal Processing, 2007 　

§ 　 W. Chou, B.H. Juang. Pattern Recognition in Speech and Language Processing. CRC Press, 2003 　

§ 　 S. Young et al., “The HTK Book”, Version 3.2, 2002. "http://htk.eng.cam.ac.uk" 　

§ 　 T. F. Quatieri, “Discrete-Time Speech Signal Processing - Principles and Practice,” Prentice Hall, 2002 　

§ 　 F. Jelinek, "Statistical Methods for Speech Recognition," The MIT Press, 1999 　

§ 　 J. R. Deller, J. H. L. Hansen, J. G. Proakis, “Discrete-Time Processing of Speech Signals,” IEEE Press, 2000 　

§ 　 C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999 　

§ 　 J. Bellegarda, Latent Semantic Mapping: Principles & Applications (Synthesis Lectures on Speech and Audio Processing), 2008 　

§ 　 T. K. Landauer, D. S. McNamara, S. Dennis, W. Kintsch (eds.) , Handbook of Latent Semantic Analysis, Lawrence Erlbaum, 2007 　

§ 　 Ethem Alpaydin, Introduction to Machine Learning, MIT Press, 2004 　

§ 　 D. P. Bertsekas, J. N. Tsitsiklis, “Introduction to Probability,” Athena Scientific, 2002 　

Reference Papers:

§ 　 L. Rabiner. The Power of Speech. Science, Vol. 301, pp. 1494-1495, Sep. 2003. 　

§ 　 Baker, J.M. et al., Research Developments and Directions in Speech Recognition and Understanding, Part 1, IEEE Signal Processing Magazine 26(3), May 2009. 　

§ 　 Baker, J.M. et al., Research Developments and Directions in Speech Recognition and Understanding, Part 2, IEEE Signal Processing Magazine 26(4), July 2009. 　

§ 　 M. Ostendorf, Speech Technology and Information Access, IEEE Signal Processing Magazine 25(3), May 2008. 　

§ 　 L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, No. 2, February 1989 　

§ 　 A. V. Oppenheim and R. W. Schafer, "From Frequency to Quefrency: A History of the Cepstrum," IEEE Signal Processing Magazine 21(5), September 2004. 　

§ 　 A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1, 1977 　

§ 　 J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models," U.C. Berkeley TR-97-021 　

§ 　 J. W. Picone, “Signal Modeling Techniques in Speech Recognition,” Proceedings of the IEEE, September 1993, pp. 1215-1247 　

§ 　 R. Rosenfeld, “Two Decades of Statistical Language Modeling: Where Do We Go from Here?,” Proceedings of the IEEE, August 2000 　

§ 　 H. Ney, “Progress in Dynamic Programming Search for LVCSR,” Proceedings of the IEEE, August 2000 　

§ 　 Aubert, X. L., "An Overview of Decoding Techniques for Large Vocabulary Continuous Speech Recognition," Computer Speech and Language, vol. 16, 2002, pp. 89-114. 　

§ 　 H. Hermansky, "Should Recognizers Have Ears?", Speech Communication, 25(1-3), 1998. 　

§ 　 J. R. Bellegarda, "Statistical Language Model Adaptation: Review and Perspectives," Speech Communication, vol. 42, no.1, pp. 93-108, 2004. 　

§ 　 B. Roark, "A survey of discriminative language modeling approaches for large vocabulary continuous speech recognition," in Large Margin and Kernel Approaches to Speech and Speaker Recognition, J. Keshet and S. Bengio (Eds.), Wiley, 2009. 　

§ 　 L. Rabiner, B.H. Juang, "Speech Recognition: Statistical Methods," Encyclopedia of Language & Linguistics, pp. 1-18, 2006. 　

§ 　 P. Nguyen, "TechWare: Speech recognition software and resources on the web," IEEE Signal Processing Magazine 26(3), May 2009. 　

§ 　 J. B. Allen, F. Li, "Speech Perception and Cochlear Signal Processing," IEEE Signal Processing Magazine 26(4), July 2009. 　

§ 　 A. Orlitsky, N. P. Santhanam, J. Zhang, "Always Good Turing: Asymptotically Optimal Probability Estimation," Science, 17 October 2003. 　

○ 　 Proceedings of IEEE 88(8), August, 2000 (Special Issue on Spoken Language Processing) 　

○ 　 IEEE Signal Processing Magazine 22(5), September 2005 (Special Issue on Speech Technology and Systems in Human-Machine Communication) 　

○ 　 IEEE Signal Processing Magazine 25(3), May 2008 (Special Issue on Spoken Language Technology) 　

§ 　 Frederick Jelinek, "The Dawn of Statistical ASR and MT," Computational Linguistics, Vol. 35, No. 4. (1 December 2009), pp. 483-494. 　

Reference Presentations:

§ 　 J. Droppo, Noise Robust Automatic Speech Recognition, a comprehensive tutorial talk given at EUSIPCO 2008 　

§ 　 B. Chen, Latent Semantic Approaches for Information Retrieval and Language Modeling, a talk given at Telecommunication Laboratories, Chunghwa Telecom Co., Ltd., 2008 　

§ 　 B. Chen, Recent Developments in Chinese Spoken Document Search and Distillation, a talk given at Google Taipei, 2009 　

　

TA:
    林士翔 (third-year Ph.D. student)
     – E-mail: shlin@csie.ntnu.edu.tw
     – Tel: 29322411 ext. 208 (Room 208, Department of Computer Science and Information Engineering)
