NTNU Speech Processing Course

Speech Processing

Fall 2015
9:10 am ~ 12:00 pm, Fridays
Instructor: Dr. Berlin Chen (陳柏琳)

Topic List and Schedule:

03/06 　 Course Overview & Introduction
Readings:
1. F. Jelinek, The Speech Recognition Problem, Chapter 1 of the book "Statistical Methods for Speech Recognition."
2. L. Rabiner. The Power of Speech. Science, Vol. 301, pp. 1494-1495, Sep. 2003.
3. S. Young. "Talking to Machines," Royal Academy of Engineering Ingenia, 54, pp. 40-46, 2013.
4. Frederick Jelinek, "The Dawn of Statistical ASR and MT," Computational Linguistics, Vol. 35, No. 4. (1 December 2009), pp. 483-494.

03/20 　 Hidden Markov Models for Speech Recognition
Readings: L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, No. 2, February 1989

03/27 　 Hidden Markov Models for Speech Recognition HW#1: Hidden Markov Models (Forward/Backward and Viterbi Algorithms)

04/10 　 Spoken Language Structure HW#2: Hidden Markov Models (Model Estimation)

04/17 　 Maximum Likelihood Estimation 　

04/24 　 Acoustic Modeling 　

05/01 　 Language Modeling 　

05/08 　 Language Modeling HW#3: Isolated Word Recognition (Explanation)

05/15 　 Search Algorithm 　

05/22 　 Speech Signal Analysis Reference: Digital Signal Processing

05/29 　 Robustness 　

06/05 　 Paper Presentation
沈信佑- Assisting Spoken English Learning with Speech Recognition and Scoring (以語音辨識與評分輔助口說英文學習)
劉長諺、林奕儒- IEEE INFOCOM 2008: Tracking Down Skype Traffic
莊惟翔- Educational Technology & Society, 2006: Learning portfolio analysis and mining for SCORM compliant environment
張益豪- AAAI 2001: Information Extraction with HMM Structures Learned by Stochastic Optimization
杜曉玟、劉三賢- ICASSP 2014: Medium-duration modulation cepstral feature for robust speech recognition
許曜麒- Interspeech 2010: Discriminatively Trained Acoustic Model for Improving Mispronunciation Detection and Diagnosis in Computer Aided Pronunciation Training (CAPT)
楊明翰- Interspeech 2014: Improving ASR Performance On Non-native Speech Using Multilingual and Crosslingual Information
王文傑- Speech Communication 2015: Acoustic and lexical resource constrained ASR using language-independent acoustic model and language-dependent probabilistic lexical model

06/12 　 Paper Presentation
劉暐辰- ICPPW 2014: GPU-accelerated HMM for Speech Recognition
連俊豪- ROCLING 2006: A Study of Chinese Word Segmentation Based on a Specialized Hidden Markov Model (基於特製隱藏式馬可夫模型之中文斷詞研究)
林雅婷、曾苑蓉- ACM ICMR 2013: Getting the look: clothing recognition and segmentation for automatic product suggestions in everyday photos
陳筑林、陳佑欣- ICASSP 2012: Audio event detection from acoustic unit occurrence patterns
秦翔- ROCLING 2007: A Multilingual Voice Command System (多國語言語音命令系統)

　　 Introduction to Hidden Markov Toolkit (HTK)
　 Exercise

　　 More on Language Modeling for Speech-Related Applications 　

　　 Brief Introduction to Text-to-Speech Synthesis 　

　　 Recent Developments in Text and Speech Summarization 　

Reference Books:

§ 　 L. Rabiner, R. Schafer, Theory and Applications of Digital Speech Processing, Pearson, 2011 　

§ 　 X. Huang, A. Acero, H. Hon, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001 　

§ 　 Jacob Benesty, M. Mohan Sondhi, Yiteng Huang (ed.), Springer Handbook of Speech Processing, Springer, 2007 　

§ 　 Tuomas Virtanen, Rita Singh, Bhiksha Raj (ed.), Techniques for Noise Robustness in Automatic Speech Recognition, John Wiley & Sons, 2013 　

§ 　 L. Rabiner, B.H. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993 　

§ 　 M.J.F. Gales and S.J. Young. The Application of Hidden Markov Models in Speech Recognition. Foundations and Trends in Signal Processing, 2008 　

§ 　 L. Rabiner and R.W. Schafer. Introduction to Digital Speech Processing. Foundations and Trends in Signal Processing, 2007 　

§ 　 W. Chou, B.H. Juang. Pattern Recognition in Speech and Language Processing. CRC Press, 2003 　

§ 　 S. Young et al., “The HTK Book”, Version 3.2, 2002. "http://htk.eng.cam.ac.uk" 　

§ 　 T. F. Quatieri, “Discrete-Time Speech Signal Processing - Principles and Practice,” Prentice Hall, 2002 　

§ 　 F. Jelinek, "Statistical Methods for Speech Recognition," The MIT Press, 1999 　

§ 　 Dong Yu and Li Deng, "Automatic Speech Recognition: A Deep Learning Approach," Springer, 2015 　

§ 　 J. R. Deller, J. H. L. Hansen, J. G. Proakis, “Discrete-Time Processing of Speech Signals,” IEEE Press, 2000 　

§ 　 C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999 　

§ 　 J. Bellegarda, Latent Semantic Mapping: Principles & Applications (Synthesis Lectures on Speech and Audio Processing), 2008 　

§ 　 T. K. Landauer, D. S. McNamara, S. Dennis, W. Kintsch (eds.), Handbook of Latent Semantic Analysis, Lawrence Erlbaum, 2007 　

§ 　 Ethem Alpaydin, Introduction to Machine Learning, MIT Press, 2004 　

§ 　 D. P. Bertsekas, J. N. Tsitsiklis, Introduction to Probability, Athena Scientific, 2002 　

§ 　 G. McLachlan, T. Krishnan, The EM Algorithm and Extensions, 2nd Edition, Wiley, 2008 　

Reference Papers:

§ 　 L. Rabiner. The Power of Speech. Science, Vol. 301, pp. 1494-1495, Sep. 2003. 　

§ 　 S. Young. "Talking to Machines," Royal Academy of Engineering Ingenia, 54, pp. 40-46, 2013. 　

§ 　 Y. LeCun, Y. Bengio and G. Hinton, "Deep learning," Nature, 521, pp. 436-444, 2015 　

§ 　 J.M. Baker et al., Research Developments and directions in speech recognition and understanding, part 1, IEEE Signal Processing Magazine 26(3), May 2009. 　

§ 　 J.M. Baker et al., Research Developments and directions in speech recognition and understanding, part 2, IEEE Signal Processing Magazine 26(4), July 2009. 　

§ 　 J. Schalkwyk et al., "Google Search by Voice: A case study," 2010. 　

§ 　 M. Ostendorf, Speech Technology and Information Access, IEEE Signal Processing Magazine 25(3), May 2008. 　

§ 　 L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, No. 2, February 1989 　

§ 　 A. V. Oppenheim and R. W. Schafer, "From Frequency to Quefrency: A History of the Cepstrum," IEEE Signal Processing Magazine 21(5), September 2004. 　

§ 　 A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1, 1977 　

§ 　 J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models," U.C. Berkeley TR-97-021 　

§ 　 J. W. Picone, “Signal modeling techniques in speech recognition,” Proceedings of the IEEE, September 1993, pp. 1215-1247 　

§ 　 R. Rosenfeld, “Two Decades of Statistical Language Modeling: Where Do We Go from Here?,” Proceedings of the IEEE, August 2000 　

§ 　 H. Ney, “Progress in Dynamic Programming Search for LVCSR,” Proceedings of the IEEE, August 2000 　

§ 　 Aubert, X. L., "An Overview of Decoding Techniques for Large Vocabulary Continuous Speech Recognition," Computer Speech and Language, vol. 16, 2002, pp. 89-114. 　

§ 　 Hynek Hermansky, "Should Recognizers Have Ears?", Speech Communication, 25(1-3), 1998. 　

§ 　 Hynek Hermansky, "Speech recognition from spectral dynamics", Sadhana, 36(5), 2011. 　

§ 　 J. R. Bellegarda, "Statistical Language Model Adaptation: Review and Perspectives," Speech Communication, vol. 42, no.1, pp. 93-108, 2004. 　

§ 　 B. Roark, "A survey of discriminative language modeling approaches for large vocabulary continuous speech recognition," in Large Margin and Kernel Approaches to Speech and Speaker Recognition, J. Keshet and S. Bengio (Eds.), Wiley, 2009. 　

§ 　 L. Rabiner, B.H. Juang, "Speech Recognition: Statistical Methods," Encyclopedia of Language & Linguistics, pp. 1-18, 2006. 　

§ 　 P. Nguyen, "TechWare: Speech recognition software and resources on the web," IEEE Signal Processing Magazine 26(3), May 2009. 　

§ 　 J. B. Allen, F. Li, "Speech Perception and Cochlear Signal Processing," IEEE Signal Processing Magazine 26(4), July 2009. 　

§ 　 A. Orlitsky, N. P. Santhanam, J. Zhang, "Always Good Turing: Asymptotically Optimal Probability Estimation," Science, 17 October 2003. 　

○ 　 Proceedings of the IEEE 88(8), August 2000 (Special Issue on Spoken Language Processing) 　

§ 　 Frederick Jelinek, "The Dawn of Statistical ASR and MT," Computational Linguistics, Vol. 35, No. 4. (1 December 2009), pp. 483-494. 　

§ 　 L. Deng and X. Li, "Machine learning paradigms for speech recognition: An overview," IEEE Transactions on Audio, Speech, and Language Processing, 21(5), pp. 1060-1089, May 2013. 　

§ 　 H. Li, B. Ma and K. A. Lee, "Spoken Language Recognition: From Fundamentals to Practice," Proceedings of the IEEE, February 2013. 　

○ 　 IEEE Signal Processing Magazine 22(5), September 2005 (Special Issue on Speech Technology and Systems in Human-Machine Communication) 　

○ 　 IEEE Signal Processing Magazine 25(3), May 2008 (Special Issue on Spoken Language Technology) 　

○ 　 IEEE Signal Processing Magazine 29(6), December 2012 (Special Issue on Fundamental Technologies in Modern Speech Recognition) 　

○ 　 Proceedings of the IEEE 101(5), May 2013 (Special Issue on Speech Information Processing: Theory and Applications) 　

Reference Presentations/Web Pages:

§ 　 J. Droppo, Noise Robust Automatic Speech Recognition, a comprehensive tutorial talk given at EUSIPCO 2008 　

§ 　 B. Chen, Latent Semantic Approaches for Information Retrieval and Language Modeling, a talk given at Telecommunication Laboratories, Chunghwa Telecom Co., Ltd., 2008 　

§ 　 B. Chen, Recent Developments in Chinese Spoken Document Search and Distillation, a talk given at Google Taipei, 2009 　

§ 　 S. Chen, D. Beeferman, R. Rosenfeld, Evaluation metrics for language models, NIST 　
