NTNU Speech Processing Course

Speech Processing

Spring 2016
9:10 am ~ 12:10 pm, Mondays
Instructor: Dr. Berlin Chen (陳柏琳)

Topic List and Schedule:

02/22 　 Course Overview & Introduction Readings:
               1. F. Jelinek, "The Speech Recognition Problem," Chapter 1 of the book "Statistical Methods for Speech Recognition."
               2. L. Rabiner. The Power of Speech. Science, Vol. 301, pp. 1494-1495, Sep. 2003.
               3. S. Young. "Talking to Machines," Royal Academy of Engineering Ingenia, 54, pp. 40-46, 2013.
               4. Frederick Jelinek, "The Dawn of Statistical ASR and MT," Computational Linguistics, Vol. 35, No. 4. (1 December 2009), pp. 483-494.
               5. X. Huang, J. Baker, R. Reddy, "A Historical Perspective of Speech Recognition," Communications of the ACM, Vol. 57, No. 1, 2014.

03/07 　 Hidden Markov Models for Speech Recognition Readings: L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,”
                                   Proceedings of the IEEE, vol. 77, No. 2, February 1989

03/14 　 Hidden Markov Models for Speech Recognition HW#1: Hidden Markov Models (Forward/Backward and Viterbi Algorithms)

03/21 　 Maximum Likelihood Estimation HW#2: Hidden Markov Models (Model Estimation)

03/28 　 Spoken Language Structure 　

04/11 　 Acoustic Modeling 　

04/18 　 Language Modeling 　

04/25 　 Language Modeling 　

05/02 　 Search Algorithm HW#3: Isolated Word Recognition (Explanation) Keyword Spotting

05/09 　 Speech Signal Analysis Reference: Digital Signal Processing

05/16 　 Robustness 　

　　 Paper Presentations
05/23
李柏勳 (MIPRO 2012) Android application for sending SMS messages with speech recognition interface
陳佩瑄 (INDICON 2015) Isolated Word Recognition Using Neural Network
陳映文 (Interspeech 2015) Multiscale recurrent neural network based language model
江宜勳 (ICICT 2006) Speech Recognition for Disabilities People
顏必成 (ICSSE 2012) Enhancing the sub-band modulation spectra of speech features via nonnegative matrix factorization for robust speech recognition
石敬弘 (ICASSP 2010) SPARSE CODING FOR SPEECH RECOGNITION
05/30
劉慈恩 (Sensors 2016) Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition
周彥辰 (Computational Linguistics 2001) A Machine Learning Approach to Coreference Resolution of Noun Phrases
何岱璇 (Master Thesis 2014) An Android Machine Learning Malware Detection System Using the Result of Static Analysis and Dynamic Analysis as the Features
李育霖 (Master Thesis 2002) The Speech Recognition System Using Neural Networks
萬世澤 (Master Thesis 2015) Using Convolutional Neural Networks for Image Retrieval
邱琬琇 (SIIE 2015) Neural Networks for Proper Name Retrieval in the Framework of Automatic Speech Recognition
徐志廷 (Speech Communication 2016) Effect of processing-based and microphone-based noise reduction algorithms on intelligibility-related acoustic features: A parametric investigation study
06/06
蔡淳伊 (ICASSP 2012) Power-Normalized Cepstral Coefficients (PNCC) for robust speech recognition
徐品翰 An Informal Talk on Markov
簡少凡 (IEEE TSAP 1996) Predicting Unseen Triphones with Senones
吳佳樺 (IEEE TASLP 2014) A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks
莊昀諺 (ICASSP 2013) RECENT ADVANCES IN DEEP LEARNING FOR SPEECH RESEARCH AT MICROSOFT
陳志昇 (LNCS 2004) Gaussian Processes in Machine Learning
06/13
李佳謙 (IEEE TPAMI 2016) Fast Edge Detection Using Structured Forests
鄭瑜 (Master Thesis 2015) Noise Robust Speech Recognition using Sparse Representations
王世安 (Nature 2015) Deep Learning
李育瑋 (ICASSP 2013) ATTRIBUTING MODELLING ERRORS IN HMM SYNTHESIS BY STEPPING GRADUALLY FROM NATURAL TO MODELLED SPEECH
鄭力文 (Marketing Science 2004) Modeling Online Browsing and Path Analysis Using Clickstream Data
姚奮辰 (2011) Constructing Emotional Changes of Non-Player Characters in Games Using Machine Learning Theory (in Chinese)
　

　　 Some Representation Learning Approaches for Speech Recognition and its Applications 　

Reference Books:

§ 　 L. Rabiner, R. Schafer, Theory and Applications of Digital Speech Processing, Pearson, 2011 　

§ 　 X. Huang, A. Acero, H. Hon, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001 　

§ 　 Jacob Benesty, M. Mohan Sondhi, Yiteng Huang (ed.), Springer Handbook of Speech Processing, Springer, 2007 　

§ 　 Tuomas Virtanen, Rita Singh, Bhiksha Raj (ed.), Techniques for Noise Robustness in Automatic Speech Recognition, John Wiley & Sons, 2013 　

§ 　 L. Rabiner, B.H. Juang, “Fundamentals of Speech Recognition”, Prentice Hall, 1993 　

§ 　 M.J.F. Gales and S.J. Young. The Application of Hidden Markov Models in Speech Recognition. Foundations and Trends in Signal Processing, 2008 　

§ 　 L. Rabiner and R.W. Schafer. Introduction to Digital Speech Processing. Foundations and Trends in Signal Processing, 2007 　

§ 　 W. Chou, B.H. Juang. Pattern Recognition in Speech and Language Processing. CRC Press, 2003

§ 　 S. Young et al., “The HTK Book”, Version 3.2, 2002. "http://htk.eng.cam.ac.uk" 　

§ 　 T. F. Quatieri, “Discrete-Time Speech Signal Processing - Principles and Practice,” Prentice Hall, 2002

§ 　 F. Jelinek, "Statistical Methods for Speech Recognition," The MIT Press, 1999 　

§ 　 Dong Yu and Li Deng, "Automatic Speech Recognition: A Deep Learning Approach," Springer, 2015 　

§ 　 J. R. Deller, J. H. L. Hansen, J. G. Proakis, “Discrete-Time Processing of Speech Signals,” IEEE Press, 2000 　

§ 　 C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999

§ 　 J. Bellegarda, Latent Semantic Mapping: Principles & Applications (Synthesis Lectures on Speech and Audio Processing), 2008 　

§ 　 T. K. Landauer, D. S. McNamara, S. Dennis, W. Kintsch (eds.), Handbook of Latent Semantic Analysis, Lawrence Erlbaum, 2007

§ 　 Ethem Alpaydin, Introduction to Machine Learning, MIT Press, 2004 　

§ 　 D. P. Bertsekas, J. N. Tsitsiklis, Introduction to Probability, Athena Scientific, 2002 　

§ 　 G. McLachlan, T. Krishnan, The EM Algorithm and Extensions, 2nd Edition, Wiley, 2008

Reference Papers:

§ 　 L. Rabiner. The Power of Speech. Science, Vol. 301, pp. 1494-1495, Sep. 2003. 　

§ 　 S. Young. "Talking to Machines," Royal Academy of Engineering Ingenia, 54, pp. 40-46, 2013. 　

§ 　 Y. LeCun, Y. Bengio and G. Hinton, "Deep learning," Nature, 521, pp. 436-444, 2015 　

§ 　 J.M. Baker et al., Research Developments and directions in speech recognition and understanding, part 1, IEEE Signal Processing Magazine 26(3), May 2009.

§ 　 J.M. Baker et al., Research Developments and directions in speech recognition and understanding, part 2, IEEE Signal Processing Magazine 26(4), July 2009.

§ 　 J. Schalkwyk et al., "Google Search by Voice: A case study," 2010. 　

§ 　 M. Ostendorf, Speech Technology and Information Access, IEEE Signal Processing Magazine 25(3), May 2008. 　

§ 　 L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, No. 2, February 1989 　

§ 　 A. V. Oppenheim and R. W. Schafer, "From Frequency to Quefrency: A History of the Cepstrum," IEEE Signal Processing Magazine 21(5), September 2004. 　

§ 　 A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1, 1977 　

§ 　 J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models," U.C. Berkeley TR-97-021

§ 　 J. W. Picone, “Signal modeling techniques in speech recognition,” Proceedings of the IEEE, September 1993, pp. 1215-1247

§ 　 R. Rosenfeld, “Two Decades of Statistical Language Modeling: Where Do We Go from Here?,” Proceedings of the IEEE, August 2000

§ 　 H. Ney, “Progress in Dynamic Programming Search for LVCSR,” Proceedings of the IEEE, August 2000 　

§ 　 Aubert, X. L., "An Overview of Decoding Techniques for Large Vocabulary Continuous Speech Recognition," Computer Speech and Language, vol. 16, 2002, pp. 89-114. 　

§ 　 Hynek Hermansky, "Should Recognizers Have Ears?", Speech Communication, 25(1-3), 1998. 　

§ 　 Hynek Hermansky, "Speech recognition from spectral dynamics", Sadhana, 36(5), 2011. 　

§ 　 J. R. Bellegarda, "Statistical Language Model Adaptation: Review and Perspectives," Speech Communication, vol. 42, no.1, pp. 93-108, 2004. 　

§ 　 B. Roark, "A survey of discriminative language modeling approaches for large vocabulary continuous speech recognition," in Large Margin and Kernel Approaches to Speech and Speaker Recognition, J. Keshet and S. Bengio (Eds.), Wiley, 2009. 　

§ 　 L. Rabiner, B.H. Juang, "Speech Recognition: Statistical Methods," Encyclopedia of Language & Linguistics, pp. 1-18, 2006. 　

§ 　 P. Nguyen, "TechWare: Speech recognition software and resources on the web," IEEE Signal Processing Magazine 26(3), May 2009.

§ 　 J. B. Allen, F. Li, "Speech Perception and Cochlear Signal Processing," IEEE Signal Processing Magazine 26(4), July 2009.

§ 　 A. Orlitsky, N. P. Santhanam, J. Zhang, "Always Good Turing: Asymptotically Optimal Probability Estimation," Science, 17 October 2003. 　

○ 　 Proceedings of IEEE 88(8), August, 2000 (Special Issue on Spoken Language Processing) 　

§ 　 Frederick Jelinek, "The Dawn of Statistical ASR and MT," Computational Linguistics, Vol. 35, No. 4. (1 December 2009), pp. 483-494. 　

§ 　 X. Huang, J. Baker, R. Reddy, "A Historical Perspective of Speech Recognition," Communications of the ACM, Vol. 57, No. 1, 2014.

§ 　 L. Deng and X. Li, "Machine learning paradigms for speech recognition: An overview," IEEE Transactions on Audio, Speech, and Language Processing, 21(5), pp. 1060 - 1089, May, 2013. 　

§ 　 H. Li, B. Ma and K. A. Lee, "Spoken Language Recognition: From Fundamentals to Practice," Proceedings of the IEEE, February 2013. 　

○ 　 IEEE Signal Processing Magazine 22(5), September 2005 (Special Issue on Speech Technology and Systems in Human-Machine Communication) 　

○ 　 IEEE Signal Processing Magazine 25(3), May 2008 (Special Issue on Spoken Language Technology) 　

○ 　 IEEE Signal Processing Magazine 29(6), December 2012 (Special Issue on Fundamental Technologies in Modern Speech Recognition) 　

○ 　 Proceedings of IEEE 101(5), May 2013 (Special Issue on Speech Information Processing: Theory and Applications) 　

Reference Presentations/Web Pages:

§ 　 J. Droppo, Noise Robust Automatic Speech Recognition, a comprehensive tutorial talk given at EUSIPCO 2008 　

§ 　 B. Chen, Latent Semantic Approaches for Information Retrieval and Language Modeling, a talk given at Telecommunication Laboratories, Chunghwa Telecom Co., Ltd., 2008 　

§ 　 B. Chen, Recent Developments in Chinese Spoken Document Search and Distillation, a talk given at Google Taipei, 2009 　

§ 　 S. Chen, D. Beeferman, R. Rosenfeld, Evaluation metrics for language models, NIST 　
