Information Retrieval
Fall 2008
Tuesdays, 9:10 a.m. ~ 12:00 noon
Instructor:
Dr. Berlin Chen (陳柏琳)

 

Homework Page

Tentative List of Topics:

09/16   Course Overview & Introduction

09/23   Break (Interspeech 2008)

09/30   Retrieval Models (I) - Classic Retrieval Models (Boolean, Vector Space and Probabilistic Models)

10/07   Retrieval Performance Evaluation - Measures
        Retrieval Performance Evaluation - Collections
        HW-1: Evaluations for IR (Due 10/28)

10/14   Retrieval Models (II) - Improved Approaches (Fuzzy Set, Extended Boolean, Generalized Vector Space Models)

10/21   Query Operations (Query Expansion and Term Re-weighting)

10/28   Retrieval Models (III) - Latent Semantic Analysis (LSA)

11/04   Retrieval Models (IV) - Language Modeling Approaches (1/2)
        HW-2: Retrieval Models (Due 11/25)

11/10   Retrieval Models (IV) - Language Modeling Approaches (2/2)

11/18   Midterm (9:00 a.m. ~ 12:00 noon)

11/25   Clustering for Information Retrieval

12/02   Efficient Indexing and Searching

12/09   Text and Speech Summarization
        HW-3: Text/Speech Summarization (Due 01/13)

12/16   Break (ISCSLP 2008)

12/23   Web Search Basics and Link Analysis

12/30   Paper Survey by Students

1.  張朝凱: An information retrieval approach to concept location in source code
2.  張鈺玫: OCELOT: A system for summarizing web pages (SIGIR2000)
3.  游宗毅: A Study of Methods for Negative Relevance Feedback (SIGIR08)
4.  卓晉緯: Retrieval Models for Question and Answer Archives (SIGIR08)
5.  徐毓雯: Retrieval and Feedback Models for Blog Feed Search (SIGIR08)
6.  許翼麟: Predicting Information Seeker Satisfaction in Community Question Answering (SIGIR08)
7.  戴衣菱: Finding Question-Answer Pairs from Online Forums (SIGIR08)

01/06

8.  范喬彬: Collaborative OpenSocial Network Dataset based Email Ranking and Filtering
9.  游敦皓: Query Dependent Ranking Using K-Nearest Neighbor (SIGIR08)
10. 張嘉晏: Studying the Use of Popular Destinations to Enhance Web Search Interaction (SIGIR07)
11. 鄭舜尹: Application Potential of Multimedia Information Retrieval (IEEE Proceedings 2008)
12. 林宗緣: Ontology Based Semantic Information Retrieval (IEEE Intelligent System 2008)
13. 黃信翰: Document Instantiation for Relevance Feedback in the Bayesian Network Retrieval Model (SIGIR01 - FM/IR workshop)
14. 胡晉豪: User Adaptation: Good Results from Poor Systems (SIGIR2008)

Final Project (Due 01/21)

01/13   Learning to Rank using Language Models and SVMs

Textbook: 

1. R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison Wesley Longman, 1999.
2. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
3. D. A. Grossman, O. Frieder, Information Retrieval: Algorithms and Heuristics, Springer, 2004.
4. W. Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, Addison Wesley, 2009.

References:
 
Books:

1. W. B. Frakes and R. Baeza-Yates, Information Retrieval: Data Structures & Algorithms, Prentice-Hall, 1992.
2. T. K. Landauer, D. S. McNamara, S. Dennis, W. Kintsch (eds.), Handbook of Latent Semantic Analysis, Lawrence Erlbaum, 2007.
3. I. H. Witten, A. Moffat, and T. C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann Publishers, 1999.
4. C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
5. D. Jurafsky and J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000.
6. W.B. Croft and J. Lafferty (eds.), Language Models for Information Retrieval, Kluwer International Series on Information Retrieval, Volume 13, Kluwer Academic Publishers, 2002.
7. C.X. Zhai, "Statistical Language Models for Information Retrieval: A Critical Review," Foundations and Trends in Information Retrieval, Vol. 2, No. 3, 2008. (Also see an extended version in: C.X. Zhai, "Statistical Language Models for Information Retrieval (Synthesis Lectures Series on Human Language Technologies)," Morgan & Claypool Publishers, 2008)


Papers:

1. D. Blei, A. Ng, and M. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, 3:993-1022, January 2003.
2. V. Lavrenko and W.B. Croft, "Relevance-Based Language Models," ACM SIGIR 2001.
3. C. H. Papadimitriou, P. Raghavan, H. Tamaki, S. Vempala, "Latent semantic indexing: A probabilistic analysis." (Analyzes an information retrieval technique related to principal components analysis.)
4. X. Liu and W.B. Croft, "Statistical Language Modeling For Information Retrieval," Annual Review of Information Science and Technology, vol. 39, 2005.
5. Lan Huang. A Survey On Web Information Retrieval Technologies. 2000.
6. D. Hiemstra, "Information Retrieval Models," In: A. Goker, J. Davies, and M. Graham (eds.), Information Retrieval: Searching in the 21st Century, Wiley, 2009.
7. M. Steyvers, T. Griffiths,  "Probabilistic Topic Models," In T. K. Landauer, D. S. McNamara, S. Dennis, W. Kintsch (eds.). Handbook of Latent Semantic Analysis, Mahwah NJ: Lawrence Erlbaum, 2007.
8. X. Yi, J. Allan,  "A Comparative Study of Utilizing Topic Models for Information Retrieval," in the Proceedings of ECIR'09.
9. R. Nallapati, "Discriminative Models for Information Retrieval," in the Proceedings of SIGIR 2004.
10. T. Joachims and F. Radlinski, "Search Engines that Learn from Implicit Feedback," IEEE Computer 40(8), pp. 34-40, 2007.
11. B. Chen, H.M. Wang, L.S. Lee, “A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents,” ACM Transactions on Asian Language Information Processing, Vol. 3, No. 2, pp. 128-145, June 2004.

 

Information Retrieval Resources

1. SIGIR-Information Retrieval Resources