Information Retrieval and Extraction
Fall 2005
Tuesdays, 2:10-5:00 PM
Instructor:
Berlin Chen (陳柏琳, Assistant Professor)

Homework Webpage

Tentative Topic List and Schedule:

9/13
 
Course Overview & Introduction
 
 
9/20
 
Retrieval Models (I) - Classic Retrieval Models (Boolean, Vector Space and Probabilistic Models)
 
 
9/27
 
Retrieval Performance Evaluation (I) - Measures
 
HW-01: IR Performance Evaluation
Due 10/11
 
10/4
 
Retrieval Performance Evaluation (II) - Reference Collections
 
 
10/11
 
Retrieval Models (II) - Improved Approaches (Fuzzy Set, Extended Boolean, Generalized Vector Space Models)
 
10/18
 
Query Operations (Query Expansion and Term Re-weighting)
 
HW-02: IR Models and Query Reformulations
Due 11/8

10/25
 
Retrieval Models (III) - Statistical Modeling Approaches (HMM/N-Gram: Language Model Approach)
 
11/1
 
Retrieval Models (III) - Statistical Modeling Approaches (TMM: Topical Mixture Model)
 
11/8
 
Retrieval Models (III) - Statistical Modeling Approaches (LSA, PLSA) & LSA Toolkit

Relevance Models (Preliminary)
HW-03: LSI Retrieval Model
Due 11/29
 
11/15
 
Midterm
 
11/22
 
Text Clustering 
 
11/29
 
Retrieval Models (IV) - Structural Retrieval Models and Browsing Models
 
12/6
 
Query Languages, Text Statistics
 
12/13
 
Text Operations
 
12/20
 
Invited Talk: Mr. 陳俊良 (General Manager, 新視科技)
Information Retrieval & Digital Archive Management

12/27
 
Paper Survey (I)
陳鴻彬: Simplified Similarity Scoring Using Term Ranks (SIGIR 2005)
許庭瑋: When Will Information Retrieval Be “Good Enough”? (SIGIR 2005)
李家豪: Dependence Language Model for Information Retrieval (SIGIR 2004)
 
1/3
 
Paper Survey (II)
朱芳輝: Gravitation-Based Model for Information Retrieval (SIGIR 2005)
白聖秋: Indexing and Ranking in Geo-IR Systems
張日青: The Maximum Entropy Method for Analyzing Retrieval Measures
徐志文: Exploiting the Hierarchical Structure for Link Analysis (SIGIR 2005)
游斯涵: Multi-Label Informed Latent Semantic Indexing (SIGIR 2005)
林士翔: Relevance Information: A Loss of Entropy but a Gain for IDF? (SIGIR 2005)
(Starting from 1:00 PM)

1/10
 
Indexing and Searching
 
1/17
 
Final
 
Chinese Spoken Document Recognition, Organization and Retrieval

Textbooks:

1. R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison Wesley Longman, 1999.
2. W. B. Croft and J. Lafferty (eds.), Language Models for Information Retrieval, Kluwer International Series on Information Retrieval, Volume 13, Kluwer Academic Publishers, 2002.

References:
 
Books:

1. W. B. Frakes and R. Baeza-Yates, Information Retrieval: Data Structures & Algorithms, Prentice-Hall, 1992.
2. A. Del Bimbo, Visual Information Retrieval, Morgan Kaufmann, 1999.
3. I. H. Witten, A. Moffat, and T. C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann, 1999.
4. C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
5. D. Jurafsky and J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000.

Papers:

1. D. Blei, A. Ng, and M. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, 3:993-1022, January 2003.
2. V. Lavrenko and W. B. Croft, "Relevance-Based Language Models," ACM SIGIR 2001.
3. C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala, "Latent Semantic Indexing: A Probabilistic Analysis." Analyzes an information retrieval technique related to principal component analysis.
4. X. Liu and W. B. Croft, "Statistical Language Modeling for Information Retrieval," Annual Review of Information Science and Technology, vol. 39, 2005.
5. L. Huang, "A Survey on Web Information Retrieval Technologies," 2000.