Information Retrieval and Extraction
                          
Spring, 2003

 

Tentative Topic List and Schedule

2/21
 
Course Overview & Introduction
 
2/28
 
Break
 
3/7
 
Retrieval Models (I) - Classic Retrieval Models: Boolean, Vector Space and Probabilistic Models
 
3/14



 
Retrieval Evaluation (I) - Measures
Retrieval Evaluation (II) - Reference Collections

HW#1: Evaluation Measures (Due 3/28)
HW#2: Classic Retrieval Models (Due 4/11)
 
3/21

Retrieval Models (II) - Structural Retrieval Models and Browsing Models
Retrieval Models (III) - Fuzzy Set, Extended Boolean, Generalized Vector Space Models
 
3/28
 
Query Operations (Query Expansion and Term Re-weighting)
HW#3: Relevance Feedback or Local Analysis (Due 4/25)

4/4
 
Break
 
4/11
 
Query Operations (Query Expansion and Term Re-weighting)
 
4/18

 
Retrieval Models (IV) - HMM/N-gram-based, LSI, PLSA

HW#4: HMM/N-gram-based and PLSA Retrieval Models (Due 5/16)

4/25
 
Midterm
 
5/2

 
Retrieval Models (IV) - HMM/N-gram-based, LSI, PLSA
Query Languages
 
5/9

 
Text Languages and Text Statistics

 
5/16






 
Text Preprocessing, Text Compression
Text Clustering Techniques  

HW#5: A Web-based IR System (Due 6/20)
 (Features included: overlapping character bigrams as indexing terms,
  inverted file structure, query expansion, client-server networking architecture)
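
As a rough illustration of the first two HW#5 features, the sketch below indexes documents by overlapping character bigrams and stores them in an in-memory inverted file. It is only a starting point under stated assumptions, not the required design: the function names (char_bigrams, build_inverted_file, search), the choice to drop whitespace before forming bigrams, and the raw term-frequency scoring are all hypothetical choices made for this example. Query expansion and the client-server layer are not shown.

```python
from collections import defaultdict


def char_bigrams(text):
    """Return overlapping character bigrams of text, ignoring whitespace (an assumption)."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def build_inverted_file(docs):
    """Build a mapping: bigram -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for gram in char_bigrams(text):
            index[gram][doc_id] = index[gram].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Score each document by the summed frequency of the query's bigrams (illustrative scoring)."""
    scores = defaultdict(int)
    for gram in char_bigrams(query):
        for doc_id, tf in index.get(gram, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical three-document collection.
docs = {1: "資訊檢索", 2: "資訊擷取", 3: "語音辨識"}
index = build_inverted_file(docs)
results = search(index, "資訊檢索")
```

Here the query shares three bigrams with document 1 and one bigram with document 2, so document 1 ranks first and document 3 is not returned. A real submission would likely replace the raw counts with a weighting scheme from the retrieval-model lectures.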

 
5/23

 
Indexing and Searching (Preliminary Version)

 
5/30




 
Paper Presentation (I):
黃立德: Boosting for Document Routing, ACM CIKM 2000
鄭德義: Cross-Document Summarization by Concept Classification, SIGIR 2002
江漢昇: Improving Realism of Topic Tracking Evaluation, SIGIR 2002
黃士傑: Set-Based Model: A New Approach for Information Retrieval, SIGIR 2002
 
6/6

 
Talk Title: "Technologies behind Internet Search Engine"
Invited Speaker: Ming-Jer Lee, CTO, VisionNEXT Co.
 
6/13




 
Paper Presentation (II):
郭人瑋: Generic Summarization and Keyphrase Extraction Using Mutual Reinforcement Principle and Sentence Clustering, SIGIR 2002
黃耀民: Expressive Retrieval from XML Documents, SIGIR 2001
劉耀才: Document Clustering with Committees, SIGIR 2002
 
6/20

Text Categorization Techniques

 
6/27
 
Final Exam
 


 

Information Extraction Techniques
Question Answering

Retrieval Models (V) - Advanced Retrieval Models (II)
(Inference Networks, Belief Networks, Neural Networks)
 

Textbook: 

1. R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison Wesley Longman, 1999.

References:
 
Books:

1. W. B. Frakes and R. Baeza-Yates, Information Retrieval: Data Structures & Algorithms, Prentice-Hall, 1992.
2. A. Del Bimbo, Visual Information Retrieval, Morgan Kaufmann, 1999.
3. I. H. Witten, A. Moffat, and T. C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann, 1999.
4. C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
5. D. Jurafsky and J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000.

  Papers:
    

Grading:
     1. Final: 20%
     2. Presentations: 20%
     3. Homework: 20%
     4. Project: 25%
     5. Attendance/Other: 15%