Technion - Israel Institute of Technology
Technion - Israel Institute of Technology - Graduate School
M.Sc Thesis
M.Sc Student: Rom Ofri
Subject: Exploration of Advanced Methods for Pseudo-Feedback-based Retrieval
Department: Department of Industrial Engineering and Management
Supervisor: Professor Oren Kurland
Full Thesis text: Full thesis text - English Version


Abstract

The core task that search engines have to address is ad hoc retrieval: ranking documents in a corpus by their relevance to the information need expressed by a query. Because queries are often short, the ad hoc retrieval task is quite difficult. As a case in point, relevant documents might not contain some (or even all) query terms, a problem known as vocabulary mismatch.
To address the vocabulary mismatch problem, among others, automatic query expansion methods have been proposed. The most common query expansion paradigm is based on utilizing pseudo feedback; i.e., the documents most highly ranked by an initial search.
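The pseudo-feedback paradigm described above can be illustrated with a minimal Python sketch. It is not the method studied in the thesis: the toy corpus, the term-overlap scoring in `initial_ranking`, and the frequency-based term selection in `expand_query` are all assumptions chosen for brevity, standing in for a real retrieval model and a real expansion-term scoring function.

```python
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration only).
corpus = {
    "d1": "car engine repair manual automobile",
    "d2": "car automobile dealer",
    "d3": "automobile maintenance guide",
    "d4": "cooking recipes pasta",
}

def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()

def initial_ranking(query_terms, corpus):
    """Rank documents by the number of tokens matching a query term.

    A stand-in for a real retrieval model; documents with no match are dropped.
    """
    q_terms = set(query_terms)
    scored = []
    for doc_id, text in corpus.items():
        score = sum(1 for t in tokenize(text) if t in q_terms)
        if score > 0:
            scored.append((score, doc_id))
    # Higher score first; ties broken by document id for determinism.
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [doc_id for _, doc_id in scored]

def expand_query(query, corpus, k=2, n_terms=2):
    """Add the n_terms most frequent non-query terms from the top-k documents."""
    query_terms = tokenize(query)
    feedback_docs = initial_ranking(query_terms, corpus)[:k]  # pseudo feedback set
    counts = Counter()
    for doc_id in feedback_docs:
        counts.update(t for t in tokenize(corpus[doc_id]) if t not in query_terms)
    # Most frequent first; ties broken alphabetically for determinism.
    ranked_terms = sorted(counts.items(), key=lambda p: (-p[1], p[0]))
    return query_terms + [t for t, _ in ranked_terms[:n_terms]]

expanded = expand_query("car", corpus)
before = initial_ranking(tokenize("car"), corpus)
after = initial_ranking(expanded, corpus)
```

In this toy setting, document `d3` shares no term with the query "car" and is therefore missing from `before`; once the query is expanded with terms drawn from the top-ranked documents, `d3` appears in `after`. This is the vocabulary mismatch effect that expansion is meant to mitigate.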
Most pseudo-feedback-based query expansion methods select expansion terms independently of each other. We examine an alternative approach that uses clusters of "similar" terms for query expansion. Specifically, we present methods for ranking term clusters based on their presumed effectiveness for query expansion. Empirical evaluation shows that while there are term clusters that can substantially benefit query expansion, automatically identifying them is a very hard task.
Our second contribution addresses a common practice in work on pseudo-feedback-based query expansion; namely, using a single initially retrieved document list as the pseudo feedback set. We present several approaches that utilize multiple initially retrieved lists. Extensive empirical evaluation demonstrates the clear merits of our approaches.