|M.Sc Student||Rom Ofri|
|Subject||Exploration of Advanced Methods for Pseudo-Feedback-based|
|Department||Department of Industrial Engineering and Management||Supervisor||PROF. Oren Kurland|
The core task that search engines have to address is ad hoc retrieval, that is, ranking documents in a corpus by their relevance to the information need expressed by a query. As queries are often short, the ad hoc retrieval task becomes quite difficult. As a case in point, relevant documents might not contain some (or even all) query terms, a problem known as vocabulary mismatch.
To address the vocabulary mismatch problem, among others, automatic query expansion methods have been proposed. The most common query expansion paradigm is based on utilizing pseudo feedback; i.e., the documents most highly ranked by an initial search.
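The pseudo-feedback idea described above can be illustrated with a minimal sketch. It is not one of the methods studied in the thesis: the toy scorer (counting query-term occurrences), the function names `rank` and `expand_query`, and the parameters `k` (number of feedback documents) and `m` (number of expansion terms) are all assumptions made for illustration only.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()


def rank(query_terms, docs):
    """Rank document indices by total query-term occurrences (toy scorer)."""
    scored = []
    for i, doc in enumerate(docs):
        counts = Counter(tokenize(doc))
        scored.append((sum(counts[t] for t in query_terms), i))
    # Highest score first; ties are broken by original document order.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


def expand_query(query, docs, k=2, m=2):
    """Add the m most frequent non-query terms from the top-k ranked documents."""
    query_terms = tokenize(query)
    feedback = rank(query_terms, docs)[:k]  # the pseudo-feedback set
    term_counts = Counter()
    for i in feedback:
        term_counts.update(t for t in tokenize(docs[i]) if t not in query_terms)
    # Sort by frequency, then alphabetically, so the output is deterministic.
    best = sorted(term_counts.items(), key=lambda p: (-p[1], p[0]))[:m]
    return query_terms + [t for t, _ in best]


docs = [
    "car engine repair",
    "automobile engine repair manual",
    "banana bread recipe",
]
print(expand_query("car repair", docs))
```

In this toy corpus, the second document never mentions "car", yet the expanded query picks up "automobile" from the feedback set, which is the kind of vocabulary mismatch that expansion is meant to bridge. Note that each expansion term here is chosen independently of the others.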
Most pseudo-feedback-based query expansion methods select expansion terms independently of each other. We examine an alternative approach that uses clusters of "similar" terms for query expansion. Specifically, we present methods for ranking term clusters by their presumed effectiveness for query expansion. Empirical evaluation shows that while some term clusters can substantially benefit query expansion, automatically identifying them is a very hard task.
Our second contribution addresses a common practice in work on pseudo-feedback-based query expansion, namely, using a single initially retrieved document list as the pseudo-feedback set. We present a few approaches that utilize multiple initially retrieved lists. Extensive empirical evaluation demonstrates the clear merits of these approaches.