|M.Sc Thesis||Department of Industrial Engineering and Management|
|Supervisor:||Dr. Kurland Oren|
Pseudo relevance feedback (PRF) is a method for automatic local analysis. It automates the manual part of relevance feedback, so that the user gets improved retrieval performance without extended interaction. Query expansion using PRF has been shown to improve the effectiveness of ad hoc information retrieval. The main challenge of this approach is finding a good set of expansion terms. Most previous studies make a term-independence assumption: they construct the expansion set from terms that appear with high frequency in the documents that are initially ranked highest. We propose an approach to selecting expansion terms that utilizes inter-term similarities. Specifically, we use information induced from clusters of similar terms. As in some prior work, we demonstrate empirically that there are clusters of terms that, if used for expansion, yield retrieval performance surpassing that of current state-of-the-art expansion approaches. We also experiment with several approaches to identifying these clusters and demonstrate the challenges embodied in this task.
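To make the frequency-based baseline concrete, the following is a minimal sketch of term-independent PRF expansion: it counts terms in the top-ranked documents and appends the most frequent ones to the query. It is an illustration only, not the thesis's method or any specific published model. The function name `expand_query`, its parameters `top_k` and `num_expansion_terms`, and the toy documents are all assumptions introduced here.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_k=2, num_expansion_terms=2):
    """Hypothetical frequency-based PRF: each term is scored independently
    by its raw count in the top_k initially retrieved documents."""
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        # Skip terms that are already part of the query.
        counts.update(t for t in doc if t not in query_terms)
    # Highest count first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    expansion = [term for term, _ in ranked[:num_expansion_terms]]
    return list(query_terms) + expansion


# Toy initial ranking (assumed data): each document is a list of tokens.
ranked_docs = [
    ["jaguar", "car", "engine", "speed"],
    ["jaguar", "car", "dealer", "engine"],
    ["jaguar", "cat", "jungle"],
]

expanded = expand_query(["jaguar"], ranked_docs)
print(expanded)
```

Because every term is scored on its own count, this sketch ignores relationships between candidate terms, which is the assumption the cluster-based approach is meant to relax.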