Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Lepa Yevgeni
Subject: Query Expansion Using Term Clusters
Department: Department of Industrial Engineering and Management
Supervisor: Professor Oren Kurland
Full Thesis Text: English Version


Abstract


Pseudo relevance feedback (PRF) is a method for automatic local analysis. It automates the manual part of relevance feedback, so that the user gets improved retrieval performance without extended interaction. Query expansion using PRF has been shown to improve the effectiveness of ad hoc information retrieval. The main challenge of this approach is finding a good set of expansion terms. Most previous studies make a term-independence assumption: they construct the expansion set from terms that appear with high frequency in the documents that are initially ranked highest. We propose an approach to selecting expansion terms that utilizes inter-term similarities. Specifically, we use information induced from clusters of similar terms. As in some prior work, we demonstrate empirically that there exist clusters of terms that, when used for expansion, yield retrieval performance surpassing that of current state-of-the-art expansion approaches. We also experiment with several approaches to identifying these clusters and demonstrate the challenges embodied in this task.
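
The frequency-based baseline that the abstract attributes to most previous studies can be illustrated with a minimal sketch. This is not the thesis's method or its experimental setup. The function name `expand_query`, its parameters, and the whitespace tokenization are assumptions made for illustration only, and the sketch omits stopword removal and term weighting, which practical systems would typically include.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_k_docs=2, num_expansion_terms=2):
    """Illustrative frequency-based PRF expansion (hypothetical helper).

    Treats each term independently: counts term occurrences in the
    top-ranked documents and appends the most frequent new terms.
    """
    counts = Counter()
    # Assume the top_k_docs initially highest-ranked documents are relevant.
    for doc in ranked_docs[:top_k_docs]:
        counts.update(doc.lower().split())  # naive whitespace tokenization

    query_set = set(query_terms)
    candidates = [(term, n) for term, n in counts.items() if term not in query_set]
    # Sort by descending frequency; break ties alphabetically for determinism.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))

    expansion = [term for term, _ in candidates[:num_expansion_terms]]
    return list(query_terms) + expansion


# Toy initial ranking for the query "jaguar" (made-up documents).
ranked = ["jaguar speed cat cat", "jaguar cat habitat", "jaguar car engine"]
expanded = expand_query(["jaguar"], ranked)
```

Because each candidate is scored on its own, related terms never reinforce one another in this sketch. That is the term-independence assumption the abstract contrasts with expansion based on clusters of similar terms.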