Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Bernstein Kfir
Subject: Cluster-Based Document Retrieval with Multiple Queries
Department: Department of Industrial Engineering and Management
Supervisor: Prof. Oren Kurland


Abstract

Several recent information retrieval studies have demonstrated that using multiple queries to represent the same information need can improve retrieval effectiveness. In this thesis we present the first study of using multiple queries in cluster-based document retrieval, that is, ranking documents using information induced from clusters of similar documents so as to improve retrieval results. Specifically, we propose a conceptual framework of retrieval templates that can adapt known cluster-based document retrieval methods, originally devised for a single query, to leverage multiple queries. The adaptations operate at the query, document-list and similarity-estimate levels. Retrieval methods are instantiated from the templates by selecting, for example, the clustering algorithm and the cluster-based document retrieval method. Empirical evaluation attests to the merits of the retrieval templates with respect to very strong baselines: a state-of-the-art cluster-based document retrieval method that uses a single query, and a highly effective fusion of document lists retrieved for multiple queries. In addition, we present findings about the impact of the effectiveness of the queries used to represent an information need on (i) cluster hypothesis test results, (ii) the percentage of relevant documents in clusters of similar documents, and (iii) the effectiveness of state-of-the-art cluster-based document retrieval methods.
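
To illustrate the kind of baseline mentioned above, the fusion of document lists retrieved for multiple queries, the following is a minimal sketch. The abstract does not name the fusion method used in the thesis; reciprocal rank fusion is assumed here purely for illustration, and the function name, the constant k = 60, and the toy document identifiers are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked document lists into one ranking.

    Assumption: reciprocal rank fusion stands in for the (unnamed)
    fusion method of the thesis. Each document receives 1 / (k + rank)
    from every list in which it appears; the contributions are summed.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical lists retrieved for three queries that represent
# the same information need.
query_variants = {
    "q1": ["d3", "d1", "d2"],
    "q2": ["d1", "d3", "d4"],
    "q3": ["d1", "d2", "d3"],
}

fused = reciprocal_rank_fusion(list(query_variants.values()))
print(fused)
```

In this toy example, a document ranked highly for several query variants (d1) ends up above a document ranked first for only one of them (d3). A cluster-based method would go further and use clusters of similar documents when ranking; that step is not shown here.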