Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Levi Or
Subject: Selective Cluster-Based Document Retrieval
Department: Department of Industrial Engineering and Management
Supervisor: Professor Oren Kurland
Full Thesis text: English Version


Abstract

We address the long-standing challenge of selective cluster-based retrieval; namely, deciding on a per-query basis whether to apply cluster-based document retrieval or standard document retrieval.

Previous work using TREC data has demonstrated the merits of selective cluster-based retrieval. We further motivate the task by analyzing the search log of a large-scale commercial search engine. Specifically, we study the effects of presenting, on the search engine results page (SERP), results from a vertical that contains documents from the specialized domain of community question answering (CQA) websites. Focusing on the CQA vertical, which has received little attention in previous work, we compare its results on the SERP with those retrieved from other verticals along several aspects, such as size, rank, and clicks. The comparison points to a variety of unique characteristics that make the CQA vertical especially interesting for cluster-based retrieval research.

To address the selective cluster-based retrieval challenge, we propose a few sets of features based on those utilized by the cluster-based ranker, query-performance predictors, and properties of the clustering structure. Empirical evaluation shows that our method outperforms state-of-the-art retrieval approaches, including cluster-based, query expansion, and term proximity methods.
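
The per-query decision described above can be illustrated with a minimal Python sketch. It is not the thesis's actual model: the feature names, the linear scoring rule, the weights, and the zero threshold are all hypothetical, chosen only to show how features from the three families (cluster-based ranker, query-performance predictors, clustering structure) could drive a choice between the two result lists.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QueryCandidate:
    """Per-query inputs to the selection step (all field names are hypothetical)."""
    query: str
    features: Dict[str, float]   # e.g. ranker scores, predictor values, cluster statistics
    doc_ranking: List[str]       # result list from standard document retrieval
    cluster_ranking: List[str]   # result list from cluster-based retrieval


def select_ranking(cand: QueryCandidate,
                   weights: Dict[str, float],
                   bias: float = 0.0) -> List[str]:
    """Return the cluster-based list if a linear score over the features is positive.

    The linear rule is an assumption made for illustration; any binary
    classifier trained on per-query features could take its place.
    """
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in cand.features.items())
    return cand.cluster_ranking if score > 0 else cand.doc_ranking


# Illustrative weights only; in practice they would be learned from training queries.
example_weights = {"top_cluster_score": 1.0, "predicted_doc_quality": -0.5}

example = QueryCandidate(
    query="how to fix a flat tire",
    features={"top_cluster_score": 0.8, "predicted_doc_quality": 0.4},
    doc_ranking=["d3", "d1", "d2"],
    cluster_ranking=["d1", "d2", "d3"],
)
chosen = select_ranking(example, example_weights)
```

With these made-up numbers the score is positive, so the cluster-based list is returned; a query whose standard document ranking is predicted to be strong would fall back to that ranking instead.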