Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Markovits Gad
Subject: Predicting Query Performance for Fusion-Based Retrieval
Department: Department of Industrial Engineering and Management
Supervisor: Professor Oren Kurland


Abstract

In a world where the amount of information is rapidly growing, search engines have become a vital part of our lives. Be it search over the World Wide Web or a company's database, finding documents relevant to the user's query, in light of this information growth, is an increasingly difficult task.

The main function of a search engine is to rank the documents in a corpus by their presumed relevance to a user's query and to return the result list: a list composed of the top-ranked documents. However, for some queries a search engine might return many relevant documents, while for other queries it might return few or none at all, even though such documents exist in the corpus. Therefore, methods that can detect this inconsistent behavior and identify the more difficult queries are of much interest.

Estimating the effectiveness of a search performed in response to a query in the absence of relevance judgments is the goal of query-performance prediction methods. Post-retrieval predictors, for example, analyze the result list of the most highly ranked documents.

We address the prediction challenge for retrieval approaches wherein the final result list is produced by fusing document lists that were retrieved in response to a query. To that end, we present a novel fundamental prediction framework that accounts for the special characteristic of the fusion setting, i.e., the use of intermediate retrieved lists. The framework integrates prediction performed upon the final result list with prediction performed upon the lists that were fused to create it; the integration is controlled by inter-list similarities.
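To make the idea of similarity-controlled integration concrete, the following is a minimal Python sketch under stated assumptions. The abstract does not give the framework's formulas, so everything here is hypothetical: the Jaccard overlap used as the inter-list similarity, the linear interpolation with the parameter `lam`, and the function names `overlap_similarity` and `integrated_prediction`. It only illustrates one way a prediction for the final list could be combined with predictions for the fused lists.

```python
from typing import Sequence


def overlap_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard overlap of the document sets of two lists.

    Assumption: the abstract does not say which inter-list similarity
    measure is used; Jaccard overlap is chosen only for illustration.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def integrated_prediction(
    final_list: Sequence[str],
    base_lists: Sequence[Sequence[str]],
    final_pred: float,
    base_preds: Sequence[float],
    lam: float = 0.5,
) -> float:
    """Combine a predictor value for the fused list with values for the base lists.

    Hypothetical integration rule (not taken from the thesis):
      lam * final_pred + (1 - lam) * similarity-weighted mean of base_preds,
    where each base list is weighted by its similarity to the final list.
    """
    sims = [overlap_similarity(final_list, lst) for lst in base_lists]
    total = sum(sims)
    if total == 0:
        # No base list resembles the final list: rely on the direct prediction.
        return final_pred
    indirect = sum(s * p for s, p in zip(sims, base_preds)) / total
    return lam * final_pred + (1 - lam) * indirect


# Usage: two base lists, one identical to the fused list and one disjoint from it.
fused = ["d1", "d2", "d3"]
bases = [["d1", "d2", "d3"], ["d4", "d5", "d6"]]
score = integrated_prediction(fused, bases, final_pred=0.4, base_preds=[0.8, 0.2])
```

In this sketch, a base list that shares no documents with the fused list contributes nothing to the estimate, which is one simple reading of "controlled based on inter-list similarities".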

We empirically demonstrate the merits of various predictors instantiated from the framework. In particular, their prediction quality substantially exceeds that of applying state-of-the-art predictors to the final result list alone.