Technion - Israel Institute of Technology, Graduate School
Ph.D. Thesis
Ph.D. Student: Shtok Anna
Subject: New Approaches for Query-Performance Prediction
Department: Department of Industrial Engineering and Management
Supervisor: Professor Oren Kurland


Abstract

Search systems are essential tools for finding relevant information in digital repositories. However, the performance of current search systems varies substantially across queries. Therefore, the ability to identify queries that yield poor retrieval effectiveness is of utmost importance.

This thesis focuses on the task of query-performance prediction, i.e., estimating the effectiveness of a search performed in response to a query in the absence of relevance judgments. Our contributions are twofold. First, we propose novel, high-quality, post-retrieval query-performance prediction methods. Second, we develop formal models and frameworks that shed light on existing prediction methods and the relations between them, and give rise to novel prediction approaches.

Our first contribution is an efficient query-performance predictor that is based on analyzing the retrieval-scores distribution of documents in the result list. Specifically, the predictor is based on estimating the presumed amount of query drift in the list, that is, the presence (and dominance) of aspects that are not related to the query.
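
As a rough illustration of a predictor driven by the retrieval-score distribution, the sketch below uses the standard deviation of the top-k retrieval scores, normalized by a corpus-level score, as the prediction value. The choice of standard deviation, the normalization, and the names `score_spread_predictor`, `scores`, `corpus_score`, and `k` are assumptions made for illustration; the abstract does not specify the exact estimator.

```python
import math


def score_spread_predictor(scores, corpus_score, k=100):
    """Hypothetical score-distribution predictor (illustrative only).

    scores: retrieval scores of the documents in the result list, any order.
    corpus_score: score assigned to the corpus as a whole (assumed nonzero);
        used here only to make values comparable across queries.
    k: number of top-scored documents to analyze.
    """
    if corpus_score == 0:
        raise ValueError("corpus_score must be nonzero")
    top = sorted(scores, reverse=True)[:k]
    if not top:
        raise ValueError("empty result list")
    mean = sum(top) / len(top)
    # Population variance of the top-k scores around their mean.
    variance = sum((s - mean) ** 2 for s in top) / len(top)
    return math.sqrt(variance) / abs(corpus_score)
```

Under this assumption, a result list whose scores are tightly clustered gets a low value, and a list whose scores are widely spread gets a higher one. Whether a high value should be read as less presumed query drift is an interpretation of the sketch, not something the abstract states.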

We then present a novel general framework for query-performance prediction that uses principles of statistical decision theory. The task of query-performance prediction is formulated in terms of utility estimation; specifically, the utility of a document ranking with respect to the underlying information need. To address the uncertainty in inferring the information need, we estimate utility by the expected similarity between the given ranking and those induced by relevance language models; the impact of a relevance model is based on its presumed representativeness of the information need.
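
The expected-similarity idea can be sketched as a weighted average of ranking similarities. In the sketch below, Kendall's tau stands in for the inter-ranking similarity measure, and each relevance model is represented only by the ranking it induces and by a non-negative weight reflecting its presumed representativeness. Both choices, and the names `kendall_tau` and `expected_similarity`, are assumptions for illustration rather than the thesis's actual estimates.

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau between two rankings of the same documents (no ties)."""
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}
    # Every pair (x, y) produced here has x ranked before y in ranking_a.
    pairs = list(combinations(ranking_a, 2))
    if not pairs:
        raise ValueError("need at least two documents")
    concordant = sum(1 for x, y in pairs if pos_b[x] < pos_b[y])
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)


def expected_similarity(ranking, model_rankings, weights):
    """Weighted mean similarity between `ranking` and model-induced rankings.

    weights: presumed representativeness of each relevance model
        (hypothetical inputs; normalized here to sum to 1).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(
        w * kendall_tau(ranking, r) for r, w in zip(model_rankings, weights)
    ) / total
```

For example, if a ranking agrees perfectly with a model of weight 3 and is the exact reverse of the ranking from a model of weight 1, the estimate is (3 * 1 + 1 * (-1)) / 4 = 0.5.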

Our last contribution is a formal probabilistic analysis of the query-performance prediction task. The analysis gives rise to a general prediction framework that uses pseudo-effective or pseudo-ineffective document lists retrieved in response to the query. These lists serve as references for the result list whose effectiveness we want to predict. We show that many previously proposed predictors, which might seem very different, can actually be derived from our framework, thereby providing formal common ground for these methods. Additionally, we demonstrate the connection between prediction using reference lists and fusion-based retrieval. Extensive empirical exploration provides support for the principles of the framework.
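
A minimal sketch of prediction with reference lists follows. It assumes that similarity to a pseudo-effective list is rewarded and similarity to a pseudo-ineffective list is penalized, and that top-k overlap serves as the inter-list similarity measure. The function names, the overlap measure, and the simple difference used to combine the two terms are illustrative assumptions; the framework described above may use either kind of reference list alone and other similarity measures.

```python
def overlap_at_k(list_a, list_b, k=10):
    """Fraction of the top-k positions shared by two document lists."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(list_a[:k]) & set(list_b[:k])) / k


def reference_list_prediction(result, pseudo_effective, pseudo_ineffective, k=10):
    """Hypothetical predictor: closeness to a pseudo-effective list minus
    closeness to a pseudo-ineffective list. Higher values suggest a more
    effective result list under the stated assumptions."""
    return (
        overlap_at_k(result, pseudo_effective, k)
        - overlap_at_k(result, pseudo_ineffective, k)
    )


# Toy document identifiers, made up for the example.
result = ["d1", "d2", "d3", "d4"]
effective = ["d1", "d2", "d3", "d9"]
ineffective = ["d4", "d7", "d8", "d9"]
print(reference_list_prediction(result, effective, ineffective, k=4))
```

Here the result list shares three of its top four documents with the pseudo-effective list and one with the pseudo-ineffective list, so the value is 3/4 - 1/4.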

The query-performance predictors presented in this thesis are highly effective and often outperform state-of-the-art prediction methods.