Technion - Israel Institute of Technology
Graduate School
Ph.D Thesis
Ph.D Student: Raiber Fiana
Subject: Query-Performance Prediction and Cluster Ranking: Two Sides of the Same Coin?
Department: Department of Industrial Engineering and Management
Supervisor: Professor Oren Kurland


Abstract

Various tasks and challenges in the information retrieval field have been independently addressed in the literature, including federated search, fusion-based retrieval, cluster ranking and query-performance prediction. Using a general probabilistic formalism, we draw novel connections between these tasks and the methods used to address them.

In this thesis, we focus mainly on two tasks: query-performance prediction and cluster ranking. The first is to predict, in the absence of relevance judgments, the effectiveness of a retrieval performed in response to a query. The second is to predict, again in the absence of relevance judgments, the effectiveness of clusters created from the documents most highly ranked by some search performed in response to the query.

We present two novel approaches for query-performance prediction. The first approach utilizes query-independent document-quality measures. Although these measures were previously shown to improve retrieval effectiveness, we demonstrate that they are also valuable for query-performance prediction. The second approach utilizes Markov Random Fields and integrates previously proposed query-performance prediction methods. The prediction quality of our approaches substantially exceeds that of state-of-the-art predictors.

We then present a novel cluster ranking approach that utilizes Markov Random Fields. The approach is based on integrating various types of cluster-relevance evidence in a principled manner. These include the query-similarity values of the cluster's documents, inter-document similarities within the cluster, and query-independent document-quality measures. Our cluster ranking approach significantly outperforms state-of-the-art retrieval methods.
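
To make the idea of integrating several types of cluster-relevance evidence concrete, the following is a minimal sketch and not the model developed in the thesis. It assumes a simple log-linear combination of the three evidence types named above; the function names, the use of a mean of logarithms as the aggregate, and the uniform weights are all hypothetical choices made for illustration.

```python
import math


def cluster_score(query_sims, inter_doc_sims, quality_scores,
                  weights=(1.0, 1.0, 1.0)):
    """Score one cluster by a weighted sum of log-averaged evidence.

    Hypothetical illustration only: all inputs are lists of positive numbers,
    and the aggregation and weights are assumptions, not the thesis's model.
    """
    def mean_log(values):
        # Mean of logarithms (the log of the geometric mean).
        return sum(math.log(v) for v in values) / len(values)

    features = (
        mean_log(query_sims),      # query similarity of the cluster's documents
        mean_log(inter_doc_sims),  # inter-document similarities in the cluster
        mean_log(quality_scores),  # query-independent document-quality measures
    )
    return sum(w * f for w, f in zip(weights, features))


def rank_clusters(clusters, weights=(1.0, 1.0, 1.0)):
    """Return cluster names sorted from highest to lowest score."""
    scores = {
        name: cluster_score(*evidence, weights=weights)
        for name, evidence in clusters.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy evidence values, invented for this example.
example_clusters = {
    "a": ([0.9, 0.8], [0.7], [0.6, 0.9]),
    "b": ([0.2, 0.1], [0.3], [0.5, 0.4]),
}
ranking = rank_clusters(example_clusters)
```

In this toy input, cluster "a" has higher values than cluster "b" for every evidence type, so it is ranked first under any positive weights.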

Cluster-based document retrieval methods, including those that rely on cluster ranking, as well as other applications, are often based on estimating asymmetric co-relevance: the relevance of a document to a query given another document assumed to be relevant. We present a novel supervised model for learning an asymmetric co-relevance estimate. The model uses different types of similarities with the assumed relevant document and the query, as well as query-independent document-quality measures. We show that using the proposed estimate in several state-of-the-art retrieval methods yields significant performance improvements over a wide variety of alternative estimates.
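
The notion of an asymmetric co-relevance estimate can be illustrated with a small sketch. This is not the supervised model proposed in the thesis: it assumes a logistic form over three hypothetical features (similarity to the assumed relevant document, similarity to the query, and a document-quality score), with hand-set weights standing in for weights that would be learned from training data.

```python
import math


def co_relevance(sim_to_relevant_doc, sim_to_query, quality,
                 weights=(1.0, 1.0, 1.0), bias=0.0):
    """Estimate the relevance of a document given an assumed relevant one.

    Hypothetical logistic model: the features and weights are assumptions
    made for illustration, not the estimate learned in the thesis.
    """
    z = (bias
         + weights[0] * sim_to_relevant_doc
         + weights[1] * sim_to_query
         + weights[2] * quality)
    return 1.0 / (1.0 + math.exp(-z))


# Two documents with the same mutual similarity (0.8) but different
# query similarity and quality; all values are invented for this example.
d1_given_d2 = co_relevance(0.8, sim_to_query=0.9, quality=0.7)
d2_given_d1 = co_relevance(0.8, sim_to_query=0.2, quality=0.3)
```

Because the query similarity and quality features belong to the document being scored, the estimate for the first document given the second differs from the estimate in the opposite direction, even though the inter-document similarity is the same. This is the sense in which such an estimate is asymmetric.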