Technion - Israel Institute of Technology - Graduate School
M.Sc Thesis
M.Sc Student: Mor Sondak
Subject: Estimating Query Representativeness for Query-Performance Prediction
Department: Department of Industrial Engineering and Management
Supervisor: Full Professor Kurland Oren


Abstract

The query-performance prediction (QPP) task is to estimate retrieval effectiveness in the absence of relevance judgments. We present a novel probabilistic framework for QPP that brings to light an important aspect not addressed in previous work: the extent to which the query effectively represents the underlying information need for retrieval over the given corpus. We present a few query-representativeness measures that serve as estimates of this extent. The measures utilize relevance language models, which serve as an explicit representation of the presumed information need. The probabilistic prediction framework also provides formal grounds for integrating query-representativeness measures with pre- and post-retrieval prediction methods. In addition, we show how to integrate the query-representativeness measures into a recently proposed QPP framework that utilizes Markov Random Fields. Results obtained from a wide array of experiments attest to the merits of the proposed query-representativeness measures. As a case in point, integrating the representativeness measures with state-of-the-art pre-retrieval prediction methods yields prediction quality that substantially exceeds that of the pre-retrieval predictors alone. Because the representativeness measures are computed prior to performing retrieval, this integration constitutes the best reported pre-retrieval prediction approach. We also demonstrate the potential merits of integrating the query-representativeness measures with both post-retrieval and pre-retrieval prediction methods.