Technion - Israel Institute of Technology
Graduate School
M.Sc. Thesis
M.Sc. Student: Kuzi Saar
Subject: Discriminative Query Models
Department: Industrial Engineering and Management
Supervisor: Professor Oren Kurland


Abstract

Given the plethora of available data, search engines are crucial tools nowadays, as they facilitate the extraction of relevant information from large collections of digital data. This work focuses on the ad hoc retrieval task performed by search engines: ranking the documents in a collection by their relevance to an information need represented by the user's query. Modeling the user's presumed information need is one of the challenges of this task; to that end, various query models are often induced from the original query.
In many cases, however, queries do not represent the user's information need effectively. For example, queries tend to be short, and hence there might be a vocabulary mismatch between them and the relevant documents. Consequently, a query model that relies solely on the user's query may result in poor retrieval performance. Various query models have been devised to serve as more effective representations of the information need.
Pseudo-feedback-based query models are induced from a result list of the documents most highly ranked by an initial search performed for the query. The underlying assumption is that highly ranked documents are more likely to be relevant to the query than low-ranked ones. Pseudo-feedback-based query models may bridge the vocabulary mismatch by assigning high importance to terms that are presumably related to the information need.
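As an illustration of a pseudo-feedback-based query model, the following minimal Python sketch implements an RM1-style relevance model. This is one well-known instance, not necessarily the model used in this work. Each result-list document contributes its maximum-likelihood term distribution, weighted by its query likelihood p(q|d) normalized over the list. The Dirichlet parameter `mu=1000`, the `1e-9` probability floor, the truncation to `n_terms` terms, and the function names are assumptions chosen for illustration. Documents and queries are assumed to be lists of tokens.

```python
import math
from collections import Counter


def query_likelihood(query, doc, coll_tf, coll_len, mu=1000.0):
    """log p(q|d) under a Dirichlet-smoothed unigram document model."""
    tf = Counter(doc)
    score = 0.0
    for q in query:
        # Assumed floor: avoids log(0) for query terms absent from the collection.
        p_c = max(coll_tf.get(q, 0) / coll_len, 1e-9)
        score += math.log((tf[q] + mu * p_c) / (len(doc) + mu))
    return score


def relevance_model(query, result_list, coll_tf, coll_len, n_terms=10):
    """RM1-style pseudo-feedback model: p(w|R) is proportional to sum_d p(w|d) * p(q|d)."""
    logs = [query_likelihood(query, d, coll_tf, coll_len) for d in result_list]
    top = max(logs)
    # Normalize p(q|d) over the result list, shifting by the max for numerical stability.
    weights = [math.exp(s - top) for s in logs]
    z = sum(weights)
    model = Counter()
    for w_d, doc in zip(weights, result_list):
        for term, c in Counter(doc).items():
            model[term] += (w_d / z) * (c / len(doc))  # MLE p(w|d)
    # Keep the n_terms highest-weighted terms and renormalize them into a distribution.
    kept = dict(model.most_common(n_terms))
    total = sum(kept.values())
    return {t: v / total for t, v in kept.items()}
```

For example, with the query `["jaguar", "car"]` and a two-document result list `[["jaguar", "car", "speed"], ["jaguar", "cat", "jungle"]]`, the term "jaguar" receives the highest weight because it occurs in both feedback documents. Expansion terms such as "speed" and "cat" also receive non-zero weight even though they are not in the query.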
However, since the result list often contains much non-relevant information, the induced query model may drift away from the information need. Hence, various techniques, often referred to as query anchoring, are used to ameliorate potential query drift. For instance, interpolating the query model with a model of the original query is a common query-anchoring practice.
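Interpolation-based query anchoring can be sketched as follows. Both models are assumed to be dictionaries that map terms to probabilities. The interpolation weight `lam` (the weight given to the original query) and the function names are illustrative assumptions. When the feedback model is a relevance model, this corresponds to the widely used RM3 formulation, p(w|Q') = lam * p(w|q) + (1 - lam) * p(w|R).

```python
from collections import Counter


def mle_query_model(query):
    """Maximum-likelihood model of the original query: p(w|q) = tf(w, q) / |q|."""
    tf = Counter(query)
    return {w: c / len(query) for w, c in tf.items()}


def anchor(query_model, feedback_model, lam=0.5):
    """Anchor a feedback model to the query: lam * p(w|q) + (1 - lam) * p(w|R)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(query_model) | set(feedback_model)
    return {
        w: lam * query_model.get(w, 0.0) + (1.0 - lam) * feedback_model.get(w, 0.0)
        for w in vocab
    }
```

For example, anchoring the feedback model `{"jaguar": 0.4, "speed": 0.6}` to the query "jaguar car" with `lam = 0.5` yields 0.45 for "jaguar", 0.3 for "speed", and 0.25 for "car". The original query term "car" thus regains weight that the feedback model alone had dropped.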
We present a novel unsupervised discriminative query model, which several methods proposed herein use for the query anchoring of existing query models. The model is induced from the result list using a learning-to-rank approach and constitutes a discriminative term-based representation of the initial ranking. We show that applying our methods to effective generative query models can improve retrieval performance.
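The abstract does not specify how the discriminative model is learned. The following is therefore only a hypothetical illustration of the general idea, not the thesis's method: it learns a term-based representation of an initial ranking with a simple learning-to-rank technique. Specifically, it uses a pairwise perceptron over binary term-occurrence features. For every pair of result-list documents, the higher-ranked one should receive the higher score. The positive learned weights are then normalized into a term distribution, which could be interpolated with a generative query model. The feature choice, the perceptron learner, `epochs`, `lr`, `n_terms`, and all function names are assumptions.

```python
from collections import Counter


def pairwise_term_weights(ranked_docs, epochs=10, lr=1.0):
    """Hypothetical sketch: learn weights w so that sum_{t in d} w[t] reproduces
    the initial ranking, using a pairwise perceptron on binary term features."""
    feats = [set(d) for d in ranked_docs]  # binary term-occurrence features
    w = Counter()  # missing terms read as weight 0
    for _ in range(epochs):
        for i in range(len(feats)):
            for j in range(i + 1, len(feats)):
                # Document i is ranked above document j in the initial list.
                si = sum(w[t] for t in feats[i])
                sj = sum(w[t] for t in feats[j])
                if si <= sj:  # misordered (or tied) pair: perceptron update
                    for t in feats[i] - feats[j]:
                        w[t] += lr
                    for t in feats[j] - feats[i]:
                        w[t] -= lr
    return w


def discriminative_model(weights, n_terms=10):
    """Keep the top positive weights and normalize them into a term distribution."""
    pos = Counter({t: v for t, v in weights.items() if v > 0})
    kept = dict(pos.most_common(n_terms))
    total = sum(kept.values())
    return {t: v / total for t, v in kept.items()} if total else {}
```

Terms that separate highly ranked documents from lower-ranked ones receive positive weight. Terms characteristic of the bottom of the list receive negative weight and are dropped. This is what makes the representation discriminative rather than generative.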