|M.Sc Student||Kuzi Saar|
|Subject||Discriminative Query Models|
|Department||Department of Industrial Engineering and Management|
|Supervisor||Professor Oren Kurland|
Given the plethora of data available today, search engines are crucial
tools: they facilitate the extraction of relevant information from
large collections of digital data. The ad hoc retrieval task, performed by
search engines, is the focus of this work. The task is to rank the documents in
a collection by their relevance to an information need represented by the
user's query. Modeling the presumed information need of the user is one of the
challenges of the ad hoc retrieval task. To that end, different query models
are often induced using the original query.
In many cases, queries do not represent the user's information need effectively.
For example, queries tend to be short, and hence there
might be a vocabulary mismatch between them and the relevant documents.
Consequently, using a query model that relies solely on the user's query may
result in poor retrieval performance. Various query models have been devised with
the goal of serving as a more effective representation of the information need.
Pseudo-feedback-based query models are induced from a result list of the documents most highly ranked by an initial search performed for the query. The underlying assumption is that highly ranked documents are more likely to be relevant to the query than low-ranked ones. Pseudo-feedback-based query models may bridge the vocabulary mismatch by assigning high importance to terms that are presumably related to the information need.
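To make the idea concrete, the following is a minimal sketch of how a pseudo-feedback-based query model might be induced. It is an illustration only, loosely in the spirit of relevance-model estimation, and is not the specific model studied in this work. The function name `feedback_query_model`, the `(tokens, score)` input format, and the `num_terms` cutoff are all assumptions made for the example.

```python
from collections import Counter


def feedback_query_model(result_list, num_terms=5):
    """Induce a term distribution from the top-ranked documents.

    result_list: list of (tokens, retrieval_score) pairs for the documents
    most highly ranked by the initial search (hypothetical input format;
    scores are assumed positive and documents non-empty).
    """
    total_score = sum(score for _, score in result_list)
    model = Counter()
    for tokens, score in result_list:
        # Documents ranked higher (larger score) contribute more weight.
        doc_weight = score / total_score
        doc_len = len(tokens)
        for term, count in Counter(tokens).items():
            # Weighted relative frequency of the term in this document.
            model[term] += doc_weight * count / doc_len
    # Keep only the highest-weighted terms and renormalize to sum to 1.
    top = model.most_common(num_terms)
    norm = sum(weight for _, weight in top)
    return {term: weight / norm for term, weight in top}
```

Under these assumptions, terms that recur in the highest-scored documents receive the largest weights, even when they do not appear in the original query, which is how such a model can help bridge the vocabulary mismatch.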
However, since the result list often contains much non-relevant information, the induced query model may drift away from the information need. Hence, various techniques, often referred to as query anchoring, are used to ameliorate potential query drift. For instance, interpolating the induced query model with a model of the original query is a common query anchoring practice.
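The interpolation-based anchoring mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the original query is modeled by a maximum-likelihood (relative frequency) estimate over its terms, and the function name `anchor_query_model` and the interpolation parameter `lam` are hypothetical names introduced for the example.

```python
from collections import Counter


def anchor_query_model(original_query_terms, feedback_model, lam=0.5):
    """Linearly interpolate a feedback-induced model with the original query.

    original_query_terms: list of query tokens (assumed non-empty).
    feedback_model: dict mapping term -> probability.
    lam: weight given to the original query model, in [0, 1] (assumed name).
    """
    n = len(original_query_terms)
    # Maximum-likelihood estimate of the original query's term distribution.
    query_model = {t: c / n for t, c in Counter(original_query_terms).items()}
    vocab = set(query_model) | set(feedback_model)
    return {
        t: lam * query_model.get(t, 0.0) + (1.0 - lam) * feedback_model.get(t, 0.0)
        for t in vocab
    }
```

Setting `lam` closer to 1 keeps the resulting model near the original query and limits drift, while values closer to 0 rely more heavily on the terms induced from the result list.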
We present a novel unsupervised discriminative query model that can be used, via several methods proposed herein, for query anchoring of existing query models. The model is induced from the result list using a learning-to-rank approach and constitutes a discriminative term-based representation of the initial ranking. We show that applying our methods to effective generative query models can improve retrieval performance.