Technion - Israel Institute of Technology - Graduate School
M.Sc Thesis
M.Sc Student: Dmitry Pavlov
Subject: A Graph-Based Approach to Utilizing Minimal Relevance Feedback
Department: Department of Industrial Engineering and Management
Supervisor: Full Professor Kurland Oren


Abstract

The ad hoc retrieval task is to rank documents in a corpus by their relevance to the information need expressed by a query. This challenging task becomes easier when relevance feedback, specifically examples of documents relevant to the information need, is available and utilized.

The most common approach to utilizing (positive) relevance feedback is to construct an expanded query form based on the commonalities among the given relevant documents. However, relevance feedback, when available, is often scarce. It is therefore common for only a single relevant document to be available as feedback, in which case commonalities among relevant documents cannot be exploited. This is the retrieval setting we address here: using the query and a single relevant document to construct an expanded query form for a second retrieval.
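To make the notion of an expanded query form concrete, the following is a minimal sketch of one generic way to build it: interpolating the query's term distribution with a term distribution estimated from the feedback documents. It is an illustration under stated assumptions, not the expansion method used in the thesis. The function name `expand_query`, the parameters `num_terms` and `query_weight`, and the whitespace tokenization are all hypothetical choices made for this example.

```python
from collections import Counter


def expand_query(query, feedback_docs, num_terms=5, query_weight=0.5):
    """Return a term -> weight map mixing the query with feedback terms.

    Hypothetical helper: the interpolation scheme and parameter names are
    assumptions for illustration, not taken from the thesis.
    """
    q_counts = Counter(query.lower().split())
    q_total = sum(q_counts.values())
    q_model = {t: c / q_total for t, c in q_counts.items()} if q_total else {}

    # Average the per-document maximum-likelihood term distributions.
    fb_model = Counter()
    non_empty = [Counter(d.lower().split()) for d in feedback_docs if d.split()]
    for counts in non_empty:
        total = sum(counts.values())
        for term, count in counts.items():
            fb_model[term] += count / (total * len(non_empty))

    # Keep only the highest-weight feedback terms.
    top = dict(fb_model.most_common(num_terms))
    norm = sum(top.values())
    if norm == 0:
        return q_model  # no usable feedback: fall back to the original query

    # Interpolate: query_weight on the query, the rest on the feedback terms.
    expanded = {t: query_weight * p for t, p in q_model.items()}
    for term, p in top.items():
        expanded[term] = expanded.get(term, 0.0) + (1 - query_weight) * p / norm
    return expanded
```

With a single feedback document, every term weight comes from that one document, so terms that are incidental to it can dominate the expansion. This is the drift risk that motivates adding further, carefully selected documents.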

We present a graph-based approach to selecting pseudo-relevant documents from an initially retrieved list. These documents are used together with the given relevant document to construct an expanded query form. The motivation is to reduce the potential query drift in the expanded query form by using a richer signal about the underlying information need. Our approach leverages the similarities among pseudo-relevant documents, the similarities between these documents and the given relevant document, and the similarities of the pseudo-relevant documents to the query. Specifically, a pseudo-relevant document is considered effective for query expansion if it is similar to the query, to the given relevant document, and to other effective pseudo-relevant documents.
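The recursive criterion above (a document is effective if it is similar to other effective documents) can be read as a score propagated over a document similarity graph. The sketch below is one possible reading, assuming a personalized-PageRank-style iteration, cosine similarity over term counts, and a prior that sums query similarity and similarity to the given relevant document. None of these choices is confirmed by the abstract, and `select_pseudo_relevant`, `damping`, and `iterations` are hypothetical names.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_pseudo_relevant(query, relevant_doc, candidates,
                           k=2, damping=0.5, iterations=50):
    """Return indices of the k top-scoring candidates (illustrative only)."""
    q = Counter(query.lower().split())
    r = Counter(relevant_doc.lower().split())
    docs = [Counter(d.lower().split()) for d in candidates]
    n = len(docs)
    if n == 0:
        return []

    # Assumed prior: similarity to the query plus similarity to the
    # given relevant document, normalized to sum to 1.
    prior = [cosine(q, d) + cosine(r, d) for d in docs]
    total = sum(prior)
    prior = [p / total for p in prior] if total else [1.0 / n] * n

    # Edge weights: pairwise similarity among candidates, no self-loops.
    weights = [[cosine(docs[i], docs[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out = [sum(row) for row in weights]

    # A document's score is part prior, part score received from neighbors.
    scores = prior[:]
    for _ in range(iterations):
        new = [(1 - damping) * p for p in prior]
        for i in range(n):
            for j in range(n):
                # Nodes with no outgoing weight spread their score by the prior.
                share = weights[i][j] / out[i] if out[i] > 0 else prior[j]
                new[j] += damping * scores[i] * share
        scores = new
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
```

For instance, with the query `"jaguar speed"` and a relevant document about the animal, candidates about the animal reinforce one another through their mutual similarity, whereas a candidate about the car matches the query but receives less support from the rest of the graph.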

Empirical evaluation demonstrates the merits of our approach. Specifically, the resulting retrieval performance is better than that of using only the given relevant document for query expansion. It also surpasses that of expanding the query with the given relevant document together with the pseudo-relevant documents ranked highest in the initially retrieved list.