Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Brondwine Elinor
Subject: Utilizing Focused Relevance Feedback for Ad Hoc Retrieval
Department: Department of Industrial Engineering and Management
Supervisor: Prof. Oren Kurland


Abstract

The ad hoc retrieval task is to rank documents in a corpus by their presumed relevance to an information need expressed by a (usually short) query. It is well known that retrieval effectiveness can be significantly improved when examples of relevant documents are available and used. Consequently, there has been a large body of work on utilizing document relevance feedback.


In contrast, there has been little work on using relevance feedback provided for passages in documents, known as focused relevance feedback, or on integrating it with document relevance feedback. Because relevant documents can contain much information that does not pertain to the query, utilizing focused feedback can be of considerable merit, as we show here.


We present a novel study of ad hoc retrieval methods utilizing document-level relevance feedback and/or focused relevance feedback.

Our work is the first to study the utilization of non-relevant passages in relevant documents.


The retrieval methods we present can utilize feedback at different granularities of text (specifically, documents, passages, or both). The first method uses a novel mixture model that integrates relevant and non-relevant information at the language model level. The second method fuses retrieval scores produced by using relevant and non-relevant information separately, in order to rerank a list of documents retrieved by using relevant information only.
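To make the score-fusion idea concrete, here is a minimal sketch of reranking by fused scores. It is an illustration under stated assumptions, not the method from the thesis. The abstract does not specify the fusion rule, so the linear interpolation, the `weight` parameter, and the function name `fuse_and_rerank` are all hypothetical.

```python
from typing import Dict, List, Tuple


def fuse_and_rerank(
    rel_scores: Dict[str, float],
    nonrel_scores: Dict[str, float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rerank the documents in rel_scores by a fused score.

    rel_scores: doc id -> score from a model built on relevant feedback only;
        its keys define the initial list that is reranked.
    nonrel_scores: doc id -> score from a model built on non-relevant feedback.
    weight: hypothetical interpolation parameter in [0, 1].
    """
    fused = {}
    for doc_id, rel in rel_scores.items():
        # Documents with no non-relevant score are not penalized.
        nonrel = nonrel_scores.get(doc_id, 0.0)
        # Assumed fusion rule (not from the thesis): reward similarity to
        # relevant text and penalize similarity to non-relevant text.
        fused[doc_id] = (1.0 - weight) * rel - weight * nonrel
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rel = {"d1": 0.9, "d2": 0.8, "d3": 0.1}
    nonrel = {"d1": 0.9, "d2": 0.1}
    for doc_id, score in fuse_and_rerank(rel, nonrel):
        print(doc_id, round(score, 3))
```

In this toy input, a document that scores highly against both the relevant and the non-relevant information drops in the ranking, which is the intuition behind using non-relevant passages from relevant documents.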


In addition, we present a novel method in the relevance model framework that uses information only about the percentage of relevant information in relevant documents.


Empirical exploration performed using two datasets attests to the merits of our methods and sheds light on the effectiveness of using and integrating relevance feedback for textual units of varying granularities.