טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentAnava Yael
SubjectA Probabilistic Fusion Framework
DepartmentDepartment of Industrial Engineering and Management
Supervisor Professor Oren Kurland


Abstract

Our world is saturated with data and the average user may find it difficult to reach the information she needs. Search engines were designed to help Web users to find a needle in the haystack. These engines, in response to the user query, present a ranked list of documents. Ranking is induced using some retrieval algorithm. While the final output to be presented to the user is a single list of documents, several different retrieval algorithms can be applied, and the corresponding retrieved lists can be fused into one list. The motivation is to avoid the pitfalls of any single retrieval method by aggregating the results attained by applying a few.

Formally, the fusion challenge is merging document lists retrieved from the same corpus in response to a query. The fusion problem has been investigated by many researchers, and numerous fusion methods have been devised. Previous work shows that, in general, fusion is an effective technique for improving retrieval performance. Most fusion methods were devised based on the principle of rewarding documents that are highly ranked in many of the retrieved lists.

In this work we present a novel probabilistic framework for the fusion task. The framework sets a formal basis to the principle of rewarding the documents most highly ranked in many of the retrieved list. We show that the framework can be used to derive and explain many existing linear fusion methods. Instantiating the framework using various estimates yields novel fusion methods, some of which significantly outperform the state-of-the-art. In addition, we show that some non-linear fusion methods (e.g., CombMNZ) can also be explained using our framework.