Technion - Israel Institute of Technology
Graduate School
M.Sc Thesis
M.Sc Student: Winaver Mattan
Subject: Clarity-based Query Modification for Information Retrieval
Department: Department of Industrial Engineering and Management
Supervisors: Professor Carmel Domshlak, Professor Oren Kurland


Abstract

Over the past decade or so, the volume of information available in open-world and enterprise digital libraries has increased rapidly. As a result, the need to search large collections of documents (corpora) has grown as well, and the problem of automated information retrieval has received growing attention from both industry and research.

The primary task of search engines, called ad hoc retrieval, is to provide the user with a list of documents ranked by their presumed relevance to the user's query. Because most user queries are short, they often fail to convey the user's information need, and the quality of the results returned by search engines may therefore be unsatisfactory.

Pseudo-feedback query expansion is a parametric technique developed to improve retrieval accuracy. It runs an initial retrieval with the user's query and then expands the query with terms from the top-ranked documents of that retrieval. One problematic issue, however, is the choice of parameter values used to create the expanded query. With a poor choice, the procedure may augment the query with terms that are unrelated to the user's information need (since some of the documents used for expansion may be irrelevant), which can degrade retrieval effectiveness.
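To make the role of the parameters concrete, here is a minimal sketch of pseudo-feedback expansion over unigram language models. It is an illustration under stated assumptions, not the expansion method used in the thesis: the function name `expand_query` and the parameters `num_docs`, `num_terms`, and `weight` are hypothetical, documents are assumed to be plain lists of terms, and the initial ranking is assumed to be given.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, num_docs=2, num_terms=3, weight=0.5):
    """Interpolate the original query model with a model built from top documents.

    All names and parameters are hypothetical, chosen for illustration only.
    """
    # Maximum-likelihood unigram model of the original query.
    q_counts = Counter(query_terms)
    q_total = sum(q_counts.values())
    query_model = {t: c / q_total for t, c in q_counts.items()}

    # Pool term counts from the top `num_docs` documents (the pseudo-feedback set).
    fb_counts = Counter()
    for doc in ranked_docs[:num_docs]:
        fb_counts.update(doc)

    # Keep only the `num_terms` most frequent feedback terms.
    top = fb_counts.most_common(num_terms)
    if not top:
        return query_model  # nothing to expand with
    fb_total = sum(c for _, c in top)
    feedback_model = {t: c / fb_total for t, c in top}

    # `weight` is the probability mass kept on the original query.
    terms = set(query_model) | set(feedback_model)
    return {
        t: weight * query_model.get(t, 0.0)
        + (1 - weight) * feedback_model.get(t, 0.0)
        for t in terms
    }
```

Each of the three parameters changes the resulting model: more feedback documents or terms pull in more vocabulary, some of it possibly from irrelevant documents, while a lower `weight` moves the model further from the original query.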

In this thesis, we propose a language-model-based approach to the problem of performance robustness with respect to the free-parameter values of pseudo-feedback-based query expansion methods. Given a query, we create a set of language models representing different forms of its expansion by varying the parameter values of some expansion method. We then select a single model using criteria originally proposed for evaluating the performance of the original query, or for deciding whether to employ expansion at all.
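The selection step can be sketched as follows. The abstract does not spell out the criteria, so this sketch makes two assumptions: that the criterion is the clarity score suggested by the thesis title, computed as the KL divergence between a candidate query model and the collection model, and that the candidate with the highest score is preferred. The function names and the dictionary-based model representation are hypothetical.

```python
import math


def clarity(model, collection_model):
    """KL divergence (in bits) of a query model from the collection model.

    Assumes every term with non-zero probability in `model` also has
    non-zero probability in `collection_model`.
    """
    return sum(
        p * math.log2(p / collection_model[t])
        for t, p in model.items()
        if p > 0
    )


def select_model(candidates, collection_model):
    """Return the key of the candidate model with the highest clarity.

    `candidates` maps a parameter setting (any hashable value) to the
    expanded query model produced with that setting.
    """
    return max(
        candidates,
        key=lambda params: clarity(candidates[params], collection_model),
    )
```

For example, with one candidate per interpolation weight, `select_model({0.9: model_a, 0.5: model_b}, collection_model)` returns the weight whose model diverges most from the collection model. No per-query relevance judgments are needed for this choice, which is what makes such a criterion usable at query time.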

Experimental results show that these criteria are highly effective in selecting expanded query models that are not only significantly more effective than poorly performing ones, but also yield performance that is almost indistinguishable from that of the expanded query models selected by a learning procedure and, for some query expansion methods, from that of the manually optimized query models.