Technion - Israel Institute of Technology, Graduate School
Ph.D. Thesis
Ph.D. Student: Raviv Hadas
Subject: Entity-Based Retrieval
Department: Industrial Engineering and Management
Supervisor: Professor Oren Kurland


Abstract

The abundance of digital information makes search essential to our everyday lives. One of the most important tasks in Information Retrieval (IR) is to effectively identify information that pertains to a user's information need, as expressed by a search query.
In this work we study how entities, which are semantically meaningful units associated with rich information, can be used to address users' information needs. We address two tasks: entity retrieval and entity-based ad hoc document retrieval.

Entity retrieval is the task of ranking entities in a repository with respect to a user query. In this work we present the first study of the cluster hypothesis (cf. van Rijsbergen, 1979) for entities: "closely associated entities tend to be relevant to the same requests." We show that the hypothesis holds to a substantial extent for entity retrieval. In addition, we propose a novel cluster-based entity ranking method that is shown to be highly effective. Finally, we explore query performance prediction for entity retrieval; that is, estimating retrieval effectiveness in the absence of relevance judgments.
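The abstract does not say how the hypothesis is measured. As a minimal sketch, assuming each entity is represented as a sparse term-weight vector compared with cosine similarity (both assumptions, not taken from the thesis), a nearest-neighbor style test of the kind used for documents computes, for each relevant entity in a retrieved list, the fraction of its k most similar retrieved entities that are also relevant. Values well above the list's overall relevance rate suggest that relevant entities cluster together. The names `cosine` and `nn_test` and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Set

# Hypothetical representation: each entity is a sparse term -> weight vector.
Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def nn_test(retrieved: List[str], relevant: Set[str],
            vectors: Dict[str, Vector], k: int = 5) -> float:
    """Average fraction of relevant entities among the k nearest neighbors
    (within the retrieved list) of each relevant retrieved entity."""
    ratios = []
    for e in retrieved:
        if e not in relevant:
            continue
        others = [o for o in retrieved if o != e]
        # Rank the other retrieved entities by their similarity to e.
        neighbors = sorted(others, key=lambda o: cosine(vectors[e], vectors[o]),
                           reverse=True)[:k]
        if neighbors:
            ratios.append(sum(n in relevant for n in neighbors) / len(neighbors))
    return sum(ratios) / len(ratios) if ratios else 0.0


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    vectors = {
        "e1": {"physics": 1.0, "nobel": 1.0},
        "e2": {"physics": 1.0, "nobel": 0.5},
        "e3": {"football": 1.0},
        "e4": {"football": 1.0, "nobel": 0.2},
    }
    retrieved = ["e1", "e2", "e3", "e4"]
    relevant = {"e1", "e2"}
    print(nn_test(retrieved, relevant, vectors, k=1))  # 1.0
    print(nn_test(retrieved, relevant, vectors, k=2))  # 0.5
```

In the toy example each relevant entity's single nearest neighbor is the other relevant entity, so the score is 1.0 at k=1; widening to k=2 pulls in a non-relevant neighbor and halves it.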

Ad hoc document retrieval is the classic IR task of ranking documents with respect to a query. Traditionally, this task is addressed by comparing term-based representations of the query and the documents. We present novel entity-based query and document representations that are based on language models and integrate entity and term information. We show that retrieval with these language models significantly improves effectiveness over using terms or entities alone, and over a state-of-the-art term-proximity-based retrieval method.
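The abstract does not give the form of these models. As a minimal sketch only, assuming that queries and documents are available both as bags of terms and as bags of entities (e.g., produced by an entity linker) and that each bag gets a Dirichlet-smoothed unigram model, one simple way to integrate the two is to interpolate the term and entity query log-likelihoods with a weight λ. This is a generic illustration, not the thesis's model; `dirichlet_prob`, `score`, `lam`, `mu_t` and `mu_e` are hypothetical names.

```python
import math
from collections import Counter
from typing import List


def dirichlet_prob(item: str, doc: Counter, collection: Counter, mu: float) -> float:
    """Dirichlet-smoothed p(item | doc). The collection model is add-one
    smoothed so that items unseen in the collection still get non-zero mass."""
    coll_total = sum(collection.values())
    p_coll = (collection[item] + 1) / (coll_total + len(collection) + 1)
    return (doc[item] + mu * p_coll) / (sum(doc.values()) + mu)


def score(query_terms: List[str], query_entities: List[str],
          doc_terms: Counter, doc_entities: Counter,
          coll_terms: Counter, coll_entities: Counter,
          lam: float = 0.5, mu_t: float = 1000.0, mu_e: float = 100.0) -> float:
    """Hypothetical combined score: lam * term log-likelihood
    + (1 - lam) * entity log-likelihood of the query given the document."""
    term_ll = sum(math.log(dirichlet_prob(t, doc_terms, coll_terms, mu_t))
                  for t in query_terms)
    entity_ll = sum(math.log(dirichlet_prob(e, doc_entities, coll_entities, mu_e))
                    for e in query_entities)
    return lam * term_ll + (1 - lam) * entity_ll


if __name__ == "__main__":
    # Two toy documents that both mention "einstein" but refer to different entities.
    d1_t, d1_e = Counter(["einstein", "relativity", "theory"]), Counter(["Albert_Einstein"])
    d2_t, d2_e = Counter(["einstein", "bagel", "shop"]), Counter(["Einstein_Bros_Bagels"])
    coll_t, coll_e = d1_t + d2_t, d1_e + d2_e
    q_t, q_e = ["einstein", "relativity"], ["Albert_Einstein"]
    s1 = score(q_t, q_e, d1_t, d1_e, coll_t, coll_e)
    s2 = score(q_t, q_e, d2_t, d2_e, coll_t, coll_e)
    print(s1 > s2)  # True
```

The entity channel lets the score separate documents that share the ambiguous term "einstein" but mention different entities; λ controls how much weight that channel receives relative to terms.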
Finally, we devise novel methods for inducing entity-based query models by exploiting inter-entity similarities. We evaluate the retrieval effectiveness of these query models and demonstrate their considerable potential for estimating document relevance.
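The abstract does not describe how the similarities are used. A minimal sketch, assuming a precomputed table of inter-entity similarities (hypothetical; it might be derived from, e.g., shared categories or co-occurrence), keeps a fraction α of each query entity's probability mass and spreads the remainder over its similar entities in proportion to similarity, then keeps the top-k entities and renormalizes. This is a generic expansion scheme for illustration, not the thesis's method; `expand_query_model`, `alpha` and `top_k` are hypothetical names.

```python
from typing import Dict

QueryModel = Dict[str, float]                 # entity -> probability
SimTable = Dict[str, Dict[str, float]]        # entity -> {similar entity: similarity}


def expand_query_model(query_model: QueryModel, sim: SimTable,
                       alpha: float = 0.5, top_k: int = 3) -> QueryModel:
    """Spread query-entity mass to similar entities, truncate to top_k, renormalize."""
    expanded: Dict[str, float] = {}
    for e, p in query_model.items():
        neighbors = sim.get(e, {})
        total = sum(neighbors.values())
        if total <= 0:
            # No similar entities known: the entity keeps all of its mass.
            expanded[e] = expanded.get(e, 0.0) + p
            continue
        expanded[e] = expanded.get(e, 0.0) + alpha * p
        for n, s in neighbors.items():
            # Share (1 - alpha) of the mass in proportion to similarity.
            expanded[n] = expanded.get(n, 0.0) + (1 - alpha) * p * s / total
    top = sorted(expanded.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    norm = sum(w for _, w in top)
    return {e: w / norm for e, w in top} if norm > 0 else {}


if __name__ == "__main__":
    # Toy similarity table, invented for illustration only.
    sim = {"Albert_Einstein": {"Theory_of_relativity": 0.6,
                               "Niels_Bohr": 0.3, "Max_Planck": 0.1}}
    model = expand_query_model({"Albert_Einstein": 1.0}, sim, alpha=0.5, top_k=3)
    # Max_Planck receives the least mass and falls outside the top 3.
    print(model)
```

The resulting distribution could then be plugged into an entity-based scoring function in place of the original query entities.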