Ph.D. Student: Raviv Hadas
Department: Department of Industrial Engineering and Management
Supervisor: Professor Oren Kurland
The abundance of digital information makes search essential in our everyday life. One of the most important tasks in the field of Information Retrieval (IR) is to effectively identify information that pertains to a user's information need, expressed by a search query.
In this work we study how entities, semantically meaningful units associated with rich information, can be used to address users' information needs. We address two tasks: entity retrieval and entity-based ad hoc document retrieval.
Entity retrieval is the task of ranking entities in a repository with respect to a user query. In this work we present the first study of the cluster hypothesis (cf. van Rijsbergen, 1979) for entities: "closely associated entities tend to be relevant to the same requests". We show that the hypothesis holds to a substantial extent for entity retrieval. In addition, we propose a novel cluster-based entity ranking method and show that it is highly effective. Finally, we explore query performance prediction for entity retrieval, that is, estimating retrieval effectiveness in the absence of relevance judgments.
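To make the entity cluster hypothesis concrete, here is a sketch of the nearest-neighbor test, a measure commonly used to quantify the cluster hypothesis in document retrieval: for each relevant item in a retrieved list, compute the fraction of relevant items among its k most similar retrieved items, then average these fractions. This is an illustration under assumptions, not necessarily the procedure used in this work; the entity similarity (Jaccard overlap of category sets), the toy entities, and all helper names are hypothetical.

```python
from typing import Callable, List, Sequence, Set


def nearest_neighbor_test(
    retrieved: Sequence[str],
    relevant: Set[str],
    sim: Callable[[str, str], float],
    k: int = 5,
) -> float:
    """Average fraction of relevant entities among the k nearest neighbors
    (taken from the retrieved list) of each relevant retrieved entity."""
    ratios: List[float] = []
    for e in retrieved:
        if e not in relevant:
            continue
        others = [o for o in retrieved if o != e]
        # Rank the other retrieved entities by similarity to e (ties keep list order).
        neighbors = sorted(others, key=lambda o: sim(e, o), reverse=True)[:k]
        if neighbors:
            ratios.append(sum(o in relevant for o in neighbors) / len(neighbors))
    return sum(ratios) / len(ratios) if ratios else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Hypothetical inter-entity similarity: overlap of category sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy entities with hypothetical category annotations.
categories = {
    "Haifa": {"city", "israel"},
    "Tel_Aviv": {"city", "israel"},
    "Technion": {"university", "israel"},
    "MIT": {"university", "usa"},
}


def entity_sim(x: str, y: str) -> float:
    return jaccard(categories[x], categories[y])


score = nearest_neighbor_test(list(categories), {"Haifa", "Tel_Aviv"}, entity_sim, k=1)
print(score)  # 1.0: each relevant city's nearest neighbor is the other relevant city
```

A high value means relevant entities tend to be each other's nearest neighbors; in practice such a value would be compared against what a random arrangement of relevant items would yield.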
Ad hoc document retrieval is the classic IR task of ranking documents with respect to a query. Traditionally, this task is addressed by comparing term-based representations of the query and the documents. We present novel entity-based query and document representations that are based on language models integrating entity and term information. We show that using these language models for retrieval significantly improves retrieval effectiveness over using terms or entities alone, and over a state-of-the-art term-proximity-based retrieval method.
Finally, we devise novel methods for inducing entity-based query models by utilizing inter-entity similarities. We evaluate the retrieval effectiveness of using these query models and demonstrate their considerable potential for estimating document relevance.
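The abstract does not specify how entity and term information are combined or how inter-entity similarities induce the query model, so the following is only a minimal sketch under stated assumptions. Queries and documents are assumed to carry both terms and entity annotations (e.g., produced by some entity linker); each side is scored with a Dirichlet-smoothed cross-entropy language model score, and the two scores are linearly interpolated with a hypothetical weight `lam`. The entity query model is expanded by moving a fraction `beta` of each query entity's probability mass to other entities in proportion to a hypothetical similarity function. All names, parameters, and data are illustrative.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, List


def cross_entropy_score(query_model: Dict[str, float], doc: List[str],
                        collection: Counter, mu: float) -> float:
    """Sum over query items of P(w|q) * log P(w|d), with Dirichlet-smoothed P(w|d).
    Items that never occur in the collection are skipped to avoid log(0)."""
    tf = Counter(doc)
    coll_len = sum(collection.values())
    score = 0.0
    for w, weight in query_model.items():
        if collection[w] == 0:
            continue
        p = (tf[w] + mu * collection[w] / coll_len) / (len(doc) + mu)
        score += weight * math.log(p)
    return score


def mixed_score(q_terms: Dict[str, float], q_entities: Dict[str, float],
                d_terms: List[str], d_entities: List[str],
                term_coll: Counter, ent_coll: Counter,
                lam: float = 0.5, mu: float = 10.0) -> float:
    """Hypothetical integration: interpolate term-based and entity-based scores."""
    term_part = cross_entropy_score(q_terms, d_terms, term_coll, mu)
    entity_part = cross_entropy_score(q_entities, d_entities, ent_coll, mu)
    return (1 - lam) * term_part + lam * entity_part


def expand_entity_query_model(q_entities: Dict[str, float], candidates: Iterable[str],
                              sim: Callable[[str, str], float],
                              beta: float = 0.3) -> Dict[str, float]:
    """Move a fraction beta of each query entity's mass to similar candidate
    entities, proportionally to sim. Total probability mass is preserved."""
    cands = list(candidates)
    expanded = {e: (1 - beta) * w for e, w in q_entities.items()}
    for src, w in q_entities.items():
        sims = {c: sim(src, c) for c in cands if c != src}
        total = sum(sims.values())
        if total == 0:
            # No similar entities: the mass stays with the original entity.
            expanded[src] += beta * w
            continue
        for c, s in sims.items():
            expanded[c] = expanded.get(c, 0.0) + beta * w * s / total
    return expanded


# Toy collection: two documents with terms and (hypothetically linked) entities.
d1_terms, d1_ents = ["entity", "retrieval", "ranking"], ["Technion", "Haifa"]
d2_terms, d2_ents = ["pasta", "recipe", "cooking"], ["MIT"]
term_coll = Counter(d1_terms + d2_terms)
ent_coll = Counter(d1_ents + d2_ents)

similarity_table = {("Technion", "Haifa"): 1.0}  # hypothetical inter-entity similarities


def ent_sim(a: str, b: str) -> float:
    return similarity_table.get((a, b), 0.0)


q_ents = expand_entity_query_model({"Technion": 1.0}, ent_coll, ent_sim)
q_terms = {"retrieval": 1.0}
s1 = mixed_score(q_terms, q_ents, d1_terms, d1_ents, term_coll, ent_coll)
s2 = mixed_score(q_terms, q_ents, d2_terms, d2_ents, term_coll, ent_coll)
print(s1 > s2)  # True: d1 matches the query term and the (expanded) query entities
```

Interpolating two scores is just one design choice; a single language model over a joint vocabulary of terms and entities would be another way to integrate the two sources of evidence.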