Technion - Israel Institute of Technology, Graduate School
M.Sc. Thesis
M.Sc. Student: Raviv Ariel
Subject: Concept-Based Approach to Word-Sense Disambiguation
Department: Department of Computer Science
Supervisor: Professor Shaul Markovitch


Abstract

Automatically determining the correct sense of a polysemous word remains an open challenge. It is crucial to many natural language processing (NLP) applications, such as speech recognition, information retrieval, machine translation, and computational advertising. In this research, we introduce Concept-Based Disambiguation (CBD), a novel framework that uses recent semantic analysis techniques to represent both the context of a word and its senses in a high-dimensional space of natural concepts. The concepts are retrieved from a vast encyclopedic resource, thus enriching the disambiguation process with large amounts of domain-specific knowledge. In such concept-based spaces, more comprehensive measures can be applied to pick the right sense. Additionally, we introduce a novel representation scheme, termed anchored representation, that builds a more specific text representation associated with an anchoring word. We evaluated our framework using two recent text representation schemes, Explicit Semantic Analysis (ESA) and Compact Hierarchical Explicit Semantic Analysis (CHESA), and their two anchored counterparts, and showed that the anchored representation is better suited to the task of word sense disambiguation (WSD). Finally, we show that our system outperforms state-of-the-art methods when evaluated on domain-specific corpora, and is competitive with recent methods when evaluated on a general corpus.
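The core idea of the abstract, mapping both the context and each candidate sense into a shared concept space and then picking the closest sense, can be illustrated with a minimal sketch. This is not the thesis's implementation: the tiny `TOY_CONCEPT_INDEX`, the sense glosses, the function names, and the use of cosine similarity as the comparison measure are all assumptions made for illustration. A real ESA or CHESA representation would be derived from a large encyclopedic resource, and the anchored variants are not modeled here.

```python
import math

# Hypothetical toy index mapping a word to weighted concepts.
# Assumption: stands in for a concept space built from an encyclopedia.
TOY_CONCEPT_INDEX = {
    "river": {"River": 1.0, "Geography": 0.5},
    "water": {"River": 0.6, "Hydrology": 0.8},
    "shore": {"River": 0.7, "Coast": 0.9},
    "land": {"Geography": 0.8, "Coast": 0.3},
    "money": {"Finance": 1.0, "Economy": 0.6},
    "loan": {"Finance": 0.9, "Credit": 0.8},
    "deposit": {"Finance": 0.7, "Credit": 0.4},
}


def concept_vector(words):
    """Sum the concept weights of all known words into one sparse vector."""
    vec = {}
    for word in words:
        for concept, weight in TOY_CONCEPT_INDEX.get(word, {}).items():
            vec[concept] = vec.get(concept, 0.0) + weight
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context_words, sense_glosses):
    """Return the sense whose gloss is closest to the context, plus all scores."""
    context_vec = concept_vector(context_words)
    scores = {
        sense: cosine(context_vec, concept_vector(gloss))
        for sense, gloss in sense_glosses.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical sense inventory for the ambiguous word "bank".
SENSES = {
    "bank/finance": ["money", "loan", "deposit"],
    "bank/river": ["river", "shore", "land"],
}

best_sense, sense_scores = disambiguate(["water", "river"], SENSES)
```

With the toy data above, a context of "water" and "river" shares concepts only with the river gloss, so `best_sense` is `"bank/river"`. Swapping in a richer concept index or a different similarity measure changes only `TOY_CONCEPT_INDEX` and `cosine`.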