M.Sc Thesis Department of Computer Science

David Yanay


Supervised Learning of Semantic Relatedness

Supervisor: Assoc. Prof. El-Yaniv Ran


Abstract

We propose and study a novel supervised approach to learning semantic relatedness from examples.
Using an empirical risk minimization approach, our algorithm computes a weighted measure of term co-occurrence with respect to a corpus of text documents, and uses the labeled examples to fit the model to the training sample.
Our method is corpus-independent and can rely on essentially any sufficiently large (unstructured) collection of coherent texts.
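
The idea of fitting a weighted co-occurrence measure to labeled examples can be illustrated with a minimal sketch. This is not the algorithm of the thesis: the per-document weights, the preference-pair form of the labeled examples, the perceptron-style update, and all names (`score`, `empirical_risk`, `fit`) are assumptions chosen only to keep the example small.

```python
# Illustrative sketch only (hypothetical model, not the thesis algorithm).
# Each document is a set of terms and carries one non-negative weight.

def cooccurring_docs(docs, pair):
    """Indices of the documents that contain both terms of the pair."""
    a, b = pair
    return [i for i, doc in enumerate(docs) if a in doc and b in doc]

def score(weights, docs, pair):
    """Weighted co-occurrence: total weight of documents containing both terms."""
    return sum(weights[i] for i in cooccurring_docs(docs, pair))

def empirical_risk(weights, docs, preferences):
    """Fraction of labeled preferences (more_related, less_related) not ranked correctly."""
    errors = sum(
        1 for more, less in preferences
        if score(weights, docs, more) <= score(weights, docs, less)
    )
    return errors / len(preferences)

def fit(docs, preferences, epochs=20, lr=0.5):
    """Adjust document weights until the training preferences are respected (assumed update rule)."""
    weights = [1.0] * len(docs)
    for _ in range(epochs):
        for more, less in preferences:
            if score(weights, docs, more) <= score(weights, docs, less):
                # Raise the weight of evidence for the preferred pair ...
                for i in cooccurring_docs(docs, more):
                    weights[i] += lr
                # ... and lower it for the other pair, never below zero.
                for i in cooccurring_docs(docs, less):
                    weights[i] = max(0.0, weights[i] - lr)
    return weights

# Toy corpus and a single labeled example: ("car", "road") should be
# more related than ("car", "banana").
docs = [
    {"car", "road"},
    {"car", "banana"},
    {"car", "banana"},
    {"car", "road", "wheel"},
]
preferences = [(("car", "road"), ("car", "banana"))]

initial_weights = [1.0] * len(docs)
fitted_weights = fit(docs, preferences)
```

With uniform weights the two pairs tie on this toy corpus, so the labeled preference counts as an error; after fitting, the empirical risk on the training sample can be checked again with `empirical_risk(fitted_weights, docs, preferences)`.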

We present the results of a range of experiments from large to small scale.

Evaluation over the WordSim353 benchmark shows significant improvements in correlation results over the state of the art, using either a reduced (older) version of Wikipedia or the books in the Project Gutenberg collection.
These results indicate that the proposed method is effective and competitive with the state of the art.
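
A correlation-based evaluation of this kind can be sketched as follows. The abstract does not say which correlation coefficient is used; the sketch assumes Spearman rank correlation between human relatedness ratings and model scores. The function names and the four toy values are hypothetical and are not taken from WordSim353 or from the thesis results.

```python
# Illustrative sketch only: Spearman rank correlation, assumed here as the
# evaluation measure. The data below is a made-up toy example.

def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical human ratings and model scores for four term pairs.
human_ratings = [9.0, 7.5, 3.0, 1.0]
model_scores = [0.8, 0.9, 0.2, 0.1]
rho = spearman(human_ratings, model_scores)
```

Rank correlation only compares orderings, so model scores do not need to be on the same scale as the human ratings.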