|M.Sc Thesis||Department of Computer Science|
|Supervisor:||Assoc. Prof. El-Yaniv Ran|
We propose and study a novel supervised approach to learning semantic relatedness from examples.
Using an empirical risk minimization approach, our algorithm computes a weighted measure of term co-occurrence with respect to a corpus of text documents and uses the labeled examples to fit the model to the training sample.
Our method is corpus-independent and can essentially rely on any sufficiently large (unstructured) collection of coherent texts.
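To make the idea of fitting a weighted co-occurrence measure to labeled examples concrete, the following is a minimal sketch and not the thesis's actual algorithm. It assumes a toy model in which each document carries one weight, a term pair is scored by summing the weights of the documents where both terms occur, and the weights are fit by gradient descent on the mean squared error against labeled relatedness scores. The function names (`cooccurrence_features`, `score`, `fit_weights`), the squared loss, and the toy data are all hypothetical choices made for illustration.

```python
from typing import List, Set, Tuple


def cooccurrence_features(a: str, b: str, docs: List[Set[str]]) -> List[float]:
    """Return 1.0 for each document containing both terms, else 0.0."""
    return [1.0 if a in d and b in d else 0.0 for d in docs]


def score(weights: List[float], feats: List[float]) -> float:
    """Weighted co-occurrence score: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, feats))


def fit_weights(
    examples: List[Tuple[str, str, float]],
    docs: List[Set[str]],
    lr: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """Fit per-document weights by minimizing the empirical risk
    (here assumed to be mean squared error) over the labeled pairs."""
    weights = [0.0] * len(docs)
    data = [(cooccurrence_features(a, b, docs), y) for a, b, y in examples]
    for _ in range(epochs):
        grad = [0.0] * len(docs)
        for feats, y in data:
            err = score(weights, feats) - y
            for i, f in enumerate(feats):
                grad[i] += 2.0 * err * f / len(data)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Toy corpus and labels (hypothetical, for illustration only).
docs = [{"cat", "dog", "pet"}, {"cat", "car"}, {"dog", "bone"}]
examples = [("cat", "dog", 1.0), ("cat", "car", 0.2)]
weights = fit_weights(examples, docs)
```

After fitting, `score(weights, cooccurrence_features("cat", "dog", docs))` should be higher than the score for `("cat", "car")`, reflecting the labeled examples. Any sufficiently large text collection could stand in for `docs`, which is the sense in which such a model does not depend on a particular corpus.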
We present the results of a range of experiments from large to small scale.
Evaluation over the WordSim353 benchmark shows significant improvements in correlation over the state of the art
using either a reduced (older) version of Wikipedia or the books in the Project
These results indicate that the proposed method is effective and competitive with the state-of-the-art.
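The correlation-based evaluation mentioned above can be sketched as follows. The abstract does not say which correlation coefficient is used; this sketch assumes Spearman rank correlation between model scores and human relatedness judgments, computed as the Pearson correlation of average ranks. The helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical, and no benchmark data is included.

```python
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes both inputs have nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(model_scores: List[float], human_scores: List[float]) -> float:
    """Spearman rank correlation between model and human scores."""
    return pearson(average_ranks(model_scores), average_ranks(human_scores))
```

In use, `model_scores` would hold the learned relatedness score for each benchmark word pair and `human_scores` the corresponding human ratings, in the same order.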