Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Yanay David
Subject: Supervised Learning of Semantic Relatedness
Department: Department of Computer Science
Supervisor: Professor Ran El-Yaniv


Abstract

We propose and study a novel supervised approach to learning semantic relatedness from examples.
Using an empirical risk minimization approach, our algorithm computes a weighted measure of term co-occurrence with respect to a corpus of text documents and uses the labeled examples to fit the model to the training sample.
Our method is corpus independent and can essentially rely on any sufficiently large (unstructured) collection of coherent texts.
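
The general idea of fitting a weighted co-occurrence measure by empirical risk minimization can be illustrated with a small sketch. This is not the algorithm of the thesis: the per-document weights, the squared loss, the plain gradient descent, the toy corpus, and all function names below are assumptions chosen only to keep the example short and self-contained.

```python
# Illustrative sketch only: per-document weights, squared loss and the toy
# data are assumptions, not the model described in the thesis.

def cooccurrence_features(term_a, term_b, docs):
    """One binary feature per document: 1.0 if both terms occur in it."""
    return [1.0 if term_a in d and term_b in d else 0.0 for d in docs]

def score(weights, feats):
    """Weighted co-occurrence score for one term pair."""
    return sum(w * f for w, f in zip(weights, feats))

def fit(examples, docs, epochs=200, lr=0.1):
    """Minimize the mean squared error over labeled pairs (the empirical
    risk under squared loss) with batch gradient descent."""
    weights = [0.0] * len(docs)
    for _ in range(epochs):
        grad = [0.0] * len(docs)
        for (a, b), target in examples:
            feats = cooccurrence_features(a, b, docs)
            err = score(weights, feats) - target
            for i, f in enumerate(feats):
                grad[i] += 2.0 * err * f / len(examples)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"car", "engine", "road"},
    {"car", "road", "traffic"},
    {"banana", "fruit", "apple"},
    {"car", "banana"},
]
# Hypothetical labeled examples: (term pair, relatedness in [0, 1]).
examples = [
    (("car", "road"), 1.0),
    (("apple", "banana"), 1.0),
    (("car", "banana"), 0.0),
]

weights = fit(examples, docs)
s_related = score(weights, cooccurrence_features("car", "road", docs))
s_unrelated = score(weights, cooccurrence_features("car", "banana", docs))
```

In this toy setting the fitted weights score the pair labeled as related close to 1 and the pair labeled as unrelated at 0. A realistic model would need a far larger corpus and a weighting scheme that generalizes to term pairs absent from the training sample.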

We present the results of a range of experiments from large to small scale.

Evaluation over the WordSim353 benchmark shows significant improvements in correlation over the state of the art, using either a reduced (older) version of Wikipedia or the books in the Project Gutenberg collection.
These results indicate that the proposed method is effective and competitive with the state-of-the-art.
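
Benchmark evaluations of this kind compare model scores with human judgments through a correlation coefficient. The abstract does not say which coefficient is used, so the sketch below assumes Spearman rank correlation, a common choice for relatedness benchmarks. The scores are made-up toy values, not WordSim353 data or results from the thesis.

```python
# Illustrative sketch: Spearman rank correlation is an assumption here, and
# the numbers are invented toy values, not benchmark data.

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

human = [9.0, 7.5, 3.0, 1.0]   # hypothetical human relatedness ratings
model = [0.8, 0.9, 0.2, 0.1]   # hypothetical model scores for the same pairs
rho = spearman(human, model)   # 0.8 for these toy values
```

Because only ranks matter, the model scores do not need to be on the same scale as the human ratings.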