Technion - Israel Institute of Technology, Graduate School
M.Sc. Thesis
M.Sc. Student: Bretter Ronny
Subject: Multi Learning with Distortion Rate Theory
Department: Department of Electrical Engineering
Supervisors: Professor Yacov Crammer
Professor Lihi Zelnik-Manor
Full Thesis Text: English Version


Abstract

In this thesis, the general problem of learning multiple tasks in parallel is examined through a novel approach based on information theory.

The basic and intuitive motivation for algorithms in the area of multi-task learning is that closely related tasks share mutual information that can be exploited to improve the performance of each individual task. Previous work in multi-task learning has succeeded in improving performance based on this concept; this work proposes a new principled approach to the problem. The main contribution of this work is a systematic, information-theoretic framework from which new algorithms can be derived. The power of this formalism leads to elegant algorithms that weight related tasks according to their significance to a pre-defined target task. The formalism has the advantage of being simple, general, and applicable to both discriminative and generative learning problems, such as maximum likelihood estimation, regression, and classification.

The main theoretical idea of the thesis is to formulate the fusion of information between learned tasks as a compression problem grounded in distortion-rate theory. In this analogy, the fundamental tradeoff between compression rate and data distortion tunes the bias-variance tradeoff of the multi-task learning problem, i.e., the tradeoff between preventing overfitting to the training data and minimizing the average loss. The thesis focuses on an algorithmic application to multi-task SVM learning and empirically evaluates its performance on the Sentiment Classification and 20 Newsgroups datasets. Results show that the proposed approach yields improved performance in a large number of experiments compared to several baselines.
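The thesis itself contains no code in this abstract, but the core idea of weighting related tasks by a rate-distortion tradeoff can be illustrated with a minimal sketch. The exponential (Boltzmann) weighting below, where `beta` plays the role of the Lagrange multiplier trading compression rate against distortion, is a common form for such tradeoffs; the function name and the choice of per-task "distortion" scores are assumptions for illustration, not the algorithm derived in the thesis.

```python
import numpy as np

def rate_distortion_task_weights(distortions, beta):
    """Assign a weight to each related task from its distortion score.

    distortions : per-task distortion, e.g. disagreement with the target task
    beta        : tradeoff parameter; beta=0 ignores distortion (uniform
                  weights, maximal "compression"), large beta concentrates
                  the weight on the most closely related tasks.
    """
    d = np.asarray(distortions, dtype=float)
    # Subtract the minimum before exponentiating for numerical stability;
    # this cancels in the normalization and does not change the weights.
    w = np.exp(-beta * (d - d.min()))
    return w / w.sum()

# Example: three related tasks, ordered from most to least similar.
distortions = [0.1, 0.5, 2.0]
print(rate_distortion_task_weights(distortions, beta=0.0))  # uniform weights
print(rate_distortion_task_weights(distortions, beta=5.0))  # favors task 0
```

In a multi-task SVM setting, such weights could, for instance, scale each auxiliary task's contribution to the target task's training objective; tuning `beta` then corresponds to moving along the bias-variance tradeoff described above.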