M.Sc Thesis | |

M.Sc Student | Gerzon Leonid |
---|---|

Subject | Effective Transductive Learning via Objective Model Selection |

Department | Department of Applied Mathematics |

Supervisor | PROF. Ran El-Yaniv |

This research is concerned with transductive learning. We study a recent transductive learning approach based on clustering. In this approach the learner constructs a diversity of unsupervised models for a given sample of unlabeled data using multiple clustering algorithms with different parameters. The learner then utilizes the labeled data to guess labels for entire clusters (so that all points of the same cluster have the same label). This way a number of hypotheses are constructed, one of which is then selected based on a transductive error bound, which holds with high probability. The algorithm output is a prediction for the labels of the given unlabeled points together with a `performance guarantee', which is an upper bound on the prediction error that holds with high probability.

Empirical examination of this approach, implemented with `spectral clustering', on a suite of benchmark datasets from the UCI repository, indicates that the new approach is effective and comparable with one of the best transductive learning algorithms. A central component of the approach is a prior probability distribution over possible hypotheses, which should be fixed by the learner before any label is supplied. This prior probability distribution can be viewed as a quantification of the learner's prior beliefs on the goodness of various hypotheses. The basic algorithm utilizes a uniform prior distribution, which corresponds to an absence of prior knowledge about the hypotheses. We propose and study methods for constructing other types of prior distributions, which attempt to utilize combinatorial or geometric properties of the resulting clusterings.

The contribution of our work is the empirical evaluation, understanding and improvement of this new transductive learning approach.