M.Sc Thesis | |

M.Sc Student | Azran Arik |
---|---|

Subject | Data Dependent Risk Bounds and Algorithms for Hierarchical Mixture of Experts Classifiers |

Department | Department of Electrical and Computer Engineering |

Supervisor | PROF. Ron Meir |

Inductive inference is concerned
with the task of constructing an effective model for a set of observations,
based on a finite, often noisy, observation sequence. This model is then used
to predict the behavior on future data. Binary pattern classification is an
example of such a procedure, where one needs to design a mapping, referred to
as a classifier, which associates an observation x with a label y = -1 or 1. A classifier is constructed based on a given set of N examples {(x_{n},y_{n})}
n=1,2,…,N (the data). There are many approaches to designing a classifier, such
as neural networks, Boosting, support vector machines etc. These classifiers
can be interpreted as global classifiers, where the same classifier carries out
the task of

classification over the entire space.

The Hierarchical Mixture of Experts architecture provides a flexible procedure
for implementing classification algorithms. The classification is obtained by a
recursive soft partition of the feature space in a data-driven fashion. Such a
procedure enables local classification where several experts are used, each of
which is assigned with the task of classification over some subspace of the
feature space. In this work, we provide data-dependent generalization error
bounds for this class of models, based on the Rademacher complexity, which lead
to effective procedures for performing model selection. Tight bounds are
particularly important here, because the model is highly parameterized. The
theoretical results are complemented with numerical experiments based on a
randomized algorithm, which mitigates the effects of local minima which plague
other approaches such as the

expectation-maximization algorithm.