M.Sc Thesis

M.Sc StudentAzran Arik
SubjectData Dependent Risk Bounds and Algorithms for Hierarchical
Mixture of Experts Classifiers
DepartmentDepartment of Electrical and Computer Engineering
Supervisor PROF. Ron Meir


Inductive inference is concerned with the task of constructing an effective model for a set of observations, based on a finite, often noisy, observation sequence. This model is then used to predict the behavior on future data. Binary pattern classification is an example of such a procedure, where one needs to design a mapping, referred to as a classifier, which associates an observation x with a label y = -1 or 1. A classifier is constructed based on a given set of N examples {(xn,yn)} n=1,2,…,N (the data). There are many approaches to designing a classifier, such as neural networks, Boosting, support vector machines etc. These classifiers can be interpreted as global classifiers, where the same classifier carries out the task of
classification over the entire space.

The Hierarchical Mixture of Experts architecture provides a flexible procedure for implementing classification algorithms. The classification is obtained by a recursive soft partition of the feature space in a data-driven fashion. Such a procedure enables local classification where several experts are used, each of which is assigned with the task of classification over some subspace of the feature space. In this work, we provide data-dependent generalization error bounds for this class of models, based on the Rademacher complexity, which lead to effective procedures for performing model selection. Tight bounds are particularly important here, because the model is highly parameterized. The theoretical results are complemented with numerical experiments based on a randomized algorithm, which mitigates the effects of local minima which plague other approaches such as the
expectation-maximization algorithm.