M.Sc Thesis

M.Sc StudentRubinstein Eitan
SubjectSupport Vector Machines via Advanced Optimization Techniques
to Robust Optimization
DepartmentDepartment of Electrical and Computers Engineering
Supervisors DR. Michael Zibulevsky
PROF. Arkadi Nemirovski


The most basic problem considered in Machine Learning is the supervised binary data classification, where one is given a training sample - a random sample (xi , yi), i = 1,…,n of examples - attribute vectors xi equipped with labels yi1 - drawn from an unknown distribution P. The goal is to build, based on this sample, a classifier - a function f(x) taking values of ±1 such that the generalization error Probp{(x,y) : f(x)≠y} is as small as possible. This general problem has numerous applications in classification of documents, texts and images, in computerized medical diagnostic systems, etc.

The SVM approach is the most frequently used technique for solving the outlined problem (same as other classification problems arising in Machine Learning). With this approach the classifier is given by an optimal solution to a specific convex optimization problem. While the theoretical background of SVM is given by a well-developed and deep Statistical Learning Theory, the computational tools used traditionally in the SVM context (that is, numerical techniques for solving the resulting convex programs) are not exactly state-of-the-art of modern Convex Optimization, especially when the underlying data sets are very large (tens of thousands of examples with high-dimensional attribute vectors in the training sample). In the large- scale case, one can not use the most advanced, in terms of the rate of convergence, convex optimization techniques (Interior Point methods), since the cost of iteration in such a method (which grows nonlinearly with the sizes of the training data) becomes too large to be practical. At the same time, from the purely optimization viewpoint, the “computationally cheap” optimization methods routinely used in the large-scale SVM’s seem to be somehow obsolete.

The goal of the thesis is to investigate the potential of recent advances in “computationally cheap” techniques for extremely large-scale convex optimization in the SVM context. Specifically we utilized here the Mirror Prox algorithm (A. Nemirovski, 2004). The research focused on (a) representing the SVM optimization programs in the specific saddle point form required by the algorithm, (b) adjusting the algorithm to the structure of the SVM problems, and (c) software implementation and testing the algorithm on the large-scale SVM data, with the ultimate goal to develop a novel theoretically solid and computationally efficient optimization techniques for SVM-based supervised binary classification.