M.Sc Thesis

M.Sc StudentVatashsky Ben-Zion
SubjectMixed Norm Regularization for Learning with Many Labels
DepartmentDepartment of Electrical and Computer Engineering
Supervisor ASSOCIATE PROF. Yacov Crammer


The wide and common use of regularization began in the 1960's where some works proposed ridge regression (a work by Tikhonov was published in 1943). Since then, vector norm regularizations became a very common method to handle ill-posed problems, prevent overfitting and incorporate prior knowledge. Various problems were addressed using regularization both in signal processing and machine learning.

For many years l2 norm regularization was popular both in classification (e.g. SVM) and regression (e.g. ridge regression). In recent years l1 regularization has been widely used for obtaining sparse representation. When variables may be grouped by double index, mixed norms may be used for regularization. A commonly used mixed norm is l2,1, which promotes group sparsity, and is often used for variables (features) selection during the learning process.

We examine a different form of mixed norm regularization for multi class and multi label classification, a type that will induce individual variable selection. Our assumption is that in some cases specific features are informative to a small number of classes. Another view we propose is that different classes may be described well by different subsets of features. These views imply specific sparsity patterns. We show that imposing these sparsity patterns can be achieved by different summation order of l1,2 mixed norm regularization.

The resulting optimization problem is a sum of a regularization term and a loss term. We have used various loss functions for both multi class and multi label settings. The minimization was addressed by the forward backward algorithm which is a common proximal method. 

Some theoretical properties for mixed norm regularizations are presented. An equivalence of regularization and robust optimization, previously shown for binary classification, is extended to the multi class and multi label setting. Generalization bounds are also presented.

Our empirical study of the proposed mixed norm regularizations includes comparison of these regularizations to the common l2,2, l1,1 and l2,1 regularization functions. We performed experiments for performance comparison and for sparsity patterns analysis ran on numerous multi class and multi label datasets. Various measures are used to evaluate the tested regularizations. We show that in some cases the suggested regularizations have performance advantage compared to the common sparse regularizations (l1,1 and l2,1) , including surprising superior sparsity rates, higher even than l1,1, which was expected to obtain the sparsest results.