Ph.D. Thesis


Ph.D. Student: Voldman Sergey
Subject: Nonconvex Finite Sum Optimization: Algorithms and Theory
Department: Department of Industrial Engineering and Management
Supervisors: Associate Prof. Shoham Sabach, Associate Prof. Tamir Hazan
Full Thesis Text: English Version


Abstract

Nonconvex optimization problems are at the forefront of today's research in applied mathematics.

In this thesis, we focus on the finite sum model, which naturally appears in many applications, including machine learning, data analysis, and image processing, to mention just a few. Motivated by challenging large-scale applications, such as the clustering problem and the training of neural networks, we study three types of finite sum models. In each case, we propose a tailored iterative method, designed to exploit the structure of the model and equipped with theoretical guarantees.
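For orientation, the finite sum model referred to throughout has the standard generic form

$$\min_{x \in \mathbb{R}^d} \; F(x) := \sum_{i=1}^{n} f_i(x),$$

where the structure of the components $f_i$ (minima of convex functions, maxima of smooth functions, or a sum of two functions) is what distinguishes the three models studied below.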


Motivated by the Clustering Problem, the first model concerns minimizing an objective function defined as a finite sum of pointwise minima of nonsmooth convex functions. To tackle this nonsmooth and nonconvex problem, we develop a Smoothing Alternating Minimization-Based Algorithm (SAMBA), which is proven to converge globally to a critical point of the smoothed problem. We then show how it can be applied to the Clustering Problem with suitable smoothing functions, producing two very simple algorithms that resemble the well-known k-means algorithm and come with a global convergence analysis; a sketch of one such step appears below. In addition, we compare the clusterings attained by the two smoothing approaches across several clustering measures.
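To make the smoothing idea concrete, here is a minimal illustrative sketch (my own, not code from the thesis) of one alternating step, assuming squared Euclidean distances and log-sum-exp (soft-min) smoothing; SAMBA itself is defined for general nonsmooth convex components and other smoothing choices:

```python
import numpy as np

def soft_kmeans_step(X, centers, mu):
    """One alternating-minimization step on a soft-min smoothed
    clustering objective (illustrative sketch, not the thesis's SAMBA).

    X: (n, d) data, centers: (k, d), mu > 0 smoothing parameter.
    The nonsmooth min over centers is smoothed by the log-sum-exp
    (soft-min), which yields softmax assignment weights."""
    # squared Euclidean distances d_ij = ||x_i - c_j||^2, shape (n, k)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # soft-min weights w_ij = exp(-d_ij/mu) / sum_l exp(-d_il/mu),
    # shifted by the row minimum for numerical stability
    w = np.exp(-(d - d.min(axis=1, keepdims=True)) / mu)
    w /= w.sum(axis=1, keepdims=True)
    # closed-form center update: weighted mean of the assigned points
    return (w.T @ X) / w.sum(axis=0)[:, None]
```

As mu tends to 0, the weights become hard assignments and the step reduces to the classical k-means (Lloyd) update, which is why the smoothed algorithms resemble k-means.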


The second model deals with a finite sum of pointwise maxima of gradient Lipschitz functions that are not necessarily convex, making it a nonconvex and nonsmooth minimization problem. This model is of great interest to the machine learning community, since the objective function in certain deep learning applications fits it. We develop a Stochastic Proximal Linear Method (SPLM) that is guaranteed to reach a critical point of this learning objective, and we analyze its convergence rate. We demonstrate the effectiveness of the proposed method against other well-known stochastic machine learning algorithms, which lack any theoretical guarantees.
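For orientation, a stochastic prox-linear step for this class of objectives (stated generically; the exact SPLM update, step-size rule, and sampling scheme are specified in the thesis) samples a component index $i$ at iteration $k$ and solves the convex subproblem

$$x^{k+1} \in \operatorname*{argmin}_{y}\; \max_{1 \le j \le m}\Big( f_{ij}(x^{k}) + \big\langle \nabla f_{ij}(x^{k}),\, y - x^{k} \big\rangle \Big) + \frac{1}{2 t_{k}} \big\| y - x^{k} \big\|^{2},$$

where $t_k > 0$ is a step size. Each smooth component is linearized inside the maximum, so the nonsmooth max structure is preserved while the subproblem remains convex and tractable.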


The third part of this thesis focuses on the Split Feasibility Problem, which can be modeled as a sum of two functions. The purpose of this part is to study the problem from an optimization point of view and to propose a catalog of iterative methods for solving it in the nonconvex setting. We consider four different optimization formulations of the problem, each of which is advantageous in a different setting. For each formulation, we study relevant iterative algorithms, some well known in this area and some new. All of the studied methods, including the well-known CQ Algorithm, are proven to have global convergence guarantees in the nonconvex setting under mild conditions on the problem's data.
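For reference, the classical CQ iteration for the Split Feasibility Problem (find x in C with Ax in Q) has the well-known form sketched below; the instance uses illustrative convex projections onto a box and a Euclidean ball, whereas the thesis analyzes the nonconvex setting:

```python
import numpy as np

def cq_algorithm(A, proj_C, proj_Q, x0, gamma, iters=500):
    """Classical CQ iteration:
        x^{k+1} = P_C( x^k - gamma * A^T (A x^k - P_Q(A x^k)) ).
    proj_C, proj_Q: projection operators onto the sets C and Q.
    In the convex case, gamma in (0, 2/||A||^2) ensures convergence."""
    x = x0
    for _ in range(iters):
        Ax = A @ x
        x = proj_C(x - gamma * (A.T @ (Ax - proj_Q(Ax))))
    return x

# Illustrative instance: C a box, Q the unit ball (both convex here).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
proj_C = lambda x: np.clip(x, -1.0, 1.0)            # P_C: box [-1, 1]^3
proj_Q = lambda y: y / max(1.0, np.linalg.norm(y))  # P_Q: unit ball
gamma = 1.0 / np.linalg.norm(A, 2) ** 2             # safe step size
x_star = cq_algorithm(A, proj_C, proj_Q, np.ones(3), gamma)
```

The iteration only requires the two projections and matrix-vector products with A, which is what makes the CQ Algorithm attractive for large-scale instances.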