M.Sc Thesis

M.Sc StudentKretzu Ben
SubjectEfficient Algorithms for Bandit Convex Optimization
DepartmentDepartment of Electrical and Computer Engineering
Supervisors DR. Yehuda Levy
DR. Dan Garber
Full Thesis textFull thesis text - English Version


Online Learning is a well known approach which mostly concerns scenarios in which data arrives in a sequential manner. It provides both theoretical guarantees as well as practical implications, and has therefore been widely adopted in several research fields, for example, machine learning, statistics and game theory. In this framework, at each time step, a Learner (also called the Decision Maker) provides a prediction based on feedback data available up to that point. The goal is to provide a prediction strategy whose predictions improve over time. Bandit Convex Optimization (BCO) is an instance of online learning in which predictions are restricted to actions from a closed and convex set and the Learner observes only partial feedback. Since online algorithms interact with the environment in an online manner, these algorithms are required to be computationally efficient in order to compute the next prediction and provide a response as fast as possible. A prominent computational bottleneck is the projection step, which is part of the process of solving a quadratic optimization problem under a constraint set. The step is applied at each time step and is part of many current algorithms. This projection step prevents algorithms from scaling up to high-dimensional real-world problems. In this thesis we revisit the challenge of designing online algorithms for the bandit convex optimization problem (BCO) which are also scalable to high dimensional problems. Hence, we consider algorithms that are Projection-free, i.e., based on the conditional gradient method whose only access to the feasible decision set is through a linear optimization oracle (as opposed to other methods which require potentially much more computationally-expensive subprocedures, such as computing Euclidean projections). We present the first aforementioned algorithm that attains O( T 3/4 ) expected regret using only O( T ) overall calls to the linear optimization oracle, in expectation, where T  is the number of prediction rounds. This improves over the O( T 4/5 ) expected regret bound recently obtained by  Karbasi19, and actually matches the current best regret bound for projection-free online learning in the Full-information setting.