M.Sc Student M.Sc Thesis Kretzu Ben Efficient Algorithms for Bandit Convex Optimization Department of Electrical and Computer Engineering DR. Yehuda Levy DR. Dan Garber

Abstract

Online Learning is a well known approach which mostly concerns scenarios in which data arrives in a sequential manner. It provides both theoretical guarantees as well as practical implications, and has therefore been widely adopted in several research fields, for example, machine learning, statistics and game theory. In this framework, at each time step, a Learner (also called the Decision Maker) provides a prediction based on feedback data available up to that point. The goal is to provide a prediction strategy whose predictions improve over time. Bandit Convex Optimization (BCO) is an instance of online learning in which predictions are restricted to actions from a closed and convex set and the Learner observes only partial feedback. Since online algorithms interact with the environment in an online manner, these algorithms are required to be computationally efficient in order to compute the next prediction and provide a response as fast as possible. A prominent computational bottleneck is the projection step, which is part of the process of solving a quadratic optimization problem under a constraint set. The step is applied at each time step and is part of many current algorithms. This projection step prevents algorithms from scaling up to high-dimensional real-world problems. In this thesis we revisit the challenge of designing online algorithms for the bandit convex optimization problem (BCO) which are also scalable to high dimensional problems. Hence, we consider algorithms that are Projection-free, i.e., based on the conditional gradient method whose only access to the feasible decision set is through a linear optimization oracle (as opposed to other methods which require potentially much more computationally-expensive subprocedures, such as computing Euclidean projections). We present the first aforementioned algorithm that attains O( T 3/4 ) expected regret using only O( T ) overall calls to the linear optimization oracle, in expectation, where T  is the number of prediction rounds. This improves over the O( T 4/5 ) expected regret bound recently obtained by  Karbasi19, and actually matches the current best regret bound for projection-free online learning in the Full-information setting.