M.Sc Student | Kretzu Ben |
---|---|
Subject | Efficient Algorithms for Bandit Convex Optimization |
Department | Department of Electrical Engineering |
Supervisors | Dr. Yehuda Levy, Dr. Dan Garber |
Full Thesis text | |

Online Learning
is a well-known framework that concerns scenarios in which data arrives
in a sequential manner. It provides both theoretical guarantees and
practical benefits, and has therefore been widely adopted in several
research fields, for example, machine learning, statistics, and game theory. In
this framework, at each time step, a Learner (also called the Decision Maker)
provides a prediction based on the feedback data available up to that point. The
goal is to design a prediction strategy whose predictions improve over time.
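
The standard way to quantify how predictions improve over time is the regret: the gap between the Learner's cumulative loss and that of the best fixed action in hindsight. A minimal statement of this measure, using generic convex losses $f_t$ and a feasible set $\mathcal{K}$ (notation assumed here for illustration, not drawn from the thesis):

```latex
% Regret after T rounds: cumulative loss of the predictions x_1, ..., x_T
% minus the cumulative loss of the best fixed point in the feasible set K.
\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)
```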
Bandit Convex Optimization (BCO) is an instance of online learning in which
predictions are restricted to actions from a closed and convex set and the
Learner observes only partial (bandit) feedback: on each round, only the loss
value of the action actually played is revealed, rather than the entire loss
function.
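
To illustrate this interaction, the sketch below simulates the bandit feedback loop using the classical one-point gradient estimator of Flaxman et al. (2005); the random quadratic losses, the parameter choices, and the plain gradient update are all illustrative placeholders, not the thesis's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 10, 1000          # dimension and number of rounds (illustrative)
delta, eta = 0.1, 0.01   # exploration radius and step size (illustrative)

def sample_unit_sphere(d):
    """Uniform random direction on the unit sphere."""
    u = rng.normal(size=d)
    return u / np.linalg.norm(u)

x = np.zeros(d)  # the Learner's current base point
for t in range(T):
    # The adversary fixes a convex loss, hidden from the Learner;
    # a random quadratic f(z) = ||z - target||^2 stands in for it here.
    target = rng.normal(size=d)

    # The Learner plays a perturbed point and observes ONLY the scalar
    # loss value at that point -- this is the bandit feedback.
    u = sample_unit_sphere(d)
    observed = np.sum((x + delta * u - target) ** 2)

    # One-point gradient estimate built from the scalar observation,
    # followed by a plain gradient step; an actual BCO algorithm must
    # also keep the iterate feasible.
    g = (d / delta) * observed * u
    x = x - eta * g
```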
Since online algorithms interact with
the environment in an online manner, they are required to be computationally
efficient, so that the next prediction can be computed and a response provided
as fast as possible. A prominent computational bottleneck is the projection
step, which amounts to solving a quadratic optimization problem over the
constraint set. This step is applied at every time step and is part of many
current algorithms, and it prevents them from scaling up to high-dimensional
real-world problems.
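
To make this bottleneck concrete, the sketch below contrasts the two oracle types on the l1 ball, a set for which both happen to admit simple closed forms; the set and the routines are illustrative choices only. In general, a Euclidean projection is itself a constrained quadratic program, whereas a linear optimization oracle only minimizes a linear function over the set:

```python
import numpy as np

def euclidean_projection_l1(y, radius=1.0):
    """Projection onto the l1 ball: argmin ||x - y||_2 s.t. ||x||_1 <= r.
    Computed here by the standard sorting-based method; for a general
    set this is a full quadratic program applied on every round."""
    if np.abs(y).sum() <= radius:
        return y.copy()
    u = np.sort(np.abs(y))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(y) + 1) > css - radius)[0][-1]
    theta = (css[k] - radius) / (k + 1.0)
    return np.sign(y) * np.maximum(np.abs(y) - theta, 0.0)

def linear_oracle_l1(g, radius=1.0):
    """Linear optimization oracle: argmin <g, x> s.t. ||x||_1 <= r.
    For the l1 ball this is a single coordinate lookup -- far cheaper
    than a projection, which is the appeal of projection-free methods."""
    i = np.argmax(np.abs(g))
    x = np.zeros_like(g)
    x[i] = -radius * np.sign(g[i])
    return x

y = np.array([0.9, -0.5, 0.3])
print(euclidean_projection_l1(y))  # projection onto the l1 ball
print(linear_oracle_l1(y))         # minimizer of <y, x> over the ball
```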
In this
thesis we revisit the challenge of designing online algorithms for the bandit
convex optimization (BCO) problem that are also scalable to high-dimensional
problems. Hence, we consider algorithms that are projection-free, i.e., based
on the conditional gradient method, whose only access to the feasible decision
set is through a linear optimization oracle (as opposed to other methods, which
require potentially much more computationally expensive subprocedures, such as
computing Euclidean projections). We present the first such algorithm
that attains *O*(*T*^{3/4}) expected regret using only *O*(*T*) overall calls
to the linear optimization oracle, in expectation, where *T*
is the number of prediction rounds. This improves over the *O*(*T*^{4/5})
expected regret bound recently obtained by Karbasi19, and matches
the current best regret bound for projection-free online learning in the full-information
setting.
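
As background on this oracle model, here is a minimal sketch of a single conditional gradient (Frank-Wolfe) update, which touches the feasible set only through a linear optimization oracle; the l1-ball oracle and the generic textbook step-size rule are illustrative assumptions, not the thesis's schedule:

```python
import numpy as np

def linear_oracle_l1(g, radius=1.0):
    """argmin <g, x> s.t. ||x||_1 <= r: the only access to the set."""
    i = np.argmax(np.abs(g))
    x = np.zeros_like(g)
    x[i] = -radius * np.sign(g[i])
    return x

def frank_wolfe_step(x, grad, t):
    """One conditional gradient update: move toward the point returned
    by the linear oracle. The convex combination keeps the iterate
    feasible, so no projection is ever needed."""
    v = linear_oracle_l1(grad)
    gamma = 2.0 / (t + 2.0)  # generic step-size schedule
    return (1.0 - gamma) * x + gamma * v

# Toy usage: minimize ||x - b||^2 over the l1 ball.
b = np.array([0.8, -0.2, 0.5])
x = np.zeros_like(b)
for t in range(50):
    grad = 2.0 * (x - b)
    x = frank_wolfe_step(x, grad, t)
print(x, np.abs(x).sum())  # final iterate stays feasible: ||x||_1 <= 1
```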