Ph.D. Thesis

Structure and Combinatorics in Online Learning
Ph.D. Student: Alon Cohen
Advisor: Dr. Tamir Hazan
Department of Industrial Engineering and Management

Abstract

This manuscript explores theoretical principles that govern the design of algorithms in machine learning, planning, and decision-making under uncertainty. We present several new results in which we develop novel learning algorithms and delve into the fundamental limitations of existing ones:

1.      We study the limitations of Follow the Perturbed Leader (FPL) algorithms, a class of simple and computationally efficient online learning algorithms. Our study yields a surprising result: a setting in which FPL algorithms are necessarily suboptimal. This setting is relevant to many problems of practical interest, such as the well-known online shortest path problem.
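To fix ideas, an FPL rule picks at each round the expert minimizing its cumulative loss perturbed by random noise. The following is a minimal sketch; the toy loss sequence, the exponential perturbation, and the parameter `eta` are illustrative choices for this sketch, not taken from the thesis:

```python
import random

def fpl_choose(cumulative_losses, eta):
    """One round of Follow the Perturbed Leader: pick the expert
    minimizing perturbed cumulative loss. Subtracting Exp(eta)
    noise is one standard perturbation choice (an assumption here)."""
    perturbed = [
        loss - random.expovariate(eta)  # perturb each cumulative loss
        for loss in cumulative_losses
    ]
    return min(range(len(perturbed)), key=perturbed.__getitem__)

# Run FPL on a toy sequence with 3 experts; expert 2 is best each round.
random.seed(0)
T, n = 100, 3
cum = [0.0] * n
total_loss = 0.0
for t in range(T):
    i = fpl_choose(cum, eta=1.0)
    losses = [0.9, 0.5, 0.1]           # adversary's losses this round
    total_loss += losses[i]            # learner suffers chosen expert's loss
    cum = [c + l for c, l in zip(cum, losses)]
```

With a fixed best expert, the perturbed leader concentrates on it after a few rounds, which is the intuition behind FPL's low regret in the basic experts setting.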

2.      We revisit online learning with feedback graphs, a model that interpolates between full-information and bandit prediction with expert advice. In the basic setting, the learner receives the observation model in the form of an entire feedback graph.
In this work, we define a more realistic, albeit more challenging, model: the graph changes from round to round and the learner observes only the out-neighborhood of her chosen expert. We prove that in the fully adversarial setting, the learner cannot hope to utilize any additional information provided by the graph. Subsequently, we present a simple algorithm that achieves an $\widetilde O(\sqrt{\alpha T})$ regret bound when the losses are stochastic i.i.d., where $\alpha$ is the size of the maximal independent set in the graphs.
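The observation model above can be sketched in a few lines; the adjacency-list representation and the self-loop convention (the learner always sees her own loss) are assumptions made for illustration:

```python
def observe(graph, chosen, losses):
    """Feedback-graph observation for one round (illustrative sketch):
    after playing expert `chosen`, the learner sees the losses of its
    out-neighbors in this round's graph, given as a dict of adjacency
    lists, plus her own loss (self-loop convention, assumed here)."""
    seen = set(graph[chosen]) | {chosen}
    return {j: losses[j] for j in seen}

# Example round: expert 0 observes itself and its out-neighbor 1,
# but learns nothing about expert 2.
graph = {0: [1], 1: [2], 2: []}
feedback = observe(graph, chosen=0, losses=[0.1, 0.2, 0.3])
```

A fully disconnected graph recovers the bandit setting (only one loss observed), while a complete graph recovers full information, which is the sense in which the model interpolates between the two.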

3.      We study the setting of bandit (partial-information) structured learning. We solve an intriguing open problem: what is the minimax-optimal regret in this setting?
We prove a lower bound that matches the best known upper bound in the general setting, as well as in interesting instantiations of the problem such as bandit shortest path.

4.      We explore intrinsic connections between online learning and control theory. We study a variant of the classic Linear-Quadratic Regulator (LQR) model in which the costs change adversarially from round to round. We provide two simple and efficient learning algorithms with a provable $O(\sqrt{T})$ regret bound.
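For concreteness, the adversarial-cost LQR setting can be illustrated with a minimal scalar rollout; the scalar dynamics, the fixed linear state-feedback policy, and all parameter names below are illustrative assumptions, not the algorithms of the thesis:

```python
def rollout(policy_gain, costs, a=1.0, b=1.0, x0=1.0):
    """Roll out the scalar system x_{t+1} = a*x_t + b*u_t under the
    linear policy u_t = -policy_gain * x_t, accumulating the per-round
    quadratic cost q_t*x_t^2 + r_t*u_t^2, where the pairs (q_t, r_t)
    may change adversarially between rounds (illustrative sketch)."""
    x, total = x0, 0.0
    for q, r in costs:
        u = -policy_gain * x          # linear state-feedback control
        total += q * x * x + r * u * u
        x = a * x + b * u             # scalar linear dynamics
    return total

# Two rounds of unit costs: gain 1.0 drives the state to zero after
# one step, so only the first round incurs cost.
cost_with_control = rollout(1.0, [(1.0, 1.0), (1.0, 1.0)])
```

In the online variant studied here, the learner does not know the cost pairs in advance, and regret compares her total cost to that of the best fixed linear policy in hindsight.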