Ph.D Thesis | |

Ph.D Student | Cohen Alon |
---|---|

Subject | Structure and Combinatorics in Online Learning |

Department | Department of Industrial Engineering and Management |

Supervisor | ASSOCIATE PROF. Tamir Hazan |

Full Thesis text |

This manuscript explores theoretical principles that govern the design of algorithms in machine learning, planning and decision-making under uncertainty. We give several new results in which we develop novel learning algorithms and delve into the fundamental limitations of existing ones:

1. We study the limitations of Follow the Perturbed Leader algorithms?a class of simple and computationally efficient online learning algorithms. Our study entails a surprising result?a setting in which FPL algorithms are necessarily suboptimal. This setting is relevant to many problems of practical interest such as the well-known online shortest path problem.

2. We revisit
online learning with feedback graphs?a model that interpolates between
full-information and bandit prediction with expert advice. In the basic
setting, the learner receives an observation model in form of an entire
feedback graph.

In this work, we define a more realistic?albeit, challenging?model: the graph
changes from round to round and the learner gets to observe only the
out-neighborhood of her chosen expert. We prove that in the fully-adversarial
setting, the learner cannot hope to utilize any additional information provided
by the graph. Subsequently, we present a simple algorithm that achieves an
$\widetilde O(\sqrt{\alpha T})$ regret bound when the losses are stochastic
i.i.d., where α is the size of the maximal independent set in the graphs.

3. We study the
setting of bandit (partial-information) structured learning. We solve an
intriguing open problem: what is the minimax-optimal regret in this setting?

We prove a lower bound that matches the best known upper bound in the general
setting as well as in interesting instantiations of the problem such as bandit
shortest path.

4. We explore intrinsic connections between online learning and control theory. We study a variant of the classic Linear-Quadratic Regulator model in which the costs change adversarially from round to round. We provide two simple and efficient learning algorithms with a provable $O(\sqrt{T})$ regret bound.