Ph.D Thesis

Ph.D StudentMannor Shie
SubjectReinforcement Learning and Adaptation in Competitive
Dynamic Environments
DepartmentDepartment of Electrical and Computer Engineering
Supervisor PROF. Nahum Shimkin


This thesis is concerned with learning to act optimally in a dynamic, noisy, and competitive environment. We present two solution concepts for such environments. The first concept is based on the empirical measurement of statistics related to the

non-stationary elements in the environment. We take a worst-case estimate over unknown information and re-define the problem as a vector-valued stochastic game. We define a concrete goal, the convex Bayes envelope, and show that it is attainable using approachability theory. This envelope is shown to be safe (above the worst-case performance guarantees) and adaptive (strictly above the worst-case performance guarantees if the non-stationary elements appear non-hostile). The second concept is based on viewing the stochastic game as a repeated super-game with partial monitoring. We show that this approach leads to performance guarantees which are also safe and adaptive.

In the process of developing the first solution concept we generalize current results regarding approachability theory. We also devise learning algorithms for approaching prescribed and unprescribed target sets in vector-valued stochastic games and vector-valued Markov decision processes. These algorithms are demonstrated to be effective for learning in constrained games and Markov decision processes. Two Q-learning style algorithms for scalar average reward zero-sum games are suggested and analyzed.

A different topic of research is the existence of effective linear weak learners. Based on Geometric Discrepancy theory, we suggest conditions under which sufficiently strong linear learners exist, and hence Boosting algorithms are consistent.