Ph.D Thesis | |

Ph.D Student | Bernstein Andrey |
---|---|

Subject | Approachability in Dynamic Games: Algorithms, Refinements, and Applications to No-Regret Problems |

Department | Department of Electrical and Computer Engineering |

Supervisor | PROF. Nahum Shimkin |

Full Thesis text |

Approachability theory provides
fundamental results on repeated games with vector-valued payoffs. A target set *S*
is *approachable* by a certain player (the agent) if he can ensure that
the average payoff vector converges to that set no matter what his adversary
opponent does. There are two equivalent sets of conditions for a convex set to
be approachable. The first (primary) condition is a geometric separation
condition, while the second (dual) condition requires that the set be *non-excludable*,
namely that for every mixed action of the opponent there exists a mixed action
of the agent (a *response*) such that the resulting payoff vector belongs
to *S*. Existing approachability algorithms rely on the primal condition
and essentially require to compute at each stage a projection direction from a
given point to *S*.

In this thesis, we introduce
approachability algorithms that rely on the *dual* condition. Thus, rather
than projection, the algorithms rely compute the response to a certain action
of the opponent at each stage. Moreover, we consider the case where a set need
not be approachable in general, but may be approached if the opponent played
favorably in some sense. In particular, we consider non-convex sets that
satisfy the dual condition, namely can be approached when the opponent plays a
stationary strategy. While the convex hull of such a set is approachable, this
is not generally the case for the original non-convex set itself. We start by
defining a sense of restricted play of the opponent, and then formulate
appropriate goals for an *opportunistic *approachability algorithm that
can take advantage of such restricted play as it unfolds during the game. We
then propose two classes of approachability strategies that are opportunistic
in that sense.

The utility of the developed algorithms is demonstrated by applying it to certain generalizations of the classical regret minimization problem. In these problems, computation of the required projections is generally complex but a response is readily obtainable. Moreover, these problems lack a convex structure of a standard regret minimization problem. Here the best-response-in-hindsight is not generally attainable, but only a convex relaxation thereof. Our proposed opportunistic algorithms, while ensuring that relaxed goal, also come closer to the non-relaxed one when the opponent's play is restricted in a well-defined sense.

Finally, we present extensions of some of our concepts and algorithms to the stochastic game model, and the corresponding regret minimization problems.