Ph.D Thesis

Ph.D StudentBernstein Andrey
SubjectApproachability in Dynamic Games: Algorithms, Refinements,
and Applications to No-Regret Problems
DepartmentDepartment of Electrical and Computer Engineering
Supervisor PROF. Nahum Shimkin
Full Thesis textFull thesis text - English Version


Approachability theory provides fundamental results on repeated games with vector-valued payoffs. A target set S is approachable by a certain player (the agent) if he can ensure that the average payoff vector converges to that set no matter what his adversary opponent does. There are two equivalent sets of conditions for a convex set to be approachable. The first (primary) condition is a geometric separation condition, while the second (dual) condition requires that the set be non-excludable, namely that for every mixed action of the opponent there exists a mixed action of the agent (a response) such that the resulting payoff vector belongs to S. Existing approachability algorithms rely on the primal condition and essentially require to compute at each stage a projection direction from a given point to S.

In this thesis, we introduce approachability algorithms that rely on the dual condition. Thus, rather than projection, the algorithms rely compute the response to a certain action of the opponent at each stage. Moreover, we consider the case where a set need not be approachable in general, but may be approached if the opponent played favorably in some sense. In particular, we consider non-convex sets that satisfy the dual condition, namely can be approached when the opponent plays a stationary strategy. While the convex hull of such a set is approachable, this is not generally the case for the original non-convex set itself. We start by defining a sense of restricted play of the opponent, and then formulate appropriate goals for an opportunistic approachability algorithm that can take advantage of such restricted play as it unfolds during the game. We then propose two classes of approachability strategies that are opportunistic in that sense.

The utility of the developed algorithms is demonstrated by applying it to certain generalizations of the classical regret minimization problem. In these problems, computation of the required projections is generally complex but a response is readily obtainable. Moreover, these problems lack a convex structure of a standard regret minimization problem. Here the best-response-in-hindsight is not generally attainable, but only a convex relaxation thereof. Our proposed opportunistic algorithms, while ensuring that relaxed goal, also come closer to the non-relaxed one when the opponent's play is restricted in a well-defined sense.

Finally, we present extensions of some of our concepts and algorithms to the stochastic game model, and the corresponding regret minimization problems.