M.Sc Thesis
M.Sc Student Kuminov Danny Robust Decision Making in Multi-Agent Systems Department of Industrial Engineering and Management Professor Moshe Tennenholtz

Abstract

In this thesis we tackle the challenging problem of suggesting a useful strategy to a rational agent in two real-life multi-agent settings.

§         Robust Agents

In 2-person zero-sum games, an agent can use a safety-level strategy, a probabilistic algorithm that maximizes the agent's expected utility in the worst case. While this is commonly agreed to be the right solution in the above setting, there is no corresponding suggestion for more complex situations. However, in some interesting settings a safety-level strategy guarantees the value obtained in a Nash equilibrium of the game. Hence, in such settings a rational agent can use that strategy rather than rely on assumptions on rationality of the other decision-makers. Moreover, (Tennenholtz, 2002) introduced the notion of C-competitive strategy, which is a strategy that guarantees a payoff which is not less than a factor of C of what is obtained in a Nash equilibrium. If there exists a C-competitive strategy for small C, then such strategy might be a useful suggestion for the decision-maker. In this work, we show that such strategy exists in the ad auction setting (described in (Varian, 2006)).

§         Reinforcement Learning Agents

In a variety of settings the outcome guaranteed by a safety-level strategy may be reasonable and acceptable. However, obtaining this value becomes more problematic if the actual situation is unknown. The challenge for the agent is, therefore, to act in a way that will facilitate learning on one hand, and guarantee high payoff quickly on the other hand. The R-max model introduced in (Brafman and Tennenholtz, 2002) does just that in the framework of stochastic games, which are an extension of Markov Decision Processes (MDPs) to the context where there are several decision makers. The R-max algorithm shows that one can efficiently guarantee the safety-level value of the underlying (true) interaction, despite the fact that the transition and payoff functions are initially unknown. The basic assumption of this algorithm is that the agent can observe the opponent's actions in addition to his own payoff. In this work, we relax this assumption by showing an algorithm for efficiently obtaining (almost) the safety-level value in the imperfect monitoring setting (in which the agent can observe only his own actions and payoffs) in any generic multi-stage game, which is an interesting general subset of stochastic games.