|M.Sc Student||Chiswick Max|
|Subject||Solving Computer Poker with Counterfactual Regret|
|Department||Department of Electrical Engineering||Supervisor||Professor Nahum Shimkin|
|Full Thesis text|
In this paper, we examine the evolution of solving poker games for Nash Equilibrium solutions, from early game theory research to recent advances, which have primarily used the Counterfactual Regret Minimization (CFR) algorithm. CFR is a self-play learning algorithm that begins with a uniform random strategy, and over time converges to a Nash equilibrium strategy by playing strategies in proportion to their regrets at each node of the game tree.
Poker games are a subset of imperfect information games, which means that not all information is known to both players as it is in games like chess, since in poker each player has his own private cards that are not seen by other players. The imperfect setting is more applicable to real life decision making, such as where to send officers in a security situation, but presents difficulties because of the unknown information, which means that perfect information techniques like backwards induction and Monte Carlo Tree Search cannot work in the same way.
The Scientific Review examines solving a simple poker game using normal form linear programming, sequence form linear programming, and then using CFR, and thereby tracking the advances in solution techniques over multiple decades. We then look in depth at the components of CFR, including learning and regret, and how the CFR algorithm works.
The Literature Review examines research leading up to the CFR algorithm and various extensions to the CFR algorithm including advances in measuring game size and exploitability, and abstractions that are used in the Abstraction-Solving-Translation approach to solving games, which involves reducing the size of a game to make it solvable, solving it, and then creating a translation so that the abstracted strategy can be played in the full original game. Finally, we look at recent significant breakthroughs including completely solving the game of Heads Up Limit Texas Hold’em in 2015 and an agent defeating four of the best players in the world at Heads Up No Limit Hold’em in 2017.
We perform two experiments related to CFR. The first compares four different CFR algorithms in a basic 1-card poker game called Kuhn Poker and finds that the sampling algorithms converge faster. These algorithms require more iterations, but each iteration is much faster. The second experiment examines different betting abstractions in a simplified No Limit Texas Hold’em game that uses a 20 card deck instead of a 52 card deck. By using a game small enough to solve for the Nash equilibrium strategy in the full game, we can evaluate different abstractions against that main game strategy. We also evaluated each abstraction against each other abstraction and we unexpectedly find that the larger abstractions that allow for more bet size options perform worse than a simple abstraction, perhaps because the more complex ones are overfit to their abstracted game.