Technion - Israel Institute of Technology
Graduate School
M.Sc. Thesis
M.Sc. Student: Amit Asaf
Subject: Learning to Cooperate with Application to Bridge Bidding
Department: Department of Computer Science
Supervisor: Professor Shaul Markovitch


Abstract

This work presents a new model-based framework for decision making and learning in multi-agent systems in partially observable environments. A common way to decide which action to take is to perform a look-ahead search and choose the action with the highest expected utility. There are, however, two major problems that must be solved when applying look-ahead search in such environments: the future actions of the other agents are not known, and the effect of each action cannot be predicted due to the partial observability.
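As a point of reference, the following Python sketch illustrates plain expected-utility action selection under partial observability, where the value of each action is averaged over sampled completions of the hidden state. It is only an illustrative sketch and not the algorithm developed in this thesis; the functions sample_hidden_state and utility, as well as the toy coin example, are assumptions introduced for the example.

# A minimal sketch (not the thesis algorithm) of expected-utility action
# selection under partial observability: the hidden part of the state is
# unknown, so each action is scored by averaging its utility over
# hypothetical completions of the observed state.
import random

def select_action(actions, sample_hidden_state, utility, n_samples=100):
    """Return the action with the highest estimated expected utility.

    sample_hidden_state() -> a hypothetical full state consistent with
                             the agent's observations
    utility(action, state) -> numeric payoff of taking `action` in `state`
    """
    best_action, best_value = None, float("-inf")
    for action in actions:
        value = sum(utility(action, sample_hidden_state())
                    for _ in range(n_samples)) / n_samples
        if value > best_value:
            best_action, best_value = action, value
    return best_action

# Toy usage: guess whether a hidden coin is biased toward heads or tails.
if __name__ == "__main__":
    hidden_bias = 0.7  # unknown to the agent in a real setting
    actions = ["bet_heads", "bet_tails"]
    sample = lambda: "H" if random.random() < hidden_bias else "T"
    payoff = lambda a, s: 1.0 if (a == "bet_heads") == (s == "H") else 0.0
    print(select_action(actions, sample, payoff))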

The thesis introduces a new general model-based decision-making algorithm that tackles these problems. The models are used to simulate the other agents, thus predicting their action selection. The models are also used to reduce the set of possible full states to those consistent with the agents' actions. This reduced set is then used for Monte-Carlo sampling to evaluate the expected utility of each action.
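The following sketch illustrates, under assumed interfaces, the two uses of the agent models described above: filtering the candidate full states down to those in which the modelled agents would have selected their observed actions, and Monte-Carlo sampling from the reduced set while the models choose the other agents' moves during simulation. The names PartnerModel, simulate, and choose_action are illustrative assumptions, not the thesis implementation.

# A minimal sketch of model-based state filtering plus Monte-Carlo
# evaluation, under assumed interfaces for the agent models and the
# forward simulation.
import random

def consistent_states(candidate_states, observed_actions, agent_models):
    """Keep only the full states in which each modelled agent would have
    selected the action it was actually observed to take."""
    return [s for s in candidate_states
            if all(model.select_action(s) == act
                   for model, act in zip(agent_models, observed_actions))]

def evaluate_action(action, states, agent_models, simulate, n_samples=50):
    """Estimate the expected utility of `action` by sampling full states
    from the consistent set and rolling the interaction forward, letting
    the models choose the other agents' moves."""
    samples = [random.choice(states) for _ in range(n_samples)]
    return sum(simulate(action, s, agent_models) for s in samples) / n_samples

def choose_action(actions, candidate_states, observed_actions,
                  agent_models, simulate):
    states = consistent_states(candidate_states, observed_actions, agent_models)
    if not states:              # fall back to the unfiltered set
        states = candidate_states
    return max(actions,
               key=lambda a: evaluate_action(a, states, agent_models, simulate))

# Toy usage: the partner model plays "high" only on states 5..9, so
# observing "high" narrows the candidate states and makes "bid" preferable.
if __name__ == "__main__":
    class PartnerModel:
        def select_action(self, state):
            return "high" if state >= 5 else "low"

    def simulate(action, state, models):
        return state / 10.0 if action == "bid" else 0.5

    print(choose_action(["bid", "pass"], list(range(10)),
                        observed_actions=["high"],
                        agent_models=[PartnerModel()],
                        simulate=simulate))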

The thesis also presents a learning framework for co-training of cooperative agents that use the above decision-making algorithm. The agents refine their selection strategies during training interactions and continuously exchange their refined strategies. The refinement is based on inductive learning applied to examples accumulated for classes of states with conflicting actions.
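The co-training loop can be pictured with the following heavily simplified sketch, in which the inductive learner is replaced by a majority vote over the (state class, action) examples accumulated whenever an agent's current selection strategy conflicts with its look-ahead decision, and the refined strategies are then exchanged between the partners. All class and function names here are illustrative assumptions rather than the framework's actual components.

# A simplified sketch of the co-training loop: accumulate conflict
# examples, refine each agent's strategy from them, then exchange the
# refined strategies between the partners.
from collections import Counter, defaultdict

class Agent:
    def __init__(self, name, lookahead_decide):
        self.name = name
        self.lookahead_decide = lookahead_decide   # slow, search-based decision
        self.strategy = {}                         # state class -> preferred action
        self.partner_strategy = {}                 # latest copy received from the partner
        self.examples = defaultdict(list)          # conflict examples per state class

    def strategy_decide(self, state_class, default_action):
        return self.strategy.get(state_class, default_action)

    def refine(self):
        # Stand-in for inductive learning: majority action per state class.
        for cls, actions in self.examples.items():
            self.strategy[cls] = Counter(actions).most_common(1)[0][0]

def co_train(agent_a, agent_b, training_states, classify, default_action):
    for state in training_states:
        for agent in (agent_a, agent_b):
            cls = classify(state)
            current = agent.strategy_decide(cls, default_action)
            searched = agent.lookahead_decide(state)
            if searched != current:                # conflicting actions: keep an example
                agent.examples[cls].append(searched)
    for agent in (agent_a, agent_b):
        agent.refine()
    # Exchange the refined strategies so each agent can keep modelling its partner.
    agent_a.partner_strategy = dict(agent_b.strategy)
    agent_b.partner_strategy = dict(agent_a.strategy)

# Toy run: the look-ahead procedure prefers "bid" on even-numbered states.
if __name__ == "__main__":
    decide = lambda state: "bid" if state % 2 == 0 else "pass"
    a, b = Agent("A", decide), Agent("B", decide)
    co_train(a, b, training_states=range(10),
             classify=lambda s: s % 2, default_action="pass")
    print(a.strategy, a.partner_strategy)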

We demonstrate the utility of this framework by applying it to the difficult problem of bridge bidding. During an auction, each pair of bridge players is required to cooperate in order to compete against the opposing pair. The decision about which action to take is based on partial knowledge about the distribution of the cards.


Applying the co-training framework to this problem demonstrated the effectiveness of the method. The co-trained pair of agents significantly improved their bidding performance, reaching a level that surpasses the current state-of-the-art bidding algorithm.