M.Sc Thesis

M.Sc Student: Raveh Or
Subject: Learning and Communication in a Multi-Agent Setup
Department: Department of Electrical and Computer Engineering
Supervisor: Prof. Ron Meir


In recent years, there has been increased interest in distributed artificial intelligence, and particularly in learning in multi-agent systems. In cooperative multi-agent systems, agents cooperate to perform a task such as information gathering, traffic light control, or navigation. While these systems have obvious advantages over single-agent systems due to shared information, an important limitation is the presence of restrictions on the communication channels, which are ubiquitous in real-world systems. Such restrictions can arise for various reasons, such as thermal noise, quantization error, latency, and communication failure. They appear to be not only an important but also a necessary consideration, as it has been argued that a multi-agent system with perfect communication is analogous to a single-agent system with multiple effectors. However, the effects of these restrictions on the performance of exploration in multi-agent learning have not been thoroughly investigated theoretically. In this work, we focus on model-free Reinforcement Learning as our learning method for cooperative multi-agent systems; it has recently demonstrated impressive success in many domains (notably game playing and robotic manipulation). We develop a theoretically motivated Reinforcement Learning algorithm for multi-agent systems, in a framework that allows noisy, resource-limited, sparse communication between agents. By analyzing this algorithm, we provide the first (to the best of our knowledge) Probably Approximately Correct (PAC) bounds for multi-agent Reinforcement Learning in this setting. Our algorithm optimally combines information from the resource-limited agents, and its analysis characterizes the tradeoff between noise and communication constraints and quantifies their effects on overall system performance.