|M.Sc Student||Ayal Taitler|
|Subject||Intelligent Control and Learning for an Air Hockey Playing|
|Department||Department of Electrical Engineering|
|Supervisor||Full Professor Shimkin Nahum|
The game of air hockey is an interesting domain for research in artificial intelligence, machine learning and robotics. In this work an intelligent controller for a robot playing air hockey is proposed, with close attention to a modular architecture and a hierarchical structure that decouples the decisions made on different temporal scales. The approach taken is a layered design employing ideas from the Reactive/Deliberative paradigm common in AI agents in recent years. It allows the agent to plan its actions ahead while keeping the flow of information flexible enough to react quickly and efficiently to sudden events and uncertainties.
The controller consists of three layers, each operating on a different time scale.
The upper layer is a long-term strategic layer that handles game-wide opponent modeling and devises the general game strategy. The middle layer is a short-term tactical layer, in charge of actual game actions such as types of attacks and defenses. These actions are abstract in the sense that they define only a general behavior and not specific movements in the physical world. The lower layer is a control and execution layer, which takes the decision of the middle layer and makes it concrete in the form of motion profiles and physical considerations, together with the control needed to follow these profiles. In addition, there is a vision and perception layer which tracks all the objects in the game and estimates physical parameters and trajectories. This layer serves as the eyes of the agent.
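The layering described above can be illustrated with a minimal sketch. It is not the controller developed in the thesis: the class name, the update periods, the decision rules and the proportional gain are all hypothetical, chosen only to show three layers being refreshed at different rates, with the lower layer turning an abstract tactic into a concrete command.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredController:
    """Toy three-layer agent; every name, rule and period here is hypothetical."""
    strategy_period: int = 100   # ticks between strategic updates (assumed)
    tactic_period: int = 10      # ticks between tactical updates (assumed)
    strategy: str = "balanced"
    tactic: str = "defend"
    log: list = field(default_factory=list)

    def update_strategy(self, opponent_goals: int) -> None:
        # Upper layer: game-wide decision from a crude opponent model.
        self.strategy = "defensive" if opponent_goals > 2 else "balanced"
        self.log.append("strategy")

    def update_tactic(self, puck_on_our_side: bool) -> None:
        # Middle layer: picks an abstract action, no physical motion yet.
        if puck_on_our_side and self.strategy != "defensive":
            self.tactic = "direct_strike"
        else:
            self.tactic = "defend"
        self.log.append("tactic")

    def control(self, mallet_x: float, puck_x: float) -> float:
        # Lower layer: turns the abstract tactic into a motor command,
        # here a simple proportional step toward a target position.
        target = puck_x if self.tactic == "direct_strike" else 0.0
        return 0.5 * (target - mallet_x)

    def step(self, tick, opponent_goals, puck_on_our_side, mallet_x, puck_x):
        # Slow layers run only on their own period; control runs every tick.
        if tick % self.strategy_period == 0:
            self.update_strategy(opponent_goals)
        if tick % self.tactic_period == 0:
            self.update_tactic(puck_on_our_side)
        return self.control(mallet_x, puck_x)
```

Over 100 ticks this toy agent revises its strategy once and its tactic ten times, while a control command is produced on every tick. The perception layer is omitted here; its outputs are stood in for by the `puck_on_our_side`, `mallet_x` and `puck_x` arguments.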
Furthermore, a learning scheme based on deep reinforcement learning is proposed for learning the control policies in the lower layer for the different skills. In this work we focus on learning a direct strike. Reinforcement learning (RL) is a computational approach for constructing autonomous agents that improve with experience. It is especially promising in the model-free case, where some or all of the models are hidden from the learning algorithm. Its applications range from robotics (our interest here), through industrial manufacturing and scheduling, to combinatorial search problems and computer and board games, as was shown recently with the Atari 2600 games and the game of Go.
In our problem we consider the task of learning control policies for a robotic mechanism striking a puck in an air hockey game. The control signal is a direct command to the robot's motors. We employ a model-free deep reinforcement learning framework to learn the motor skills of striking the puck accurately in order to score. We propose certain improvements to the standard learning scheme which make the deep Q-learning algorithm feasible when it might otherwise fail. Our improvements include integrating prior knowledge into the learning scheme, and accounting for the changing distribution of samples in the experience replay buffer.
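For readers unfamiliar with the baseline being modified, the following is a minimal sketch of standard Q-learning with an experience replay buffer. It is a simplification and not the scheme proposed in this work: a lookup table stands in for the deep network, and the proposed improvements (prior knowledge, handling of the changing sample distribution) are not shown because the abstract does not specify them. The names `ReplayBuffer` and `q_update` and the default values of `alpha` and `gamma` are illustrative assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions; the oldest are dropped first."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.data.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


def q_update(q, batch, n_actions, alpha=0.1, gamma=0.99):
    """One pass of tabular Q-learning over a batch of transitions.

    q maps (state, action) to a value; missing entries count as 0.0.
    A deep Q-network would replace this table with a function approximator.
    """
    for s, a, r, s2, done in batch:
        # Bootstrapped target: reward plus discounted best next value.
        best_next = 0.0 if done else max(
            q.get((s2, b), 0.0) for b in range(n_actions)
        )
        target = r + gamma * best_next
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (target - old)
    return q
```

Because the buffer discards old transitions as new ones arrive, the distribution of sampled experience shifts as the policy changes during training, which is the effect the last sentence above refers to.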
Finally, we present simulation results for aimed striking, which demonstrate successful learning of this task and the improvement in algorithm stability due to the proposed modifications.