M.Sc Student: Hilleli Bar
Subject: Autonomous Steering without a Simulator using Deep Supervised and Reinforcement Learning
Department: Department of Electrical and Computer Engineering
Supervisor: Prof. Ran El-Yaniv
We propose a scheme for training a computerized agent to perform complex human tasks such as highway steering. The scheme is designed to follow a natural learning process whereby a human instructor teaches a computerized trainee. The learning process consists of four elements: (i) supervised imitation learning; (ii) supervised reward induction; (iii) supervised safety module construction; and (iv) reinforcement learning. First, by performing imitation learning, we obtain an agent with reasonable initial performance. The goal of this step is to generate an agent capable of operating in the environment without too much risk (e.g., without damaging itself or the environment).
Second, we learn a reward function (to be used later by the RL procedure) from instructor feedback generated while observing the agent operating in the environment using the initial IL policy.
Third, we learn a safety network to be integrated into the RL procedure. Finally, in the RL stage, the reward function and the safety network are used to learn an agent policy without any human supervision.
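The four-stage scheme can be illustrated with a toy sketch. Everything below (the function names, the one-dimensional state, the nearest-neighbor-style "networks") is an illustrative assumption for exposition, not the thesis implementation, which uses deep convolutional networks:

```python
import random

random.seed(0)

def imitation_learning(demos):
    """Stage (i): fit an initial policy to instructor demonstrations.
    Toy 'policy': always output the mean demonstrated steering action."""
    mean_action = sum(a for _, a in demos) / len(demos)
    return lambda state: mean_action

def induce_reward(feedback):
    """Stage (ii): learn a reward function from instructor feedback
    (state, good/bad label) gathered while the IL policy drives."""
    good = [s for s, ok in feedback if ok]
    center = sum(good) / len(good)
    return lambda state: -abs(state - center)  # higher reward near 'good' states

def build_safety_module(threshold):
    """Stage (iii): a safety check that vetoes dangerous actions."""
    return lambda state, action: abs(state + action) < threshold

def reinforcement_learning(policy, reward_fn, is_safe, episodes=100):
    """Stage (iv): improve the policy using only the learned reward,
    with the safety module blocking catastrophic actions."""
    best_action, best_return = policy(0.0), float("-inf")
    for _ in range(episodes):
        action = random.uniform(-1.0, 1.0)  # candidate exploratory action
        state = 0.5                          # toy fixed start state
        if not is_safe(state, action):
            continue                         # safety module: veto unsafe actions
        ret = reward_fn(state + action)
        if ret > best_return:
            best_action, best_return = action, ret
    return lambda state: best_action

demos = [(0.5, -0.4), (0.6, -0.5)]                   # (state, action) pairs
feedback = [(0.1, True), (0.0, True), (0.9, False)]  # (state, good?) labels
policy = imitation_learning(demos)
reward_fn = induce_reward(feedback)
is_safe = build_safety_module(threshold=1.0)
final_policy = reinforcement_learning(policy, reward_fn, is_safe)
print(is_safe(0.5, final_policy(0.5)))  # prints True
```

Note how the RL stage consumes only the artifacts of the earlier stages (the initial policy, the learned reward, and the safety check), so no human supervision is needed once it starts.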
The scheme is implemented using deep convolutional networks and is applied to successfully create a computerized agent capable of autonomous highway steering in the well-known racing game Assetto Corsa.
We demonstrate that the use of all four components is essential to effectively carry out the steering task using vision alone, without access to the driving simulator's internals, and while operating in real (wall-clock) time.
This is made possible in part by the introduction of a safety network, a novel mechanism for preventing the agent from making catastrophic mistakes during the reinforcement learning stage.
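The role of the safety network at run time can be sketched as an action filter: when a proposed exploratory action is judged unsafe, it is replaced by a conservative fallback. The names `safe_step`, `safety_net`, and `fallback_action` are hypothetical illustrations, not the thesis architecture:

```python
def safe_step(state, proposed_action, safety_net, fallback_action=0.0):
    """Return the proposed action if the safety network approves it;
    otherwise fall back to a conservative default (e.g., keep the lane)."""
    if safety_net(state, proposed_action):
        return proposed_action
    return fallback_action

# Toy safety network: reject steering that would push the car off-road,
# modeling the state as a scalar lane offset.
safety_net = lambda lane_offset, steer: abs(lane_offset + steer) < 1.0

print(safe_step(0.8, 0.5, safety_net))   # unsafe: overridden to 0.0
print(safe_step(0.8, -0.5, safety_net))  # safe: passed through, -0.5
```

Filtering actions this way lets the RL stage explore without ever executing a maneuver the safety network classifies as catastrophic.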