Technion - Israel Institute of Technology, Graduate School
Ph.D. Thesis
Ph.D. Student: Zahavy Tom Ben Zion
Subject: Reinforcement Learning and Deep Representations
Department: Department of Electrical Engineering
Supervisor: Professor Shie Mannor
Full Thesis Text: English Version


Abstract

Learning to control agents directly from raw, high-dimensional input, such as vision or natural language, is a longstanding challenge in reinforcement learning (RL). When the state is represented in a high dimension, the state space is too large and an optimal solution is intractable to compute (the curse of dimensionality). Therefore, function approximators such as neural networks are used to approximate the policy and/or the value function, e.g., DeepMind's Deep Q-Network (DQN). This unification has emerged as a new field of research, attracting the machine learning community to combine results from the RL literature with empirical techniques from the deep learning field. In the first chapter of this dissertation, I present some of my recent work on reverse engineering DQN. My approach is to visualize the representation learned by the DQN so that we can interpret the policy it has learned. I then study the solution that DQN finds when using this representation (the weights of its last layer) and explore methods to improve this solution using linear function approximation techniques. This neural-linear approach enhances the performance of deep RL agents by applying a linear algorithm to the last-layer activations of a deep neural network. The other two chapters of my dissertation focus on RL and DL, respectively, and not necessarily on their combination. The second part focuses on inverse reinforcement learning, a setting in which an agent tries to learn in an environment, without observing the reward, by mimicking an expert. We study this problem in three cases: under the average reward criterion, in contextual Markov Decision Processes, and via policy decomposition. Finally, the last chapter of my dissertation shows how learning good representations leads to state-of-the-art results in real-world applications, including optics and e-commerce.