Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Eizner Yehoshua
Subject: Reinforcement Learning as an Organizing Principle for Biological Navigation
Department: Department of Electrical Engineering
Supervisor: Professor Ron Meir


Abstract

Place cells and grid cells are cells in the rodent brain that represent spatial location. Much research has been devoted to these cells in recent years, including work on the connection between reinforcement learning (RL), spatial learning, and spatial navigation.

In this work we present the successor representation (SR) method, which combines some of the advantages of model-free (MF) and model-based (MB) algorithms. In this approach, place cells and grid cells form an integral component of the RL system, so that the spatial representation develops in a way that optimally subserves the RL system. We demonstrate how the SR approach explains multiple experimentally observed phenomena that were difficult to explain coherently in the past.
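As a rough illustration of the idea, the SR can be sketched on a toy problem. The sketch below assumes the standard definition of the SR as the expected discounted future occupancy of each state, M = I + γTM, with state values obtained as V = MR. The 4-state linear track, the random-walk transition matrix `T`, the reward vector `R`, and the function names are hypothetical choices made for this example and are not taken from the thesis simulations.

```python
def successor_representation(T, gamma, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration M <- I + gamma * T * M for a fixed policy.

    T is a row-stochastic transition matrix (list of lists).
    M[i][j] approximates the expected discounted number of visits
    to state j when starting from state i.
    """
    n = len(T)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(max_iter):
        new_M = [
            [identity[i][j] + gamma * sum(T[i][k] * M[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)
        ]
        delta = max(abs(new_M[i][j] - M[i][j])
                    for i in range(n) for j in range(n))
        M = new_M
        if delta < tol:  # stop once the update no longer changes M
            break
    return M


def state_values(M, R):
    """Value of each state as the SR-weighted sum of rewards: V = M R."""
    return [sum(m * r for m, r in zip(row, R)) for row in M]


# Hypothetical 4-state linear track under a random-walk policy
# (the agent stays in place when it bumps into either end).
T = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
R = [0.0, 0.0, 0.0, 1.0]  # assumed reward at the right end only
gamma = 0.9

M = successor_representation(T, gamma)
V = state_values(M, R)
```

In this sketch, M depends only on the transition structure under the current policy, so if the reward vector changes, the values can be recomputed with `state_values` alone, without relearning M. This is one way to see how the SR sits between the MF and MB approaches.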

The SR method opens the door to a broader issue, namely the question of MB vs. MF RL. Is there an optimal way to combine these two approaches? Are different mechanisms in the brain responsible for each method? Is there a controller that chooses between them or combines their capabilities? And if so, what are its operating principles?

We review how these two approaches are expressed in biological systems, presenting experiments that provide evidence for both mechanisms and for their combination. We then review different approaches to combining the MB and MF perspectives optimally and study their respective merits and disadvantages, with the aim of combining them judiciously. Finally, we study the opportunities that this combination offers for the fundamental question of exploration vs. exploitation in RL.

This work is largely a critical review of previous research; in addition, we present results of simulations carried out with the successor representation method.