M.Sc Thesis

M.Sc StudentHarosh Omri
SubjectExploration in Continuous Domains using Pseudo-Counts
DepartmentDepartment of Electrical and Computer Engineering
Supervisor PROF. Ron Meir
Full Thesis textFull thesis text - English Version


Exploration is long understood to be an essential part of control under uncertainty and partial knowledge. This can be traced back to the idea of dual-control in optimal control theory, and emerges again in model-free RL. While the problem is considered solved in small finite-state spaces, it remains an open question in large or continuous domains. Some approaches in the finite case rely on state or action visitation counting to promote exploration. Such is the case in the well-known UCB1 algorithm for the Multi-Armed Bandit problem and the UCRL algorithm for finite-state MDPs.

Our work aims to use a notion of counting suitable for the continuous domain. We build upon the framework of Pseudo-Counts and suggest a UCB-like exploration reward bonus to solve several continuous problems. We show empirically that continuous counting works better than direct discretization of the spaces, enjoying lower regret and better estimation of the best arm on the Continuum Armed Bandit problem. We continue to evaluate the approach on a model-free learning problem and a model-based learning problem, showing that the approach can be easily used in different scenarios.

Our work suggests that visitation counts can be useful in both discrete and continuous domains. Furthermore, it suggests that there are better ways to count in the continuous case, that generalize across states better, than direct discretization of the space.