M.Sc. Thesis, Department of Electrical Engineering

Gil Litichever


Multi Model Bayesian Reinforcement Learning

Supervisor: Prof. Meir Ron


Abstract

Reinforcement learning is a class of approaches frequently used by both biological and artificial agents to solve difficult decision problems. An important algorithmic component of many reinforcement learning methods is estimating the state or state-action values of a policy controlling a Markov decision process. Temporal difference methods are arguably among the most successful approaches studied in recent years for this estimation problem. Gaussian process temporal difference (GPTD) is a recently proposed Bayesian approach to policy evaluation, based on a numerically efficient scheme for updating a Bayesian posterior distribution over the value function. The present work enriches this Bayesian framework by introducing a hierarchical prior over the space of value functions, leading to a multi-model view of learning. Based on Gaussian process techniques, we present the basic theoretical framework for updating the posterior distribution online in a computationally efficient manner. We also present several simulation results demonstrating the promise of this approach.