M.Sc Thesis, Department of Electrical Engineering
Supervisor: Prof. Meir Ron
Reinforcement learning is a class of approaches frequently used by both biological and artificial agents to solve difficult decision problems. An important algorithmic component of many reinforcement learning methods is the estimation of the state, or state-action, values of a policy controlling a Markov decision process. Temporal difference methods are arguably among the most successful approaches studied in recent years for this estimation problem. Gaussian process temporal difference (GPTD) learning is a recently proposed Bayesian approach to policy evaluation, based on a numerically efficient update of a Bayesian posterior distribution over the value function. The present work enriches this Bayesian framework by introducing a hierarchical prior over the space of value functions, leading to a multi-model view of learning. Using Gaussian process techniques, we present the basic theoretical framework for updating the posterior distribution in a computationally efficient, online fashion. We also present several simulation results demonstrating the promise of this approach.
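To make the core idea concrete, the following is a minimal sketch of a batch, GPTD-style posterior computation over a value function. It is illustrative only, not the online or hierarchical algorithm developed in the thesis: the RBF kernel, the white-noise observation model, and all parameter values (`gamma`, `sigma`, `ell`) are assumptions made for the example. The model relates observed rewards to successive values via r_t = V(x_t) - gamma * V(x_{t+1}) + noise, encoded in the matrix H below.

```python
import numpy as np

def rbf_kernel(X, Y, ell=1.0):
    """Squared-exponential covariance between two 1-D sets of states."""
    d = X[:, None] - Y[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gptd_posterior(states, rewards, gamma=0.9, sigma=0.1, ell=1.0):
    """Batch GPTD-style posterior mean of the value function.

    states  : array of T+1 visited states (1-D, for simplicity)
    rewards : array of T observed rewards along the trajectory
    Observation model: r_t = V(x_t) - gamma * V(x_{t+1}) + white noise.
    Returns a function evaluating the posterior mean of V at new states.
    """
    T = len(rewards)
    # H encodes the temporal-difference relation between successive values:
    # row t is (..., 1, -gamma, ...) picking out V(x_t) - gamma * V(x_{t+1}).
    H = np.zeros((T, T + 1))
    for t in range(T):
        H[t, t] = 1.0
        H[t, t + 1] = -gamma
    K = rbf_kernel(states, states, ell)         # prior covariance of V at visited states
    G = H @ K @ H.T + sigma**2 * np.eye(T)      # covariance of the observed rewards
    alpha = H.T @ np.linalg.solve(G, rewards)   # weight vector for the posterior mean
    def value_mean(x):
        # Posterior mean of V at query states x, by kernel regression on alpha.
        return rbf_kernel(np.atleast_1d(np.asarray(x, dtype=float)), states, ell) @ alpha
    return value_mean
```

With small observation noise, the posterior mean approximately satisfies the temporal-difference relations on the visited trajectory, while the GP prior resolves the remaining degree of freedom; the hierarchical-prior extension studied in the thesis replaces the single fixed kernel with a distribution over models.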