טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentOmer Ziv
SubjectAlgebraic Multigrid for Reinforcement Learning
DepartmentDepartment of Electrical Engineering
Supervisor Full Professor Shimkin Nahum


Abstract

TD(λ) is a well known family of algorithms in the area of reinforcement learning. It is commonly used to evaluate a measure of performance, called value function, of a stationary control policy applied in a stochastic dynamic environment, without knowing the environment model. For large state spaces, the value function is approximated to provide tractability and generalization. TD(λ) is inspired by the value iteration method in dynamic programming (DP). Several multiple scale approaches have been proposed for speeding up DP methods. These include state aggregation approaches and the related multigrid approach. This raises the question if a multiple scale approach may be used to speedup corresponding learning algorithms and in particular TD(λ).

Multigrid is a family of multilevel numerical methods for solving large sparse linear equation systems of the form Ax=b. Geometric multigrid relies on a hierarchy of discretizations of a continuous problem on regular meshes. Algebraic multigrid (AMG) operates on progressively smaller matrix equations, constructed fully automatically during a setup phase, and based only on algebraic information contained in A. This makes AMG attractive as a “black-box” solver of numerical problems.

In this research we apply the multigrid approach to speed up DP methods and adapt it to the learning context of TD(λ). This results in two novel multigrid TD algorithms, which exhibit comparable computational complexity to TD(λ) but with faster convergence rates. Our presentation follows the AMG method, which provides an automatic way to construct the grid hierarchies, yet all algorithms are equally applicable to geometric multigrid.