M.Sc Thesis | |

M.Sc Student | Ziv Omer |
---|---|

Subject | Algebraic Multigrid for Reinforcement Learning |

Department | Department of Electrical and Computer Engineering |

Supervisor | PROF. Nahum Shimkin |

TD(λ) is a well known family of algorithms in the area of reinforcement learning. It is commonly used to evaluate a measure of performance, called value function, of a stationary control policy applied in a stochastic dynamic environment, without knowing the environment model. For large state spaces, the value function is approximated to provide tractability and generalization. TD(λ) is inspired by the value iteration method in dynamic programming (DP). Several multiple scale approaches have been proposed for speeding up DP methods. These include state aggregation approaches and the related multigrid approach. This raises the question if a multiple scale approach may be used to speedup corresponding learning algorithms and in particular TD(λ).

Multigrid is a family of multilevel numerical methods for
solving large sparse linear equation systems of the form *A**x= b*.
Geometric multigrid relies on a hierarchy of discretizations of a continuous
problem on regular meshes. Algebraic multigrid (AMG) operates on progressively
smaller matrix equations, constructed fully automatically during a setup phase,
and based only on algebraic information contained in

In this research we apply the multigrid approach to speed up DP methods and adapt it to the learning context of TD(λ). This results in two novel multigrid TD algorithms, which exhibit comparable computational complexity to TD(λ) but with faster convergence rates. Our presentation follows the AMG method, which provides an automatic way to construct the grid hierarchies, yet all algorithms are equally applicable to geometric multigrid.