Technion - Israel Institute of Technology, Graduate School
M.Sc. Thesis
M.Sc. Student: Tuisov Alexander
Subject: Search Techniques for Online Planning with Simulated Action Models
Department: Industrial Engineering and Management
Supervisor: Professor Carmel Domshlak
Full Thesis Text: English Version


Abstract

The Arcade Learning Environment (ALE) was introduced by Bellemare et al. (2013) as a platform for developing search algorithms for description-less environments with simulators. Similarly to classical AI planning, the Atari 2600 games supported in ALE all feature a fully observable (RAM) state and actions with deterministic effects. At the same time, the problems in ALE are given only implicitly, via a simulator, a priori precluding the exploitation of most modern classical planning techniques. Despite that, Lipovetzky et al. (2015) showed how online planning for Atari-like problems can be effectively addressed using IW(i), a blind state-space search algorithm that employs a certain form of structural similarity-based pruning.
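
To make the pruning concrete, the following is a minimal sketch of the novelty test underlying IW(1), the simplest member of the IW(i) family: a generated state is kept only if it contains at least one atom (for instance, a (RAM-index, value) pair in the ALE setting) not seen in any previously generated state. The successors and is_goal interfaces, and the atom-set view of a state, are assumptions made for illustration only.

    from collections import deque

    def iw1_search(initial_state, successors, is_goal):
        """Minimal sketch of IW(1)-style novelty pruning. Assumed interfaces:
        a state is an iterable of hashable atoms (e.g. (RAM-index, value)
        pairs in the ALE setting), successors(s) yields (action, next_state)
        pairs from a simulator, and is_goal(s) tests for a goal state."""
        seen_atoms = set(initial_state)       # atoms observed in any generated state
        queue = deque([(initial_state, [])])
        while queue:
            state, plan = queue.popleft()
            if is_goal(state):
                return plan
            for action, child in successors(state):
                new_atoms = set(child) - seen_atoms
                if not new_atoms:
                    continue                  # pruned: every atom was seen before
                seen_atoms |= new_atoms
                queue.append((child, plan + [action]))
        return None                           # frontier exhausted under width 1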

We show that the effectiveness of blind state-space search for Atari-like online planning can be pushed much further by focusing the search using both structural state similarity and the relative myopic value of the states. We also show that these methods can be imported back into classical planning, where they can be utilized as heuristic enhancers, achieving significant improvements even when paired with state-of-the-art heuristics.
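
As an illustration only, and not the exact scheme developed in the thesis, the sketch below combines the two signals just mentioned: the frontier is ordered by a myopic state value while the novelty test prunes states that contribute no unseen atom. The successors and value interfaces, the bounded horizon, and the atom-set view of a state are assumptions.

    import heapq
    import itertools

    def focused_search(initial_state, successors, value, horizon):
        """Illustrative sketch: best-first lookahead ordered by a myopic state
        value, with novelty-based pruning on top. successors(s) yields
        (action, next_state) pairs and value(s) returns a numeric estimate;
        both are assumed simulator-backed interfaces."""
        seen_atoms = set(initial_state)
        counter = itertools.count()           # tie-breaker for the heap
        frontier = [(-value(initial_state), next(counter), initial_state, [], 0)]
        best_plan, best_value = [], value(initial_state)
        while frontier:
            neg_v, _, state, plan, depth = heapq.heappop(frontier)
            if -neg_v > best_value:           # remember the best-valued state so far
                best_value, best_plan = -neg_v, plan
            if depth >= horizon:
                continue
            for action, child in successors(state):
                new_atoms = set(child) - seen_atoms
                if not new_atoms:
                    continue                  # structurally similar to the past: prune
                seen_atoms |= new_atoms
                heapq.heappush(frontier,
                               (-value(child), next(counter), child,
                                plan + [action], depth + 1))
        return best_plan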

Furthermore, we show that a different family of algorithms, one that plans for bounded-length trajectories rather than only for the next action to apply, allows solving problems that were previously out of our reach. In particular, we present a family of such Monte-Carlo Tree Search algorithms that compete favorably with their state-of-the-art counterparts.
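
For concreteness, the following is a generic, bounded-horizon UCT-style skeleton that commits to a whole trajectory (read off the most visited branch) rather than to a single next action; it is not the specific family of algorithms developed in the thesis. The actions, step, and reward simulator interfaces are assumptions, and in-tree rewards are omitted for brevity.

    import math
    import random

    class Node:
        def __init__(self, state, parent=None, action=None):
            self.state, self.parent, self.action = state, parent, action
            self.children, self.visits, self.total_reward = [], 0, 0.0

    def mcts_trajectory(root_state, actions, step, reward, horizon, budget, c=1.4):
        """Sketch of bounded-horizon MCTS returning a trajectory. Assumed
        interfaces: actions(s) returns the list of applicable actions,
        step(s, a) returns the successor state, reward(s) returns a number."""
        root = Node(root_state)
        for _ in range(budget):
            node, depth = root, 0
            # selection: descend through fully expanded nodes by UCB1
            while (depth < horizon and node.children
                   and len(node.children) == len(actions(node.state))):
                node = max(node.children,
                           key=lambda n: n.total_reward / n.visits
                           + c * math.sqrt(math.log(node.visits) / n.visits))
                depth += 1
            # expansion: add one untried action, if any
            if depth < horizon:
                tried = {ch.action for ch in node.children}
                untried = [a for a in actions(node.state) if a not in tried]
                if untried:
                    a = random.choice(untried)
                    node.children.append(Node(step(node.state, a), parent=node, action=a))
                    node = node.children[-1]
                    depth += 1
            # rollout: random actions up to the horizon, accumulating reward
            state, ret = node.state, 0.0
            for _ in range(horizon - depth):
                available = actions(state)
                if not available:
                    break
                state = step(state, random.choice(available))
                ret += reward(state)
            # backpropagation
            while node is not None:
                node.visits += 1
                node.total_reward += ret
                node = node.parent
        # commit to the most visited branch as the bounded-length trajectory
        trajectory, node = [], root
        while node.children:
            node = max(node.children, key=lambda n: n.visits)
            trajectory.append(node.action)
        return trajectory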

Finally, noticing that the two algorithmic approaches are rather complementary, we examine both a pre-sampling-based selection between the two and an alternating composition of the algorithms, and show that these combinations compete favorably with both of their individual components.
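
The alternating composition can be pictured with the sketch below, in which control over successive planning episodes is handed round-robin between the two planners; the schedule and the planner and simulator interfaces are illustrative assumptions, not the thesis's exact protocol.

    def alternating_planner(planners, state, step, episode_steps):
        """Sketch of an alternating composition of online planners. Assumed
        interfaces: each planner is a callable state -> list of actions,
        and step(state, action) applies an action in the simulator."""
        trace, turn = [], 0
        for _ in range(episode_steps):
            plan = planners[turn % len(planners)](state)   # ask the current planner
            if not plan:
                break
            for action in plan:                            # execute its trajectory
                state = step(state, action)
                trace.append(action)
            turn += 1                                      # hand control to the other planner
        return trace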