|M.Sc Student||Bercovich Sofia|
|Subject||Anytime Algorithms for Simple Regret|
Minimization in Stochastic Online Planning
|Department||Department of Electrical Engineering||Supervisor||Professor Nahum Shimkin|
|Full Thesis text|
We present a novel anytime algorithm for Simple Regret minimization in the stochastic Multi Armed Bandit problem (MAB), called the Relative Margins (RM) algorithm. Most existing methods for MAB are based on the well known Upper Confidence Bounds (UCB) algorithm, which uses additive confidence bounds. RM uses multiplicative indices instead, which are related to its empirical margin from the current best arm. We use both Bayesian and Frequentist analysis to show the motivation for the algorithm's structure. We prove an upper bound on the Simple Regret of our algorithm, and compare its empirical performance with existing methods.
English Abstract Contrary to existing Simple Regret minimization algorithms, RM does not require prior knowledge about the problem complexity or the computational resources available to the agent, a property which is useful in implementing an RM-based Online Planning agent for random trees and Markov Decision Processes (MDP). Some initial results for the latter problems will be presented as well.