טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentBaransi Akram
SubjectBest Empirical Sampled Average - BESA - an
Algorithm for the Multi-Armed Bandits
DepartmentDepartment of Electrical Engineering
Supervisor Professor Mannor Shie
Full Thesis textFull thesis text - English Version


Abstract

The stochastic multi-armed bandit problem is a popular model of the exploration-exploitation trade-off in sequential decision problems. We introduce a simple novel algorithm, called Best Empirical Sampled Average (BESA), that is based on sub-sampling. Despite its simplicity, we show that the algorithm demonstrates excellent empirical performances against the state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible, it does need to know neither the set of reward distributions in advance nor the range of the rewards. It is not restricted to Bernoulli distributions and is also invariant under rescaling of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art algorithms in a several cases, the main intuition that explains the striking results, and conclude with a finite-time regret analysis for this algorithm in the simplified two-arm bandit setting.

The stochastic multi-armed bandit problem is a special case of the contextual stochastic multi-armed bandit problem, in which the contexts set is of size one. We propose an augmentation for the BESA algorithm to deal with the contextual multi-armed bandit problem. The augmented algorithm is similar in its spirit to the BESA algorithm, i.e. based on sub-sampling, but it sub samples from contexts in the small neighborhood of the current context and uses a weight function to give a weight for each sample based on its distance from the current context, we provide an experimental study for the contextual arms, and explain the intuition behind the proposed augmentation.