M.Sc Student | Baransi Akram
---|---
Subject | Best Empirical Sampled Average - BESA - an Algorithm for the Multi-Armed Bandits
Department | Department of Electrical Engineering
Supervisor | Professor Shie Mannor
The stochastic multi-armed bandit problem is a popular model of the exploration-exploitation trade-off in sequential decision problems. We introduce a simple novel algorithm, called Best Empirical Sampled Average (BESA), that is based on sub-sampling. Despite its simplicity, we show that the algorithm achieves excellent empirical performance against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible: it needs to know neither the set of reward distributions in advance nor the range of the rewards. It is not restricted to Bernoulli distributions and is invariant under rescaling of the rewards. We provide a detailed experimental study comparing the algorithm to the state-of-the-art algorithms in several cases, present the main intuition that explains the striking results, and conclude with a finite-time regret analysis of the algorithm in the simplified two-arm bandit setting.
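The sub-sampling idea in the two-arm case can be sketched as follows. This is a minimal illustrative sketch, not the thesis's exact pseudocode: it assumes that at each round BESA sub-samples the more-played arm's reward history down to the size of the less-played arm's history, compares the two empirical means, and plays the winner, breaking ties in favor of the less-played arm. All function and variable names are illustrative.

```python
import random

def besa_two_arms(reward_fns, horizon, rng):
    """Two-arm BESA sketch: sub-sample the more-played arm down to the
    size of the less-played arm, then play the arm with the higher
    sub-sampled empirical mean (ties favor the less-played arm)."""
    history = [[], []]   # observed rewards per arm
    counts = [0, 0]
    # play each arm once to initialize
    for a in (0, 1):
        history[a].append(reward_fns[a](rng))
        counts[a] += 1
    for _ in range(horizon - 2):
        # lo = less-played arm, hi = more-played arm
        lo, hi = (0, 1) if counts[0] <= counts[1] else (1, 0)
        n = counts[lo]
        # sub-sample n rewards without replacement from the more-played arm
        sub = rng.sample(history[hi], n)
        mean_lo = sum(history[lo]) / n
        mean_hi = sum(sub) / n
        a = lo if mean_lo >= mean_hi else hi
        history[a].append(reward_fns[a](rng))
        counts[a] += 1
    return counts

# usage: two Bernoulli arms with means 0.8 and 0.2
rng = random.Random(0)
arms = [lambda r: 1.0 if r.random() < 0.8 else 0.0,
        lambda r: 1.0 if r.random() < 0.2 else 0.0]
counts = besa_two_arms(arms, 2000, rng)
```

Note that the comparison uses only raw empirical means of equal-sized samples, which is why the procedure is invariant under rescaling of the rewards and needs no knowledge of their range.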
The stochastic multi-armed bandit problem is a special case of the contextual stochastic multi-armed bandit problem in which the context set is of size one. We propose an augmentation of the BESA algorithm to deal with the contextual multi-armed bandit problem. The augmented algorithm is similar in spirit to BESA, i.e. based on sub-sampling, but it sub-samples from contexts in a small neighborhood of the current context and uses a weight function that assigns each sample a weight based on its distance from the current context. We provide an experimental study for the contextual setting and explain the intuition behind the proposed augmentation.
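The neighborhood weighting described above can be illustrated with a small sketch. The helper names and the Gaussian-style kernel are assumptions for illustration only; the text states only that samples are drawn from a neighborhood of the current context and weighted by their distance from it.

```python
import math

def neighborhood_weights(history, x, radius):
    """Hypothetical helper: collect past (context, reward) pairs within
    `radius` of the current context x, and weight each by a kernel that
    decays with distance (the Gaussian shape is an assumed choice)."""
    near = [(c, r) for (c, r) in history if abs(c - x) <= radius]
    weights = [math.exp(-((c - x) / radius) ** 2) for (c, _) in near]
    return near, weights

def weighted_mean(samples, weights):
    """Distance-weighted empirical mean of the neighborhood rewards."""
    total = sum(weights)
    return sum(w * r for (_, r), w in zip(samples, weights)) / total

# usage: one-dimensional contexts; only the first two points fall
# within radius 0.3 of the current context x = 0.2
past = [(0.10, 1.0), (0.25, 1.0), (0.90, 0.0)]
near, w = neighborhood_weights(past, x=0.20, radius=0.30)
m = weighted_mean(near, w)
```

A BESA-style contextual comparison would then draw equal-sized sub-samples of such neighborhood observations per arm and compare their weighted means, so that observations from contexts close to the current one dominate the decision.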