|M.Sc. Student|Peleg Amit|
|Subject|Meta Learning in Linear Bandits|
|Department|Department of Electrical and Computer Engineering|
|Supervisor|Prof. Ron Meir|
Fully Bayesian approaches to sequential decision-making assume that problem parameters are generated from a known prior, whereas in practice such information is often lacking. While incomplete knowledge of the prior leads to inferior performance in a full-information setting, the problem is exacerbated in partial-information settings, where a misspecified prior may additionally lead to poor exploration.
In this work we prove, in the context of stochastic linear bandits and Gaussian priors, that as long as the estimated prior is sufficiently close to the true unknown prior, the performance of the applied algorithm is close to that of the algorithm that uses the correct prior. Next, we address the task of learning the prior through meta-learning, where a learner updates her estimate of the meta-prior across multiple instances and uses it as a prior for new instances. Within each instance, she then selects actions so as to maximize the cumulative reward, while updating her within-instance posterior based on incoming observations. We provide an algorithm and regret bounds that demonstrate its effectiveness relative to an algorithm that knows the correct prior, and we support our theoretical results empirically. Our theoretical results hold for a broad class of prior-based algorithms, including, for example, Thompson Sampling and Information Directed Sampling.
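The two-level scheme described above — a prior-based inner algorithm run within each instance, and a meta-level update of the prior estimate across instances — can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the inner algorithm is Thompson Sampling with conjugate Gaussian updates, the meta-update is a simple running mean of per-instance posterior means (only the prior mean is meta-learned here, with a fixed covariance), and all names (`thompson_sampling_instance`, `meta_learn`) and parameter choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_sampling_instance(theta_true, actions, mu0, Sigma0, noise_var, horizon):
    """Thompson Sampling in one linear-bandit instance, starting from the
    (possibly meta-learned) Gaussian prior N(mu0, Sigma0)."""
    prec = np.linalg.inv(Sigma0)   # posterior precision matrix
    prec_mean = prec @ mu0         # precision-weighted posterior mean
    total_reward = 0.0
    for _ in range(horizon):
        mu = np.linalg.solve(prec, prec_mean)
        Sigma = np.linalg.inv(prec)
        # Sample a parameter from the current posterior and act greedily on it.
        theta_sample = rng.multivariate_normal(mu, Sigma)
        a = actions[np.argmax(actions @ theta_sample)]
        r = a @ theta_true + rng.normal(0.0, np.sqrt(noise_var))
        # Conjugate Gaussian posterior update for linear rewards.
        prec += np.outer(a, a) / noise_var
        prec_mean += r * a / noise_var
        total_reward += r
    # Return the posterior mean as this instance's point estimate of theta.
    return np.linalg.solve(prec, prec_mean), total_reward

def meta_learn(num_instances, d, actions, true_prior_mean, noise_var, horizon):
    """Meta-learn the prior mean across instances: each instance's parameter is
    drawn from the (unknown) true prior; the meta-estimate is the running mean
    of per-instance posterior means. Covariance is kept fixed in this sketch."""
    mu_hat = np.zeros(d)      # initial (uninformative) meta-prior mean
    Sigma_hat = np.eye(d)     # fixed prior covariance (not meta-learned here)
    estimates = []
    for _ in range(num_instances):
        theta_true = rng.multivariate_normal(true_prior_mean, 0.1 * np.eye(d))
        theta_est, _ = thompson_sampling_instance(
            theta_true, actions, mu_hat, Sigma_hat, noise_var, horizon)
        estimates.append(theta_est)
        mu_hat = np.mean(estimates, axis=0)   # meta-update of the prior mean
    return mu_hat
```

The conjugate structure is what keeps the inner loop cheap: each observation updates the posterior precision and precision-weighted mean in closed form, so no numerical posterior approximation is needed within an instance.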