M.Sc Student: Yaacoby Eyal
Subject: Efficient Exploration for the Bandit Multiclass Problem
Department: Department of Electrical Engineering
Supervisor: Professor Ron Meir
The Bandit Multiclass Problem is an online decision-making setting in which an agent is presented with samples (e.g., images of objects) and predicts labels (e.g., the object's identity) from a finite set of possible labels. After each prediction the agent receives only binary feedback, namely whether it was right or wrong, and its goal is to maximize the cumulative number of correct predictions. Utilizing recent advances in Bayesian inference for multi-layered Neural Networks (NNs), which provide confidence levels for the weight parameters, we can build models that combine the strengths of multi-layered NNs with credible per-sample confidence levels. Specifically, we suggest a novel framework that uses a Probabilistic BackPropagation (PBP) neural network, which approximates the posterior distribution of each of its weights by a Gaussian and provides credible confidence levels per label.
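The idea of per-label confidence from a Gaussian weight posterior can be illustrated with a minimal Monte Carlo sketch. This is not the PBP algorithm itself; it assumes a hypothetical single-layer model whose posterior is a Gaussian (mean `mu`, std `sigma`) per weight, and estimates label probabilities and their spread by sampling weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical single-layer model standing in for a PBP network:
# each weight has an independent Gaussian posterior (mu, sigma).
d, k = 4, 3                       # input dimension, number of labels
mu = rng.normal(size=(d, k))
sigma = 0.1 * np.ones((d, k))

def predictive_probs(x, n_samples=100):
    """Monte Carlo estimate of per-label probabilities and their spread."""
    probs = []
    for _ in range(n_samples):
        w = rng.normal(mu, sigma)  # draw one weight setting from the posterior
        probs.append(softmax(x @ w))
    probs = np.stack(probs)
    return probs.mean(axis=0), probs.std(axis=0)

x = rng.normal(size=d)
p_mean, p_std = predictive_probs(x)
```

Here `p_mean` plays the role of the predicted label probabilities, while `p_std` is a crude per-label confidence signal; PBP obtains analogous quantities analytically rather than by sampling.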
Within our framework, we evaluate different exploration methods for the Bandit Multiclass Problem and identify ingredients that facilitate efficient exploration in complex domains. Specifically, we find that a UCB-like bonus, which gives priority to labels rarely observed in the past, is important for efficient exploration. Finally, we suggest an exploration method that applies the theory of 'curiosity'-driven exploration to the Bandit Multiclass Problem: the agent selects labels that maximize its Information Gain, defined as the reduction in entropy of the posterior distribution. We show that this exploration method performs well across different neural network configurations and datasets. While our theoretical understanding of multi-layered NNs is still limited, our work suggests that 'curiosity'-driven exploration strategies are universal in nature and work well in many practical settings.
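The two scoring ideas above can be sketched concretely. This is an illustrative assumption, not the thesis implementation: the information-gain term uses a common BALD-style estimator (entropy of the mean prediction minus the mean per-sample entropy, an approximation of the expected posterior-entropy reduction), and the count-based term is a standard UCB-like bonus; `probs` and `counts` are made-up placeholder data.

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along the given axis."""
    return -(p * np.log(p + eps)).sum(axis=axis)

# Monte Carlo predictive samples, shape (n_samples, n_labels), e.g. obtained
# by sampling weights from a Gaussian posterior as in the previous sketch.
probs = rng.dirichlet(np.ones(3), size=200)

# BALD-style information-gain estimate: how much observing the label would
# reduce uncertainty about the weights. Non-negative by concavity of entropy.
info_gain = entropy(probs.mean(axis=0)) - entropy(probs, axis=-1).mean()

# UCB-like bonus: labels observed rarely in the past get a larger bonus.
counts = np.array([50, 3, 12])      # hypothetical past observations per label
t = counts.sum()
bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))

# A simple exploration rule: mean predicted probability plus the count bonus.
scores = probs.mean(axis=0) + bonus
label = int(np.argmax(scores))
```

The rarely-observed label (count 3) receives the largest bonus, so the count term steers predictions toward under-explored labels, while the information-gain term would favor inputs where the posterior samples disagree.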