|M.Sc Student||Arad Eran|
|Subject||Comparative Research of Algorithms for the Contextual|
|Department||Department of Electrical and Computer Engineering||Supervisor||Prof. Ron Meir|
|Full Thesis text|
Decision-making problems are present in our daily life. Whenever we face a situation that requires a decision, we try to maximize our future benefit by relying on our experience. We make such decisions when, for example, we travel from one place to another and choose a route that will minimize our expected travel time. At times we choose a well-known route that we have already experienced, but on other occasions we explore a new route that may shorten our travel time. Each decision we make results in a reward that informs our next decision, and there is always a trade-off between trying a new route (exploration) and choosing the known route (exploitation). In some cases the decision maker is also presented with side information, called context, that can be used to tune the decision.
The Multi-Armed Bandit (MAB) setting models such decision-making problems with an exploration-exploitation trade-off. It balances exploiting decisions that yielded the highest rewards in the past against exploring new decisions that may yield higher rewards in the future.
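As a minimal illustration of the exploration-exploitation trade-off described above (a toy sketch, not one of the algorithms compared in this work), an epsilon-greedy strategy exploits the arm with the best observed mean reward while occasionally exploring a random arm. The function names and the Bernoulli toy problem below are illustrative assumptions:

```python
import random

def epsilon_greedy_bandit(pull, n_arms, n_rounds, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a bandit; pull(arm) returns a reward."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        r = pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
        total += r
    return values, total

# Toy Bernoulli bandit: arm 2 has the highest success probability.
probs = [0.2, 0.5, 0.8]
reward_rng = random.Random(1)
values, total = epsilon_greedy_bandit(
    lambda a: 1.0 if reward_rng.random() < probs[a] else 0.0,
    n_arms=3, n_rounds=5000)
```

With enough rounds the estimated mean rewards concentrate around the true success probabilities, so the agent ends up identifying arm 2 as the best.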
The Contextual Bandit problem is an extension of the MAB in which side information, or “expert advice” (context), is provided to the agent, and the decision is made based on that advice with the goal of maximizing the reward.
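One classical contextual bandit algorithm is LinUCB, which assumes each arm's expected reward is linear in the context vector and adds an upper-confidence bonus to drive exploration. The sketch below is a generic textbook LinUCB on a synthetic linear problem, offered only to make the setting concrete; the class name and toy data are assumptions, not the algorithms or datasets evaluated in this thesis:

```python
import numpy as np

class LinUCB:
    """LinUCB: per-arm ridge-regression estimate plus a UCB exploration bonus."""
    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(d) for _ in range(n_arms)]     # per-arm Gram matrix
        self.b = [np.zeros(d) for _ in range(n_arms)]   # per-arm reward vector

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy problem: two arms whose rewards are linear in a 2-d context.
rng = np.random.default_rng(0)
true_theta = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
agent = LinUCB(n_arms=2, d=2, alpha=0.5)
correct = 0
for t in range(2000):
    x = rng.random(2)
    best = int(np.argmax([th @ x for th in true_theta]))
    a = agent.select(x)
    agent.update(a, x, true_theta[a] @ x + 0.05 * rng.normal())
    if t >= 1000:                 # measure accuracy after a warm-up phase
        correct += (a == best)
accuracy = correct / 1000
```

After the warm-up rounds the agent's per-arm estimates are close to the true parameters, so it picks the context-dependent best arm most of the time.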
In this work, we use several datasets, such as MNIST and 20NewsGroup, to generate the “expert advice” and to simulate several Contextual Bandit algorithms. The algorithms are compared with respect to the following parameters: reward type, loss function, upper bound, variation of results, learning curve, computational complexity, and efficiency of exploration.