טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
Ph.D Thesis
Ph.D StudentBahar Gal
SubjectDesigning for Efficient Social Learning
DepartmentDepartment of Industrial Engineering and Management
Supervisors Professor Rann Smorodinsky
Professor Moshe Tennenholtz
Full Thesis textFull thesis text - English Version


Abstract

In the on-line Explore & Exploit literature, central to Machine Learning, a mediator is faced with a set of alternatives, each yielding some unknown reward. The mediator's goal is to learn the optimal alternative as soon as possible, via experimentation. A typical assumption in this model is that the mediator has full control over the experiment design and implementation. When experiments are implemented by a society of self-motivated agents the mediator can only recommend experimentation but has no power to enforce it.


There is an inherent conflict between the mediator and the agents. The mediator is far sighted and would like agents to experiment with unfamiliar arms even if they are a-priori inferior. Agents, on the other hand, are myopic and want to pull the arm that maximizes the payoff in the given round. Thus, when a mediator recommends which arm to choose he must account for the fact that his recommendation will not be adopted.  In my thesis I analyze optimal protocols for the mediator that account for the conflict of interest and the incentives of the agents.


 Kremer et al. [2014] are the first to introduce such a model. They study a basic setting in which agents are completely isolated and cannot observe each other. They identify a protocol that is incentive compatible for the agents but is also asymptotically optimal. For the mediator in the sense that the mediator can steer the agents towards the optimal arm.


We extend Kremer et al. [2014] by adding a layer of a social network according to which agents can observe each other. For our first main result I assume agents can only observe each other’s’ action (and not the reward). I provide a characterization of the social network structure, in terms of a bound on how many other agents can each agent observe, for which an incentive-compatible asymptotically optimal protocol exists. I, furthermore, construct such a protocol when it exists.


Next, I consider a model where agents observe not only each other’s actions but also the rewards as well. Observing rewards complicates the challenge and so I study a specific setting where agents arrive sequentially and must pull one of two arms in order to receive a reward - a risky arm and a safe arm. In addition, each agent only observes his immediate predecessor. Once again, I provide the mediator with a recommendation algorithm that is incentive compatible and facilitates social learning.


Third, instead of considering the power of a mediator recommending agents that are connected by a social graph, we give the mediator the power to influence the connectivity of the social graph. In this part the central question we pose is whether there is a natural social graph that prevents the information cascade phenomenon. We introduce the `celebrities graph' and prove that indeed it allows for proper information aggregation in large populations even when the order at which agents decide is random and even when different issues are decided in different orders.