|Ph.D Student||Avner Orly|
|Subject||Machine Learning Inspired by Multi-User Communication|
|Department||Department of Electrical Engineering||Supervisor||Professor Shie Mannor|
|Full Thesis text|
The motivation for the work presented in this thesis comes from the world of cognitive radio networks. Cognitive radio networks are a collection of challenging problems in the world of communications that concern multi-user, multimedia communication networks, and are of interest due to their complex dynamic nature and stochasticity. In my research I apply ideas and approaches from machine learning, in order to model and solve cognitive radio network-oriented problems.
My research focuses on decision theoretic aspects of cognitive radio networks. The first setting consists of a single user acting in an unfamiliar environment, trying to learn the best transmission profile while maintaining an operational constraint (e.g., average power consumption below some threshold). I model the problem using a stochastic multi-armed bandit with a pathwise constraint on the acquired penalty. My solution is an index-based steering policy that maximizes the average reward while steering the average penalty into a set representing the constraint. I provide performance guarantees for both penalty and reward, and demonstrate results through simulations.
The second part of my work examines decoupling exploration and exploitation. The user, whose goal is to maximize the acquired reward, may choose two arms in each round: one for exploration, and the other for exploitation. Thus, the user actually acquires reward from the exploited arm, but observes the reward of the explored arm. The paper offers a thorough theoretical analysis, showing considerable improvement over existing algorithms when there is a small subset of attractive arms or when the identity of the best arm changes over time. A simulation comparing the algorithm to other approaches complements the analysis and demonstrates the results.
In the following section I turn to the multi-user setting. Channels are still represented by the arms of a stochastic multi-armed bandit, but they are targeted by several users, competing over the same resources. To maximize their reward users must learn the characteristics of the different channels while minimizing the number of collisions. Users are not allowed to communicate directly and must manage without any form of cooperation, coordination or central control. I propose an algorithm that combines epsilon-greedy learning with a collision avoidance mechanism. I analyze its regret with respect to the system-wide optimum and show that sub-linear regret can be obtained. My main innovation is the fact that the users' policy does not require knowledge of the number of users in the network. As a result, my algorithm can also handle scenarios with a variable number of users. My last project extends the multi-user setting to the challenging scenario in which the characteristics of each channel are different for each user. As before, information exchange between the users must be limited to a minimum. Achieving the optimum in terms of regret is not possible without explicit communication, and instead I focus on reaching a stable marriage configuration. I develop an algorithm for learning a stable configuration, including an extension for a variable number of users. I further offer both convergence guarantees and experiments, including comparison to state-of-the-art algorithms.