Technion - Israel Institute of Technology, Graduate School
Ph.D Thesis
Ph.D Student: Nahir Amir
Subject: Design and Management of Complex Distributed Systems: Optimization and Game-Theoretic Perspectives
Department: Department of Computer Science
Supervisors: Professor Dan Raz, Professor Ariel Orda
Full thesis text: English version


Abstract

The design and management of current distributed systems is a very complex task, mainly because typical systems are very large and are often not controlled by a single entity. For example, the Internet is composed of independent administrative entities, called Autonomous Systems (ASs), and its overall behavior is determined by a non-trivial combination of the policies of each AS and the actions of the end-users. Thus, while the designer of such a system may have some idea of optimal system-wide behavior, they must account for the wide range of possible policies and end-user actions that determine the actual system performance.


Cloud computing is an emerging computing paradigm in which tasks are assigned to a combination of connections, software and services accessed over a network. This network of servers and devices is collectively known as "the cloud". Computing at the scale of the cloud allows users to access supercomputer-level power using a thin client or another access point, such as a smartphone or a laptop. Since end-users are given access to supercomputer-level resources, their effect on the system's overall performance is greater than ever. This raises multiple research questions related to the management and performance of cloud computing systems in light of end-user selfishness.


In this work we study the topologies of networks constructed by selfish users, and the overall system performance when selfish end-users may split work between a shared resource (the cloud) and private resources. We also consider task assignment policies that are particularly suited to large-scale distributed systems, and show that they provide new capabilities for improving system performance. In particular, we develop new resource allocation algorithms that converge to a working point that balances the end-user experience against the operational costs of leasing resources from the cloud provider.