Technion - Israel Institute of Technology
Technion - Israel Institute of Technology, Graduate School
Ph.D. Thesis
Ph.D. Student: Shraer Alexander
Subject: Reliable Collaboration Using Unreliable Storage
Department: Department of Electrical Engineering
Supervisor: Professor Idit Keidar
Full thesis text - English Version


Abstract

This thesis concerns the reliability, security, and consistency of storage in distributed systems. Distributed storage architectures provide a cheap and scalable alternative to expensive monolithic disk array systems currently used in enterprise environments. Such distributed storage architectures make use of many unreliable servers (or storage devices) and provide reliability through replication. Another emerging alternative is cloud storage, offered remotely by multiple providers.

The first problem addressed in this thesis is the support of reconfiguration in distributed storage systems. The large number of fault-prone servers in such systems requires support for dynamic changes, in which faulty servers are removed from the system and new ones are introduced. To maintain reliability when such changes occur, it is essential to coordinate them properly. Existing solutions are either centralized, or use strong synchronization primitives (such as consensus) among the servers to agree on every change in the system. In fact, it was widely believed that reconfiguration requires consensus and therefore cannot be achieved in asynchronous systems. In this work we refute this belief and present DynaStore, an asynchronous and completely decentralized reconfiguration algorithm for read/write storage.
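One intuition behind consensus-free reconfiguration can be illustrated with a toy model (this is only an illustration, not DynaStore's actual algorithm, which is considerably more involved): because merging sets of proposed membership changes by union is commutative and associative, concurrently issued proposals can be combined in any order and still yield the same next configuration, so no agreement on a single ordering is needed. All names below are hypothetical.

```python
# Toy illustration of decentralized reconfiguration: concurrent membership
# proposals are merged by set union, which is order-independent, so parties
# converge on the same next view without running consensus.

def apply_changes(view, changes):
    """Apply a set of (op, server) change proposals to a view,
    where op is '+' (add server) or '-' (remove server)."""
    new_view = set(view)
    for op, server in changes:
        if op == '+':
            new_view.add(server)
        elif op == '-':
            new_view.discard(server)
    return new_view

initial = {'s1', 's2', 's3'}
proposal_a = {('+', 's4')}   # one client proposes adding s4
proposal_b = {('-', 's2')}   # another, concurrently, proposes removing s2

# Merging the proposals in either order yields the same resulting view.
merged_ab = proposal_a | proposal_b
merged_ba = proposal_b | proposal_a
assert apply_changes(initial, merged_ab) == apply_changes(initial, merged_ba)
print(sorted(apply_changes(initial, merged_ab)))  # ['s1', 's3', 's4']
```

The commutativity of the merge is what removes the need to agree on the order of changes; the hard part, which DynaStore actually solves, is ensuring that readers and writers always intersect live quorums while views evolve.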

Cloud storage is another setting where reliability is a challenge. Clients must currently trust cloud providers to handle their information correctly, and have no tools to verify this. Previously proposed solutions that aim to protect clients from faulty cloud storage providers sacrifice liveness of client operations in the normal case, when the storage is working properly. For example, if a client crashes in the middle of making an update to a remote object, no other client can ever read that object again. We prove that this problem is inherent in all theoretical semantics previously defined for this model. We define new semantics that can be guaranteed to clients without sacrificing liveness, even when the storage is faulty, and present FAUST, an algorithm providing these guarantees. We then present Venus, a practical system based on a variation of FAUST. Venus guarantees data consistency and integrity to clients that collaborate using commodity cloud storage, and alerts clients when the storage is faulty or malicious (e.g., as a result of a software bug, misconfiguration, or a hacker attack). Venus requires no trusted components and no changes to the storage provider. Venus offers simple semantics, which further enhances its usability. We evaluate Venus with Amazon S3, and show that it is scalable and adds no noticeable overhead to storage operations.
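The most basic building block of protecting clients from an untrusted store is client-side integrity checking, sketched below under illustrative assumptions (Venus's actual protocol goes much further, verifying consistency among multiple collaborating clients; the class and method names here are hypothetical, and the in-memory store merely stands in for a service such as Amazon S3):

```python
import hashlib

# Minimal sketch: a client keeps a local digest per key and checks every
# value it reads back from an untrusted store, detecting tampering.

class UntrustedStore:
    """Stands in for a commodity cloud store; may return corrupted data."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

class VerifyingClient:
    """Remembers a SHA-256 digest for each key it writes and verifies reads."""
    def __init__(self, store):
        self.store = store
        self._digests = {}

    def write(self, key, data):
        self._digests[key] = hashlib.sha256(data).hexdigest()
        self.store.put(key, data)

    def read(self, key):
        data = self.store.get(key)
        if hashlib.sha256(data).hexdigest() != self._digests[key]:
            raise RuntimeError(f"integrity violation on {key!r}")
        return data

store = UntrustedStore()
client = VerifyingClient(store)
client.write('doc', b'hello')
assert client.read('doc') == b'hello'

# A faulty or malicious store that alters the data is detected on read.
store._objects['doc'] = b'tampered'
try:
    client.read('doc')
except RuntimeError as err:
    print(err)  # integrity violation on 'doc'
```

A single-client check like this cannot, by itself, detect a store that serves different clients different (individually self-consistent) histories; coordinating such cross-client consistency without trusted components is the harder problem the thesis addresses.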