טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
Ph.D Thesis
Ph.D StudentSpiegelman Alexander
SubjectFundamental Bounds and Algorithms in Distributed
Reliable Storage
DepartmentDepartment of Electrical Engineering
Supervisor Professor Idit Keidar
Full Thesis textFull thesis text - English Version


Abstract

In recent years we see an exponential increase in storage capacity demands, creating a

need for big data storage solutions. Additionally, today’s economy emphasizes consolidation, giving rise to massive data centers and clouds. In this era, distributed storage plays a key role. This is marked by two clear trends: First, the storage market gravitates towards distributed storage solutions within data centers as well as across multiple data centers; such systems are typically made up of many cheap, low-reliability storage nodes, and achieve durability and high availability via redundancy. Second, we see more and more users store data in clouds that are accessed remotely over the Internet. In this thesis, we study two fundamental aspects of reliable distributed storage: storage space cost and dynamic storage reconfiguration.

Storage space cost. Given the inherent failure-prone nature of storage and network

components, a reliable distributed storage algorithm must store redundant information

in order to allow data to be reconstructed when some subset of the system fails. The

most common approach to achieve this is via replication, i.e., storing multiple copies

of each data block. The well-known ABD result shows that an f -tolerant storage can

be emulated using a collection of 2 f 1 fault-prone servers, each storing a single readmodify- write object type, which is known to be optimal. We first generalize this bound by investigating the inherent space cost of emulating reliable storage as a function of the object type exposed by the underlying servers. Then, we focus on read-modify-write object type and investigate whether erasure codes can help mitigate the significant cost of replication that results from the immense size of the data. Some previous works attempted to do it, but a closer look at existing solutions reveals that they incur extra storage costs in other places. Specifically, the use of coding creates scenarios where old information cannot be over-written, leading the storage to grow without bound if concurrency is high. We shed light on the fundamental tradeoff between lower redundancy per block (as achieved by codes), concurrency, and allowing old data to be over-written (as achieved by replication).

Dynamic reconfiguration. A key challenge for distributed storage is the problem of

reconfiguration. Clearly, any production storage system that provides data reliability

and availability for long periods must be able to reconfigure in order to remove failed

or old nodes and add healthy or new ones. Reconfiguration is also essential in order

to realize the greatest advantage of distributed storage over traditional monolithic enterprise storage solutions, namely, that it supports elastic incremental growth- the main motivation for the economic model of cloud computing. In this thesis we contribute to the understanding of reconfiguration of distributed storage by showing both negative (impossibility) and positive (algorithms) results.