טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentAlon Berger
SubjectIntegrated Bounds for Disintegrated Storage
DepartmentDepartment of Electrical Engineering
Supervisor Full Professor Keidar Idit
Full Thesis textFull thesis text - English Version


Abstract

In recent years we have seen an exponential increase in storage capacity demands, creating a need for big data storage solutions. Distributed storage is a common approach to solve this necessity: data is typically stored on various nodes, which are accessed asynchronously by clients. Such storage systems that support writing values and reading them are commonly referred to as registers. A common requirement of registers is wait-freedom, meaning that, given enough actions, any client performing an operation on the register eventually completes it.


Several algorithms for wait-free registers exist: Byzantine fault-tolerant registers cope with the possibility that some fraction of the nodes might be malicious by replicating data across distinct nodes. Coded storage solutions wish to alleviate the space cost of replication by having the nodes store code words, rather than entire values. Other kinds of registers are designed to store values from a large domain, while each of the nodes is a register whose capacity is limited to storing values from a small domain.


In this research thesis, we point out a somewhat surprising similarity between non-authenticated Byzantine storage, coded storage, and certain emulations of shared registers from smaller ones. A common characteristic in all of these is the inability of reads to safely return a value obtained in a single atomic access to shared storage. We collectively refer to such systems as disintegrated storage, and show integrated space lower bounds for asynchronous regular wait-free emulations in all of them. In a nutshell, if readers are invisible, i.e., their presence is unknown to the writer, then the storage cost of such systems is inherently exponential in the size of written values; otherwise, it is at least linear in the number of readers. Our bounds are asymptotically tight to known algorithms, and thus justify their high costs.