M.Sc Thesis

M.Sc StudentGandelman Michael
SubjectTreepication: An Erasure Code for Decentralized
Storage Systems
DepartmentDepartment of Electrical and Computer Engineering
Supervisor ASSOCIATE PROF. Yuval Cassuto
Full Thesis textFull thesis text - English Version


Distributed storage systems are built and configured in order to guarantee long-term reliable data storage by employing non-reliable storage nodes. We may frame reliability in such systems in terms of survivability, availability and accessibility of the data. This is often achieved in commercial distributed systems via replication of stored data.

Even when the replication of stored data has many benefits, its implementation is very costly (specially in terms of storage). Erasure coding is an alternative form of redundancy which can provide performance features at a much lower cost. Nevertheless, data access in an implementation of erasure coding involves high consumption of system resources.

This work introduces a new coding scheme called Treeplication. The particular properties derived from the hierarchical nature of the code allow it to feature the benefits of both coding and replication. Treeplication’s tree structure facilitates maximizing the recoverability of random subsets of code fragments, while at the same time behaving similarly to replication in distributed recovery of individual data fragments. The significant performance advantages over both replication and existing erasure codes motivate the use of Treeplication in decentralized distributed storage systems.

This thesis first presents Treeplication coding in terms of its tree structure and an analysis of its advantages over replication in terms of data recoverability, which includes proposed optimal symbol selection algorithms. The second part of the thesisfocuses on the scheme’s superior access performance over MDS codes, while also providing an algorithm for optimal data access. Finally, we study Treeplication in the context of its forward-looking properties in systems that change over time. We define a measure that quantifies the data unit’s resilience to likely adverse changes in the system, and propose a mechanism for incorporating additional storage nodes to the data unit in a way that effectively increases its resilience and that of the system overall.