טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentAsi Hilal
SubjectConstructions of PIR and Batch Codes for Distributed
Storage
DepartmentDepartment of Computer Science
Supervisor Professor Eitan Yaakobi


Abstract

Distributed and cloud storage systems today are required to tolerate the failure or unavailability of some of the nodes in the system. The simplest and most commonly method to accomplish this task is by replication, whereby every node is replicated several times, usually three. This solution has clear advantages due to its simplicity, fast recovery, and efficient availability. However, it entails a large storage overhead which becomes costly in large storage systems. Availability in particular plays an important role in supporting high throughput of the system. Consider the case in which a large number of files are stored in a distributed storage system and user requests of the files are received. If each file has one copy or can be recovered only once, and several users request this file, then these requests

will have to be served sequentially, which will significantly slow down the system’s response time.

In this work we study two families of codes with availability for distributed storage. The first family of codes, called private information retrieval (PIR) codes, requires that every information symbol has some k mutually disjoint recovering sets. The second family of codes, which is a generalization of the first one, was first proposed in the last decade by Ishai et al. under the framework of batch codes. These codes were originally motivated by different applications such as load-balancing in storage and cryptographic protocols. Under this setting, it is required that every multiset request of k symbols can be recovered by k mutually disjoint recovering sets. Note that every batch code is in particular a PIR code with the same parameters.