|M.Sc Student||Liram Matan|
|Subject||Evaluating Zigzag Code in a Distributed Storage System|
|Department||Department of Computer Science|
|Supervisors||Dr. Gala Yadgar, Professor Assaf Schuster, Professor Eitan Yaakobi|
This gap between theory and practice has been observed in previous studies that applied theoretically optimal techniques to real systems. In this paper, we present a novel system-level approach to bridging this gap in the context of reducing recovery costs. We make the reads issued during recovery more sequential, at the cost of a minor increase in the amount of data read. We use Zigzag, a family of erasure codes with minimal overhead and optimal recovery, and trade its theoretical optimality for real performance gains. Our implementation of Zigzag and its optimizations in Ceph reduces recovery costs with two, three, and four parity nodes, for large and small objects alike. Compared to Reed-Solomon, it cuts recovery time by up to 28% and reduces the amount of data read and transferred by 18% to 39%.