M.Sc Thesis

M.Sc Student: Zilber Erez
Subject: TPT-RAID: a High Performance Box-Fault Tolerant Storage
Department: Department of Electrical and Computer Engineering
Supervisor: Associate Professor Yitzhak Birk


Storage devices are inexpensive. Reliable storage systems, however, remain expensive, and even they are susceptible to box-level failures, each of which renders an entire ECC group unavailable. One solution is a multi-box RAID, wherein each ECC group uses at most one block from each storage box. We introduce TPT-RAID, a highly available, scalable yet simple multi-box RAID. TPT-RAID extends the idea of an out-of-band SAN controller into the RAID: data is sent directly between hosts and targets and between targets, and the RAID controller only supervises the ECC calculation, which is performed by the targets. This prevents a communication bottleneck in the controller and improves performance dramatically while retaining the simplicity of centralized control. Moreover, TPT-RAID conforms to a conventional switched network architecture, whereas an in-band RAID controller would either constitute a communication bottleneck or have to be constructed as a full-fledged router. TPT-RAID can be implemented as a software extension to a SAN controller without hardware changes. Both this and TPT-RAID's scalability are demonstrated by our prototype, which uses InfiniBand, an emerging very-high-speed interconnect with RDMA capability. We prove the correctness and completeness of TPT-RAID. Finally, we describe the required protocol extensions.
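The multi-box layout described above can be illustrated with a minimal sketch. It assumes a single XOR parity block per ECC group, which the abstract does not specify, and the names xor_blocks, build_group and recover_box are hypothetical. Each list index stands for a separate storage box, so a group holds at most one block per box and survives the loss of any one box. The sketch does not model the direct target-to-target transfers or the controller's supervisory role.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_group(data_blocks):
    """Build one ECC group: the data blocks plus one XOR parity block.

    Assumption: index i of the returned list is the block stored on
    (hypothetical) box i, so no box holds more than one block of the group.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover_box(group, failed_box):
    """Rebuild the block held by the failed box from the surviving boxes."""
    survivors = [block for box, block in enumerate(group) if box != failed_box]
    return xor_blocks(survivors)


# Three data boxes and one parity box; box 1 fails and is rebuilt.
data = [b"AAAA", b"BBBB", b"CCCC"]
group = build_group(data)
assert recover_box(group, 1) == b"BBBB"
```

Because XOR is its own inverse, the same recovery routine rebuilds a lost data block or a lost parity block. In the architecture the abstract describes, this computation would be performed by the targets rather than by the controller.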