Ph.D Thesis

Ph.D StudentKol Tomer
SubjectUsing Storage Space and Stored Data to Mitigate Storage
and Communication Performance Bottlenecks
DepartmentDepartment of Electrical and Computer Engineering
Supervisor ASSOCIATE PROFESSOR Yitzhak Birk


Storage-related communication (web browsing, transmission of stored video, remote backup, etc.) constitutes a major fraction of Internet traffic. While communication bandwidth and the storage capacity of magnetic disk drives have been growing very rapidly, disk access time has hardly changed. In this work, we show how to spend the abundant resources, namely storage space and even stored data, in order to reduce disk-access and/or communication requirements, thereby increasing achievable system performance. This is demonstrated in three diverse situations:

Disk-based web servers with ``flat'' access distributions, e.g., hosting infrequently-accessed web pages. Here, caching is ineffective. Prefetching does not help either, as it does not reduce the amount of work done per request. Instead, we propose the packaging of composite objects, so that a single disk access fetches the entire page. The potential benefit, feasibility and complications are explored.

Peer-to-Peer storage and backup systems. The abundance of underutilized storage capacity in networked computers and the cost of centralized storage/backup solutions have resulted in a growing interest in peer-to-peer architectures. This work complements prior art by focusing on the data itself. We consider placement, ECC parameter selection, as well as data transfer and scheduling when recovering multiple objects. An important distinction is made between actual failures and transient unavailability of storage nodes, and the true requirements are carefully identified.

Transmission of distinct supplemental data to caching clients over a broadcast channel.  A broadcast channel (e.g., wireless networks) is very efficient at sending the same data to everyone, but is very inefficient when used for unicast. We show how knowledge by the sender of the requirement of each recipient as well as its cache content can be used to construct erasure-correcting codes that substantially reduce the amount of supplemental data that must be transmitted in order to enable every client to derive its required data. Each client's derivation uses the transmitted data as well as that client's previously cached data. We prove that minrank, a measure defined on the characteristic matrix describing the cache contents of the clients, fully characterizes the number of blocks that must be transmitted when linear codes are used. For general codes, minrank is proved to be a tight lower bound on the number of blocks that must be transmitted for a broad class of graphs corresponding to the characterizing matrix.

The observations made and insights offered by this work can assist a system designer in improving overall system cost/performance.