Ph.D. Thesis

Ph.D. Student: Zohar Eyal
Subject: Redundancy Elimination in Networked Systems
Department: Department of Electrical and Computer Engineering
Supervisors: Professor Emeritus Israel Cidon
Full Thesis Text: English Version


The surge in large files and rich media increases the importance of Redundancy Elimination (RE) in network traffic. Our work explores the design, implementation, and deployment of two different and complementary lossless RE technologies: data deduplication and compression. We devise and construct scalable deduplication and compression systems for various environments, and analyze their performance using real data.

Compression is a method for improving the utilization of a network or storage resource. Traditionally, compression takes advantage of short bit-string repetitions within a single transmission sequence or file, using symbol- or word-level coding to reduce the number of bits used. Deduplication has similar goals, but it exploits the similarities among different transmissions and files over time and across the storage system. Deduplication identifies repetitions of sizable blocks (chunks) and transmits or stores only a single representative of each chunk. When such a chunk appears again, the deduplication system transmits or stores only a reference to it. The reference is typically much smaller than the chunk it represents. The contributions of the thesis fall into two main parts: 1) deduplication, presented in Chapters 2 and 3, and 2) compression, presented in Chapters 4 and 5.
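The chunk-and-reference idea described above can be illustrated with a minimal sketch. It is not the method used in the thesis: the fixed chunk size, the use of SHA-256 digests as references, and the names `CHUNK_SIZE`, `deduplicate`, and `reconstruct` are all assumptions made for illustration only.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real systems may chunk differently


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into chunks and emit a raw chunk or a short reference.

    `store` maps a chunk digest to the chunk and persists across calls,
    so repetitions across different transmissions are also detected.
    """
    tokens = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()  # 32-byte reference (assumption)
        if digest in store:
            tokens.append(("ref", digest))   # chunk seen before: send reference only
        else:
            store[digest] = chunk
            tokens.append(("raw", chunk))    # first occurrence: send the chunk itself
    return tokens


def reconstruct(tokens: list, store: dict) -> bytes:
    """Rebuild the original data on the receiving side from raw chunks and references."""
    parts = []
    for kind, payload in tokens:
        if kind == "ref":
            parts.append(store[payload])
        else:
            store[hashlib.sha256(payload).digest()] = payload
            parts.append(payload)
    return b"".join(parts)
```

In this sketch a 32-byte digest stands in for a 4096-byte chunk, which reflects the point that a reference is much smaller than the chunk it represents. The sender and the receiver each keep their own `store`, and both must observe the same chunk history for references to resolve.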

In the first part of this thesis, we present two novel network-level deduplication systems. First, we present PACK (Predictive ACKs), a novel end-to-end Traffic Redundancy Elimination (TRE) system designed to reduce cloud computing costs. PACK's main advantage is its capability of offloading the cloud-server TRE effort to end-clients, thus minimizing the processing costs induced by the TRE algorithm. Then, we present Celleration, a novel gateway-to-mobile TRE system, designed for data-intensive cellular networks, that considerably reduces the operator's backhaul bandwidth consumption.

In the second part, we present a novel elastic compression framework that adapts quickly to changing load conditions in web servers. The framework responds to changing conditions within seconds and also mixes compression levels for fine-grained operation. Finally, we present an optimization framework for the compression of various content types. It chooses the best configuration for each content type in order to minimize the output size, given a time budget for job completion.
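The trade-off between output size and a time budget can be sketched as follows. This is only an illustration under stated assumptions, not the optimization framework of the thesis: it uses Python's standard `zlib` levels as the candidate configurations, measures each one exhaustively, and the names `measure` and `pick_level` are hypothetical.

```python
import time
import zlib


def measure(data: bytes, levels=range(1, 10)) -> list:
    """Compress `data` at each zlib level; return (level, size, seconds) tuples."""
    results = []
    for level in levels:
        start = time.perf_counter()
        size = len(zlib.compress(data, level))
        elapsed = time.perf_counter() - start
        results.append((level, size, elapsed))
    return results


def pick_level(results: list, budget_seconds: float) -> int:
    """Return the level with the smallest output among those within the time budget.

    If no level fits the budget, fall back to the fastest one (an assumption).
    """
    feasible = [r for r in results if r[2] <= budget_seconds]
    if not feasible:
        return min(results, key=lambda r: r[2])[0]
    return min(feasible, key=lambda r: r[1])[0]
```

For example, `pick_level(measure(sample), 0.05)` would select a level for a representative `sample` of one content type under a 50 ms budget. Repeating the measurement per content type gives one configuration per type, in the spirit of the per-type selection described above.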