Ph.D Thesis

Ph.D StudentBen Bassat Ran
SubjectBig Data Methods for Efficient Network Monitoring
DepartmentDepartment of Computer Science
Supervisor PROF. Roy Friedman
Full Thesis textFull thesis text - English Version


Streaming algorithms are used nowadays for a large variety of applications such as network monitoring, identifying social networks influencers, and databases.

These applications share the need to handle a massive data inflow, which requires algorithms to process new data exceptionally efficiently. Additionally, the data stream is often too large to be stored in memory, thus algorithms can only save a small representation of the stream, known as a sketch.

This thesis improves the speed and space requirements for streaming problems that are fundamental to networking.  

Our first contribution is efficient algorithms for summing over a sliding window on an integer stream, with an additive error. Our algorithms are succinct - space optimal up to a 1(1) multiplicative factor from our lower bound.

Our second contribution is an efficient algorithm for identifying the most frequent elements in a data stream. Existing algorithms exhibit poor performance when applied to networking data. We introduce RAP - a novel solution for the frequency and top-k estimation problems. We demonstrate space reductions compared to the alternatives by a factor of up to 32 on real packet traces and up to 128 on heavy-tailed workloads.

For top-k identification, RAP exhibits savings by a factor of between 4 and 64 depending on the workloads' skewness. Additionally, we present d-Way RAP, a hardware-friendly variant that empirically maintains its space and accuracy benefits.

The increasing popularity of jumbo frames means growing variance in the size of packets transmitted in modern networks. Our third contribution is developing constant time algorithms for volume estimations in streams and sliding windows, which are faster than previous work. Our solutions are formally analyzed and are extensively evaluated over multiple real-world packet traces as well as synthetic ones. For streams we demonstrate a run-time improvement of up to 2.4X compared to state of the art.

Monitoring tasks, such as anomaly and DDoS detection, require identifying frequent flow aggregates based on common IP prefixes. These are known as Hierarchical Heavy Hitters (HHH). The complexity of existing HHH algorithms is proportional to the hierarchy size, imposing significant overheads. The next contribution is a randomized constant time algorithm for HHH. Using real Internet packet traces, we demonstrate comparable accuracy and recall as previous works, while running up to 62 times faster.

Our last contribution revisits the sliding window regime and considers additional measurement tasks. Various capabilities, such as DDoS detection and load balancing, require insights about multiple metrics including Bloom filters, per-flow counting, count distinct and entropy estimation. Our work presents a unified construction that efficiently solves all the above problems in the sliding window model.