טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
Ph.D Thesis
Ph.D StudentKassner Yaron
SubjectFrugal Counting
DepartmentDepartment of Computer Science
Supervisor Professor Roy Friedman


Abstract

This dissertation improves the accuracy, memory and speed of counting algorithms.

Counters are one of the most basic building blocks in networking, big data analytics and other fields.

To cope with increasing line rates, counters must be accessed quickly and be kept on small fast memory. Further, to reach concise and exhaustive conclusions about data, accurate estimations must be kept for large numbers of counters.


Our first contribution improves the accuracy of counter estimation. We present a closed form representation of the optimal estimation function, which was previously known only in recursive form. Previous works were able to use this function to optimize the largest counter in an array of counters, but they compromised the accuracy of smaller counters in the array to do so. We introduce Independent Counter Estimation Buckets (ICE-Buckets), which improves the accuracy of all counters, large and small. We prove a tighter upper bound on the relative error and demonstrate an accuracy improvement of up to 57 times on real Internet packet traces.


Our second contribution improves the memory of heavy hitter algorithms. We introduce two novel algorithms, CSS and WCSS, for identifying heavy hitters in streams and sliding windows. These algorithms use statically allocated memory, support constant time point queries and are designed to eliminate memory overheads that existed in previous works. We demonstrate reduced memory requirements of up to 85% in streams and 66% in sliding windows over synthetic and real Internet packet traces.


Finally, we improve the speed of traffic volume estimation for elephant flows. We present the first asymptotically optimal algorithm for solving this problem in terms of both space and time complexity. This improves previous approaches, which can monitor the number of packets in a heavy hitter flow in optimal time but not the number of bytes in an elephant flow. We evaluate our work on real packet traces, demonstrating an up to X2.5 speedup compared to the best alternative.