Technion - Israel Institute of Technology
Technion - Israel Institute of Technology - Graduate School
Ph.D Thesis
Ph.D Student: Bergman Aran
Subject: Sharing Cloud Networks
Department: Department of Electrical Engineering
Supervisor: Professor Isaac Keslassy


Abstract

Cloud computing has become ubiquitous in our lives. We store our Dropbox files and run our Google searches on the cloud. However, the cloud does not only provide storage and compute resources: it also provides significant networking resources that have largely gone unexplored in previous work.


In this work we investigate different aspects of cloud networks: how cloud datacenter networks can be improved, and how the cloud inter-datacenter network can be used to improve data delivery across the Internet. We further explore how the application data stored in a datacenter can be better compacted for one specific application, namely a public web mail service.


We first explore using public clouds and their networks to improve data transfers between two endpoints on the Internet. Data delivery through relays in the cloud is quickly growing in importance and is claimed to accelerate data transport. However, the reasons for this acceleration and the exact recipe for optimizing performance remain poorly understood. We aim to quantify the potential of cloudified data delivery and learn how to tap it. To this end, we explore the performance of different data transport strategies through hundreds of thousands of file downloads. Our results show that even simple strategies can improve file download times by an order of magnitude. A recurring theme in our results is that congestion control plays a crucial role in achieving high performance.


We then look at how the datacenter operator can control and improve the performance of TCP within a datacenter, even when different tenants or applications run different versions of virtual machines with varying congestion control algorithms in their TCP stack implementations. We take advantage of the fact that all traffic passes through hypervisors controlled by the multitenant datacenter owner, and provide a translation layer in the hypervisors. This layer ensures that the whole datacenter uses a single best-of-breed congestion control algorithm, while giving each Virtual Machine guest the illusion that it keeps using its own congestion control algorithm. We named this approach virtualized congestion control (vCC). This solution opens the path to a virtualization of the congestion control mechanism used within the datacenter and also makes it future-proof: new congestion control algorithms can be used within the datacenter network without requiring any change in the guest operating systems or their configuration. We list a wide range of techniques a hypervisor can implement to translate the congestion control algorithm and improve TCP performance in a datacenter.


Finally, we turn to the data stored in modern datacenters that provide a public web mail service, and show how existing compression algorithms and libraries can compress a mail corpus by a factor of 2.5 compared to the naïve compression currently employed. This is achieved by exploiting the high similarity between mail messages, most of which are machine-generated. We accomplish this enhancement by properly reordering the messages prior to compressing them and then basing the compression of each message on the previous one.
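
The idea of reordering messages and then compressing each one relative to its predecessor can be illustrated with a minimal sketch. It is not the implementation used in the thesis. It assumes Python's standard zlib module, uses the previous message as a preset dictionary (zdict), and uses a plain lexicographic sort as a hypothetical stand-in for the reordering step, which the abstract does not specify. The function names compress_chain and decompress_chain are illustrative only.

```python
import zlib


def compress_chain(messages):
    """Compress each message with the previous message as a preset dictionary.

    `messages` is a list of bytes objects, assumed to be already ordered so
    that similar messages are adjacent.
    """
    blobs = []
    prev = b""
    for msg in messages:
        if prev:
            # The previous message primes the compressor, so content shared
            # with it can be encoded as short back-references.
            comp = zlib.compressobj(level=9, zdict=prev)
        else:
            # The first message has no predecessor to build on.
            comp = zlib.compressobj(level=9)
        blobs.append(comp.compress(msg) + comp.flush())
        prev = msg
    return blobs


def decompress_chain(blobs):
    """Invert compress_chain by replaying the same dictionary chain in order."""
    messages = []
    prev = b""
    for blob in blobs:
        if prev:
            decomp = zlib.decompressobj(zdict=prev)
        else:
            decomp = zlib.decompressobj()
        msg = decomp.decompress(blob) + decomp.flush()
        messages.append(msg)
        prev = msg
    return messages


if __name__ == "__main__":
    # Hypothetical machine-generated messages that share a template.
    corpus = [
        ("Subject: Your order %d has shipped\n\n"
         "Hello, your order number %d was shipped today. "
         "Thank you for shopping with us.\n" % (i, i)).encode()
        for i in range(50)
    ]
    # Assumption: lexicographic sorting stands in for the reordering step.
    ordered = sorted(corpus)
    chained = sum(len(b) for b in compress_chain(ordered))
    independent = sum(len(zlib.compress(m, 9)) for m in ordered)
    print("independent:", independent, "bytes; chained:", chained, "bytes")
```

In this sketch each message can only be decompressed after its predecessor, so random access to a single message requires replaying the chain from the start. Note also that zlib only exploits the last 32 KB of a preset dictionary, so longer messages would need a different library or a trimmed dictionary.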