Technion - Israel Institute of Technology, Graduate School
Ph.D. Thesis
Ph.D. Student: Shpiner Alexander
Subject: Scaling Data Center Routers
Department: Department of Electrical Engineering
Supervisor: Professor Isaac Keslassy


Abstract

A data center is a facility that houses computer systems connected by a communication network. The data center network is considered one of the most challenging network environments for routers because of its fast link rates, short propagation times, and high performance demands. In this thesis, we analyze the two main data center router functions: packet processing and packet switching.

In the first part, we analyze two functions related to packet processing: address resolution and order preservation. Data centers can run multiple virtual machines (VMs) and potentially place them on any server. Inter-VM communication therefore requires an address resolution method that determines the server on which any given VM resides. Unfortunately, existing methods suffer from a scalability bottleneck in the network load generated by address resolution messages, in the size of the address resolution tables, or in both. We propose Smart Address Learning (SAL), a novel approach that solves this scalability bottleneck.

Next, we introduce novel scalable scheduling algorithms for preserving flow order in parallel multi-core network processors. New processing features in advanced network processors have led to increasingly parallel architectures and increasingly heterogeneous packet processing times. As a result, packets that finish processing early may have to wait for earlier packets of the same flow, which can cause large reordering delays. Our algorithms can significantly reduce this reordering delay.
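
To make the notion of reordering delay concrete, the following is a minimal sketch and not the scheduling algorithms proposed in the thesis. It assumes a single flow, a dedicated core for every packet (so processing starts at arrival), and a reorder buffer that releases packets strictly in arrival order. The function name `reordering_delays` and the sample numbers are hypothetical.

```python
from typing import List


def reordering_delays(arrivals: List[float], proc_times: List[float]) -> List[float]:
    """Return how long each packet of one flow waits in the reorder buffer.

    Illustrative assumptions (not taken from the thesis): each packet starts
    processing on its own core at its arrival time, and packets must leave
    in the same order in which they arrived.
    """
    if len(arrivals) != len(proc_times):
        raise ValueError("arrivals and proc_times must have the same length")

    delays = []
    last_departure = float("-inf")
    for arrival, proc in zip(arrivals, proc_times):
        completion = arrival + proc
        # A packet cannot depart before the previous packet of its flow.
        departure = max(completion, last_departure)
        delays.append(departure - completion)
        last_departure = departure
    return delays


if __name__ == "__main__":
    # One slow packet (10 time units) followed by two fast ones (1 unit each).
    print(reordering_delays([0, 1, 2], [10, 1, 1]))  # prints [0, 8, 7]
```

In this toy run the two fast packets finish at times 2 and 3 but are held until time 10 behind the slow packet, which is the kind of delay that grows as processing times become more heterogeneous.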

In the second part, we study the starvation and unfairness phenomena that occur in the packet switching function of the router. Most data center applications rely on the Transmission Control Protocol (TCP) for reliable transport. TCP performance has been analyzed using ideal router models in global networks, but running it over real routers in a data center network may degrade performance. Data center routers use small buffers to lower delays, and can therefore incur throughput collapse for short TCP flows as well as temporary starvation for long TCP flows. We introduce a lightweight hash-based algorithm called Hashed Credits Fair (HCF) that helps solve both of these problems.

Finally, we analyze the interactions between router-based switch scheduling algorithms and the user-based congestion control of TCP. Both aim to increase throughput and fairness, but we show that their interactions can actually have the opposite, detrimental effect: they can lead to extreme unfairness with temporary starvation, and to large rate oscillations. We characterize the network dynamics for these switch scheduling algorithms.