M.Sc Thesis

M.Sc StudentZdornov Vladimir
SubjectMitigating Congestion in High-Speed Interconnects for
Computer Clusters
DepartmentDepartment of Electrical and Computer Engineering
Supervisor ASSOCIATE PROFESSOR Yitzhak Birk


In large computer clusters such as nearly all current super computers, congestion is an inherent problem, which arises due to oversubscription of communication resources. Moreover, the limited number of buffers and lossless nature of the fabric result in rapid "congestion spreading", whereby communication along paths that do not traverse congested links is also severely degraded.

Cluster interconnection networks constitute a managed environment, in which new protocols can be used without jeopardizing interoperability. This property, combined with very low delays and reliable communication, creates an opportunity for employment of sophisticated control mechanisms aimed at avoiding damage caused by congestion-related phenomena.

Two fundamental types of control can be used to address the congestion problem: routing and rate control. While the former reduces contention over resources, the latter ensures their fair and efficient sharing. In this work, we propose novel mechanisms for both types of control, and study them under performance measures specific to cluster networks.

Our adaptive routing uses virtual circuits to guarantee in-order packet delivery. The path setup process is performed by switches locally. Importantly, we avoid blocking and backtracking. The scheme is shown to reduce the maximum contention by up to 50% in a slightly augmented fat tree, for random permutation traffic.

The existing approaches to rate control struggle to set correct rates in cluster networks. The window-based control of TCP is inappropriate. Low bandwidth-delay product determines the maximum window size to be of several MTU packets. Due to the small buffers, however, congestion related phenomena may arise even if single packet windows are employed. InfiniBand Congestion Control Architecture depends on a multitude of parameters, the setting of which has to ensure correct behavior in the general case scenario. In practice, however, it seems that tuning of the parameters is required for each specific topology and traffic pattern configuration.

Unlike the above schemes, our approach is based on explicit rate calculation. We formulate three different performance goals and present distributed algorithms that achieve them (each algorithm achieves some of the goals). Simulation results for synthetic benchmarks demonstrate the algorithms to have significant impact on performance. In addition, we prove tight theoretical bounds on the performance of the algorithms under the performance measures for which they were not designed.