| Ph.D Thesis | |
|---|---|
| Ph.D Student | Liss Liran |
| Subject | Realizing the Performance of Contemporary Parallel and Distributed Systems: Theoretical and Systems Perspectives |
| Department | Department of Electrical and Computer Engineering |
| Supervisors | Assoc. Prof. Yitzhak Birk, Prof. Assaf Schuster |

Full Thesis text

This work addresses the two predominant multi-computer settings today: tightly-coupled parallel computing clusters and extremely large-scale distributed systems. It revisits the assumptions on which modern parallel and distributed systems are constructed and explores ways of enhancing performance, taking both theoretical and practical points of view.

**Computing clusters.** The communication bottleneck at the end nodes themselves is often the major cause of poor performance. While System Area Networks (SANs), which provide lean, high-performance communication mechanisms and interfaces, alleviate this bottleneck, considerable performance gaps remain, mainly due to substantial mismatches between the software architecture and the hardware.

One source of such mismatches is the strong semantics of standard messaging APIs, which are not always required by the application. We present an approach whereby network and operating-system primitives are integrated in the kernel to provide a high-performance platform that is well matched to application semantics. An alternative approach that we explore avoids most overheads by using programming models that are better matched to the hardware's semantics. The applicability of these approaches is demonstrated by a demanding Distributed Shared Memory (DSM) system and a high-performance Bulk Synchronous Parallel (BSP) runtime system, which we designed and implemented for this purpose.
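The BSP model mentioned above organizes execution into supersteps: each process computes locally, communication is buffered, and all messages are delivered at a global barrier. The following is a minimal, purely illustrative single-machine sketch of that superstep discipline (it is not the thesis's SAN-based runtime system; `bsp_run` and `step` are names chosen here for illustration):

```python
def bsp_run(num_procs, num_supersteps, step):
    """Simulate BSP execution.

    `step(pid, superstep, inbox)` performs one process's local computation
    and returns a list of (dest_pid, message) pairs to send.
    """
    inboxes = [[] for _ in range(num_procs)]
    for s in range(num_supersteps):
        outboxes = [[] for _ in range(num_procs)]
        for pid in range(num_procs):              # local computation phase
            for dest, msg in step(pid, s, inboxes[pid]):
                outboxes[dest].append(msg)        # communication is buffered
        inboxes = outboxes                        # barrier: deliver everything
    return inboxes

# Example: every process sends its value to process 0 in one superstep.
def step(pid, s, inbox):
    return [(0, pid + 1)]

final = bsp_run(4, 1, step)
# process 0's inbox now holds the values 1..4
```

The point of the model is that all communication within a superstep is semantically equivalent to delivery at the barrier, which permits aggressive buffering and batching underneath, precisely the kind of weakened semantics a lean SAN interface can exploit.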

**Extremely large-scale systems.** The ever-increasing demand for scalability has driven locality considerations into every aspect of algorithm design. Unfortunately, many important problems cannot be solved locally per se, i.e., with *O(1)* complexity in the problem size for all instances. Additionally, the common performance measures, namely worst-case and average-case complexity, fail to capture the essence of real problem instances in a meaningful way.

Concentrating on the important problem of global data aggregation, we define a new metric on problem instances, Veracity Radius (VR), which captures the inherent possibility of computing them locally. We prove that VR yields a tight lower bound on computation time, and provide an efficient algorithm whose running time is proportional to the VR of each problem instance rather than to the graph size. We then extend our findings to ongoing models, providing both lower bounds and optimal aggregation algorithms that match them up to a constant factor.
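The instance-sensitive flavor of such results can be illustrated with a toy aggregation problem: computing the global AND of bits held by nodes on a line. A node whose *r*-neighborhood contains a 0 can already output 0 after *r* communication rounds, so the rounds an instance requires grow with the distance to the nearest 0 rather than with the graph size. This is our own illustrative analogy, not the thesis's VR metric or algorithm; `rounds_needed` is a name invented here:

```python
def rounds_needed(bits, node):
    """Rounds until `node` can output the global AND on a line of bits.

    If some neighborhood of `node` contains a 0, the answer (0) is fixed
    once that neighborhood is seen; if the instance is all ones, `node`
    must hear from every other node before it can output 1.
    """
    n = len(bits)
    zeros = [i for i, b in enumerate(bits) if b == 0]
    if not zeros:
        return max(node, n - 1 - node)            # worst case: full distance
    return min(abs(node - z) for z in zeros)      # instance-sensitive radius

# A single nearby 0 makes the instance "easy" for node 3:
easy = rounds_needed([1, 1, 0, 1, 1, 1, 1, 1], 3)
# The all-ones instance forces node 0 to wait for the whole line:
hard = rounds_needed([1] * 8, 0)
```

As with VR, the interesting measure here is a property of the problem instance, not of the network: most instances are decided far sooner than the worst case suggests.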

In summary, this work explores the currently most prominent themes of parallel and distributed computation, identifying and addressing key issues. The insights gained and the building blocks provided may also prove useful in the future, as new settings, possibly combining the two, arise.