|Ph.D Student||Liss Liran|
|Subject||Realizing the Performance of Contemporary Parallel and|
Distributed Systems: Theoretical and Systems
|Department||Department of Electrical Engineering||Supervisors||Professor Yitzhak Birk|
|Professor Assaf Schuster|
|Full Thesis text|
This work addresses the two predominant multi-computer settings today: tightly-coupled parallel computing clusters and extremely large-scale distributed systems. It revisits the assumptions on which modern parallel and distributed systems are constructed and explores ways of enhancing performance, taking both theoretical and practical points of view .
Computing clusters . The communication bottleneck at the end nodes themselves is often the major cause of poor performance . While System Area Networks (SANs), which provide lean , high-performance communication mechanisms and interfaces , alleviate this bottleneck, considerable performance gaps remain , mainly due to substantial mismatches between the software architecture and hardware .
One source for such mismatches is the strong semantics of standard messaging APIs, which are not always required by the application. We present an approach whereby network and operating-system primitives are integrated in the kernel to provide a high-performance platform that is well matched to application semantics. An alternative approach that we explore entails avoiding most overheads using programming models that are better matched to the hardware's semantics. The applicability of these approaches is demonstrated by a demanding Distributed Shared Memory (DSM) system and a high-performance Bulk Synchronous Parallel (BSP) runtime system, which we designed and implemented for this purpose.
Extremely large scale systems . The ever-increasing demand for scalability has driven locality considerations into every aspect of algorithm design . Unfortunately, there are many important problems that cannot be solved in a local manner per se, i.e., O(1) complexity in problem size for all instances. Additionally, the common performance measures, namely worst- and average-case complexities , fail to capture the essence of real problems in a meaningful way.
Concentrating on the important problem of global data aggregation, we define a new metric on problem instances, Veracity Radius (VR), which captures the inherent possibility to compute them locally. We prove that VR yields a tight lower bound on computation time, and provide an efficient algorithm whose performance is proportional to VR for every problem instance rather than the graph size. We then extend our findings to ongoing models, providing both lower bounds and optimal aggregation algorithms that match them up to a constant factor.
In summary, this work explores the currently most prominent themes of parallel and distributed computation, identifying and addressing key issues. The insights gained and building blocks provided may furthermore be useful in the future when new situations, possibly combinations of the two, arise.