Ph.D Student Ph.D Thesis Goikhman Alexander Computation and Communication in Data Parallel Programs in Distributed Memory Multiprocessors Department of Electrical and Computer Engineering PROFESSOR EMERITUS Jacob Katzenelson

Abstract

An effective integration of a hyperbolic PDE by a numeric scheme is an important topic in science and engineering. Some common examples of the PDE problems are the weather prediction problem, the elastic-plastic flow problem, etc. Usually the computation of the problem is done on a massively parallel computer using domain decomposition. The domain decomposition maps a subset of problem's data to each compute node in the computer. The compute node integrates its data in some order of computation and communicates with its nearest-neighbor compute nodes to interchange border values. This order is mostly oblivious to the order of the computation at the neighbor nodes. In this work I show that the order of the computation and the communication in the nodes has a major effect on the performance of the machine. I show that there is a specific order of the computation and the communication, which is called the spiral computation order, that can enable complete communication latency hiding, and enables a static communication schedule of the network, improving the utilization of the network resources. As such , the spiral computational order enables the qualitative problem-specific association of the needed computation to the needed communication resources, defining the grain size, i.e., the most effective domain decomposition of the problem size relatively to the machine parameters, i.e., computational power, memory bandwidth , communication latency and bandwidth, etc .

To obtain insight to the performance of the spiral model on a 2D PDE problem, a time-domain analytical model, called the recurrence equations was constructed . The recurrence equations are based on the notion of event, whose numeric value that corresponds to the time point at which a predefined run-time condition becomes satisfied. A phase is defined as the difference in time of corresponding events on different nodes. The recurrences equations capture the relationship between events and relate them to the parameters of the machine. The solution of the recurrence equations with specified initial conditions, called trajectory, enables the prediction of the time-domain behavior of the system. A symbolic solver program was constructed to symbolically search for all possible trajectories of the specific PDE problem. The results that the solver provides are that at certain initial conditions the phase trajectory converges to a system constant. At other initial conditions, the phase trajectory becomes periodic, resembling chaotic motion.