Ph.D Student | Goikhman Alexander |
---|---|

Subject | Computation and Communication in Data Parallel Programs in Distributed Memory Multiprocessors |

Department | Department of Electrical Engineering |

Supervisor | Professor Emeritus Jacob Katzenelson |

Full Thesis text |

An effective integration of a hyperbolic
PDE by a numeric scheme is an important topic in science and engineering. Some
common examples of the PDE problems are the weather prediction problem, the elastic-plastic
flow problem, etc. Usually the computation of the problem is done on a
massively parallel computer using *domain decomposition*. The domain
decomposition maps a subset of problem's data to each compute node in the
computer. The compute node integrates its data in some order of computation and
communicates with its nearest-neighbor compute nodes to interchange border
values. This order is mostly *oblivious* to the order of the computation
at the neighbor nodes. In this work I show that the order of the computation and
the communication in the nodes has a major effect on the performance of the
machine. I show that there is a specific order of the computation and the communication,
which is called the *spiral computation order*, that can enable complete
communication latency hiding, and enables a static communication schedule of
the network, improving the utilization of the network resources. As such , the spiral computational order enables the
qualitative problem-specific association of the needed computation to the
needed communication resources, defining the *grain size*, i.e., the most effective
domain decomposition of the problem size relatively to the machine parameters,
i.e., computational power, memory bandwidth , communication
latency and bandwidth, etc .

To obtain insight to the performance of the
spiral model on a 2D PDE problem, a time-domain analytical model, called the *recurrence
equations* was constructed . The recurrence equations are based on the
notion of *event*, whose numeric value that corresponds to the time point
at which a predefined run-time condition *becomes satisfied*. A *phase*
is defined as the difference in time of corresponding events on different
nodes. The recurrences equations capture the relationship between events and
relate them to the parameters of the machine. The solution of the recurrence
equations with specified initial conditions, called trajectory, enables the prediction of the
time-domain behavior of the system. A symbolic solver program was constructed
to symbolically search for all possible trajectories of the specific PDE
problem. The results that the solver provides are that at certain initial
conditions the phase trajectory converges to a system constant. At other
initial conditions, the phase trajectory becomes periodic, resembling chaotic
motion.