|M.Sc. Student||Izchak Oved|
|Subject||The Interaction between Workloads and Micro-Architecture in Highly-Parallel Chip Multi-Processors|
|Department||Department of Electrical and Computer Engineering|
|Supervisors||Professor Emeritus Uri Weiser|
|Professor Idit Keidar|
|Professor Emeritus Avinoam Kolodny|
Highly parallel architectures, such as GPUs or CPUs with vector instructions, require extensive code tuning to achieve high utilization, i.e., performance on the order of the theoretical maximum. One of the main reasons is that, due to technological limitations (e.g., power consumption, power density, availability of instruction-level parallelism), highly parallel architectures trade off single-instruction-stream performance for maximum raw performance. This places a burden on the workload to provide enough parallelism to keep the architectural resources busy. While some workloads are inherently highly parallel (the so-called embarrassingly parallel workloads), many interesting, compute-intensive workloads (animation, pattern recognition, ray tracing) become harder to parallelize as the degree of parallelism increases.
In this research we developed a simulator for highly parallel architectures (up to 2048 cores) that can simulate existing parallel benchmarks (any benchmark that runs on the Linux platform), and we use it to study a suite of interesting parallel workloads, the PARSEC benchmark suite.
We characterize the parallelism scalability of each benchmark, namely how its performance scales as the architecture's parallelism (core count, without overhead) is scaled up.
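The abstract does not state which speedup model, if any, the scalability results are compared against; as a hedged illustration of why performance often saturates well before 2048 cores, the following sketch uses Amdahl's law, a classic (and here merely assumed) baseline. All parameter values are hypothetical.

```python
# Illustrative sketch only: Amdahl's-law speedup as a function of core count.
# The thesis may characterize scalability differently; this is an assumed
# baseline model, not the author's method.

def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup of a workload whose parallel_fraction of work runs on `cores` cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

if __name__ == "__main__":
    # Even a 99%-parallel workload saturates far below a 2048x speedup,
    # since the serial 1% dominates at high core counts.
    for cores in (1, 16, 256, 2048):
        print(cores, round(amdahl_speedup(0.99, cores), 1))
```

This kind of curve is one reason the abstract emphasizes that keeping 2048 cores busy puts a heavy burden on the workload's available parallelism.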
We also study another aspect of the PARSEC benchmark suite: shared-cache performance (miss rate) when running with a high degree of parallelism. We compare the measured behavior to an analytical model proposed in the literature.
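The abstract does not identify the analytical miss-rate model used for the comparison. One commonly cited candidate in the literature is the power law of cache miss rates (miss rate scaling as cache size raised to a negative exponent); the sketch below assumes that model purely for illustration, with hypothetical parameter values.

```python
# Illustrative sketch only: power-law cache miss-rate model,
#   miss_rate(C) = m0 * (C / c0) ** (-alpha),
# where m0 is the miss rate measured at a baseline cache size c0 and
# alpha is an empirically fitted exponent (often quoted around 0.5).
# This is an assumed model, not necessarily the one used in the thesis,
# and all numbers here are hypothetical.

def power_law_miss_rate(cache_kib: float, m0: float = 0.05,
                        c0_kib: float = 256.0, alpha: float = 0.5) -> float:
    """Predicted miss rate for a shared cache of size cache_kib."""
    return m0 * (cache_kib / c0_kib) ** (-alpha)

if __name__ == "__main__":
    # With alpha = 0.5, doubling the cache size cuts the miss rate
    # by a factor of about sqrt(2).
    for size_kib in (256, 512, 1024):
        print(size_kib, round(power_law_miss_rate(size_kib), 4))
```

Comparing simulated miss rates against such a closed-form curve is one simple way to check whether a shared cache under highly parallel workloads still follows single-threaded scaling assumptions.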