Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Oved Izchak
Subject: The Interaction between Workloads and Micro Architecture in Highly-Parallel Chip Multi-Processors
Department: Department of Electrical Engineering
Supervisors: Professor Emeritus Weiser Uri; Full Professor Keidar Idit; Professor Emeritus Kolodny Avinoam


Abstract

Highly parallel architectures, such as GPUs or CPUs with vector instructions, require extensive code tuning to achieve high utilization, i.e., performance on the order of the theoretical maximum. One of the main reasons is that, due to technological limitations (e.g., power consumption, power density, availability of instruction-level parallelism), highly parallel architectures trade off single-instruction-stream performance for maximum raw performance. This puts a burden on the workload to provide enough parallelism to keep the architectural resources busy. While some workloads are inherently highly parallel (the so-called embarrassingly parallel workloads), many interesting, compute-intensive workloads (animation, pattern recognition, ray tracing) become harder to parallelize as the degree of parallelism increases.

In this research we developed a simulator for highly parallel architectures (up to 2048 cores) that can run existing parallel benchmarks (any benchmark that runs on the Linux platform), and we used it to study a suite of interesting parallel workloads, the Parsec benchmark suite.

We characterize the parallelism scalability of each benchmark, namely how its performance scales as the architecture’s parallelism (core count, without overhead) increases.
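As an illustration of what such a scalability characterization can look like, the following minimal sketch computes speedup and parallel efficiency from per-core-count runtimes and compares them with the Amdahl's-law bound. The metric definitions are the standard textbook ones and are not necessarily those used in the thesis; the function names and the runtime values are hypothetical and chosen only for illustration.

```python
def speedup(runtimes):
    """Return {core_count: speedup} relative to the single-core runtime.

    `runtimes` maps core count -> runtime and must contain an entry for 1 core.
    """
    base = runtimes[1]
    return {n: base / t for n, t in sorted(runtimes.items())}


def efficiency(runtimes):
    """Return {core_count: parallel efficiency}, i.e. speedup divided by core count."""
    return {n: s / n for n, s in speedup(runtimes).items()}


def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup given by Amdahl's law.

    `parallel_fraction` is the fraction of single-core runtime that can be
    parallelized (between 0.0 and 1.0).
    """
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


# Hypothetical runtimes in seconds; NOT measurements from the thesis.
example_runtimes = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}

if __name__ == "__main__":
    measured = speedup(example_runtimes)
    eff = efficiency(example_runtimes)
    for n in measured:
        # 0.95 is an assumed parallel fraction, used only for comparison.
        bound = amdahl_speedup(0.95, n)
        print(f"{n:>4} cores: speedup {measured[n]:.2f}, "
              f"efficiency {eff[n]:.2f}, Amdahl bound (p=0.95) {bound:.2f}")
```

A benchmark that scales well keeps its efficiency close to 1 as the core count grows; a falling efficiency curve indicates that the workload is not supplying enough parallelism for the added cores.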

We also study another aspect of the Parsec benchmark suite: shared-cache performance (miss rate) when running at a high degree of parallelism. We compare the measured performance to an analytical model proposed in the literature.