|Ph.D Student||Voitsechov Dani|
|Subject||MT-CGRA: Multithreaded Coarse Grained Reconfigurable- A Fast and Energy Efficient Alternative for GPGPUs|
|Department||Department of Electrical Engineering||Supervisor||Professor Yoav Etsion|
Over the last decade, GPUs have been successfully deployed to accelerate a wide array of highly parallel general-purpose applications, ranging from machine learning tasks such as image classification, speech recognition, and natural language processing to complex mathematical computations such as Gaussian elimination and matrix multiplication. These massively parallel processors achieve high computational throughput while maintaining high power efficiency. Nevertheless, existing GPUs employ a von Neumann compute engine and therefore suffer from that model's power inefficiencies.
This work presents a Multithreaded Coarse-grain Reconfigurable Architecture (MT-CGRA) that combines coarse-grain reconfigurable computing with static and dynamic dataflow to deliver massive thread-level parallelism. The CUDA-compatible MT-CGRA is positioned as a fast and energy-efficient design alternative for GPGPUs. The architecture maps a compute kernel, represented as a dataflow graph, onto a coarse-grain reconfigurable fabric composed of a grid of interconnected functional units. These functional units dynamically schedule instances of the same static instruction and thus enable streaming the data of multiple threads through the grid. The combination of statically mapped instructions and direct communication between functional units obviates the need for a full instruction pipeline and a centralized register file, whose energy overheads burden GPGPUs.
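The execution model described above can be illustrated with a small software sketch. It is a toy model written for this summary, not the thesis's simulator or hardware design: the class `FunctionalUnit`, the function `run_kernel`, and the example kernel `out[tid] = a[tid] * x + b[tid]` are all hypothetical. Each unit holds one static instruction, buffers operands tagged with a thread ID, and fires as soon as all operands of a given thread have arrived, forwarding the result directly to its consumer instead of writing it to a shared register file.

```python
from collections import defaultdict


class FunctionalUnit:
    """One grid node holding a single static instruction (hypothetical model)."""

    def __init__(self, op, n_inputs, consumers=()):
        self.op = op                      # the statically mapped instruction
        self.n_inputs = n_inputs          # number of operand ports
        self.consumers = list(consumers)  # (unit, input_port) pairs it feeds
        self.pending = defaultdict(dict)  # thread id -> {port: value}

    def receive(self, tid, port, value, results):
        """Buffer a token; fire once every operand of thread `tid` is present."""
        slots = self.pending[tid]
        slots[port] = value
        if len(slots) < self.n_inputs:
            return  # still waiting for this thread's other operands
        del self.pending[tid]
        out = self.op(*(slots[p] for p in range(self.n_inputs)))
        if not self.consumers:
            results[tid] = out  # sink node: record the thread's final value
        for unit, dst_port in self.consumers:
            # Direct unit-to-unit communication, no central register file.
            unit.receive(tid, dst_port, out, results)


def run_kernel(a, b, x):
    """Stream all threads of out[tid] = a[tid] * x + b[tid] through two units."""
    results = {}
    add = FunctionalUnit(lambda p, q: p + q, 2)
    mul = FunctionalUnit(lambda p, q: p * q, 2, consumers=[(add, 0)])
    # Deliver the b operands in reverse thread order to show that matching
    # is done per thread ID and does not depend on arrival order.
    for tid in reversed(range(len(a))):
        add.receive(tid, 1, b[tid], results)
    for tid in range(len(a)):
        mul.receive(tid, 0, a[tid], results)
        mul.receive(tid, 1, x, results)
    return [results[tid] for tid in range(len(a))]


if __name__ == "__main__":
    print(run_kernel([1, 2, 3], [10, 20, 30], 2))
```

In this sketch the instructions never move: only data tokens flow between the two units, which is the property the abstract credits with removing the instruction pipeline and register file overheads.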
Our simulations of various CUDA benchmarks running on the new system show that MT-CGRAs provide an average speedup of 2.5x (13.5x max) and reduce system power by an average of 7x (33x max) compared to an equivalent Nvidia GPGPU.