טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentShklover Gregory
SubjectExplicitly-Parallel Code Compilation Methods for Shared-
Context Architecture
DepartmentDepartment of Computer Science
Supervisor Professor Assaf Schuster
Full Thesis textFull thesis text - English Version


Abstract

In this work we investigate compilation methods for explicitly parallel code targeted at Inthreads - a lightweight, shared-context multi-threaded architecture.

    The Inthreads architecture provides the means for expressing parallelism at a very low granularity level. Contrary to most other architectures, Inthreads executes all the threads in a shared architectural context: the threads share not only the memory, but also the register file. It also provides a set of basic synchronization primitives and defines strict rules for accessing shared resources: a correct Inthreads program must be data-race-free and use explicit architectural synchronization instructions to synchronize access to shared memory or registers.

    We use a two-stage compilation flow where an automatic parallelizer or a programmer provides an explicitly-parallel source code, that is then compiled into machine code by an Inthreads compiler. For a correct compilation process, the compiler must be capable of understanding the semantics of the parallel code. It is also required to adjust traditional optimization transformations to the parallel semantics of the code.

    To preserve correctness of the compiled parallel code during the compilation process we define correctness requirements that make use of the architecture and the program correctness. We modify the traditional analysis and optimization algorithms to adhere to the defined correctness requirements.

    Shared register file, as provided by Inthreads, is rarely available in other parallel processing frameworks. In this work, we investigate the register allocation for parallel threads and show that graph coloring register allocation model can be natively extended to handle register allocation for parallel threads.

   We define the changes to the graph coloring algorithm steps that include building parallel interference graph, coalescing, simplification and spilling. For spilling, we discuss two well-known spill methods: spill-everywhere and interference region spilling. We show that while the spill-everywhere can be directly applied to parallel code, interference region spilling presents several correctness challenges.

   We study the spill cost heuristic that governs the algorithm behavior and present empirical results on the quality and complexity of the graph coloring algorithm when applied to parallel code.