|Ph.D. Student||Gilad Eran|
|Subject||Parallel Execution using Memory Versioning and Renaming|
|Department||Department of Computer Science||Supervisor||Professor Yoav Etsion|
As the number of cores in modern computers increases, the most practical way to efficiently utilize all cores is still to design parallel algorithms, which are hard to reason about, and to implement them in parallel programs, which are hard to debug. High-level programming models that aim to ease parallel programming either rely on a task-based specification, in which the programmer defines small, minimally dependent sequences of code, or merely eliminate technical aspects such as data partitioning and thread creation. Unfortunately, without revisiting the interaction between software and hardware, easier programming comes at the price of less parallelism.
The memory interface is a striking example of a hardware/software interface that is ill-suited for parallelism, and specifically for parallel access to the program's state. Even on modern computers, that interface follows the design laid out in the von Neumann architecture more than 70 years ago, a design that is fundamentally sequential and suited for serial instruction execution. Admittedly, modern memory systems allow multiple cores to read and write state concurrently, and the underlying implementation can even handle out-of-order accesses issued by a single thread. Still, the mainstream memory interface merely supports parallel execution; it is not designed to ease parallel programming or to integrate tightly with high-level programming models.
This dissertation describes an extended memory system whose main features are versioning and renaming. Versioning allows programs to encode dependencies along with the state, enabling the memory system to enforce proper access ordering. Renaming allows multiple versions of the same memory location to coexist, eliminating false dependencies. When leveraged by a task-based programming model, the extended memory system facilitates efficient parallel execution of common sequential algorithms, even when their state is managed by dynamic data structures. The outcome of the execution is deterministic and identical to a sequential execution of the same code. A hardware implementation of the memory extension is also detailed, along with practical requirements such as effective garbage collection.
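The idea of versioning and renaming can be illustrated with a minimal sketch. This is not the dissertation's actual design or interface; the class and method names are hypothetical, and the sketch only shows the core principle: a write creates a fresh version of a location rather than overwriting it, and a read names the version it depends on, so multiple versions coexist (eliminating false write-after-read and write-after-write dependencies) while each reader still observes the value it would see in a sequential execution.

```python
class VersionedMemory:
    """Toy sketch of a memory that versions and renames locations.

    Each logical address maps to a list of immutable values; the list
    index serves as the version number. A write appends (renames to) a
    new version instead of destroying the old one.
    """

    def __init__(self):
        self._cells = {}  # address -> list of values, index = version id

    def write(self, addr, value):
        """Create a new version of addr; return its version id."""
        versions = self._cells.setdefault(addr, [])
        versions.append(value)
        return len(versions) - 1

    def read(self, addr, version):
        """Read a specific version of addr.

        Raises KeyError/IndexError if that version does not yet exist;
        a real system would instead stall the reader until the version
        is produced, enforcing proper access ordering.
        """
        return self._cells[addr][version]


mem = VersionedMemory()
v0 = mem.write("x", 1)  # one task produces version 0 of x
v1 = mem.write("x", 2)  # a later task produces version 1 without destroying v0
assert mem.read("x", v0) == 1  # a task depending on version 0 still sees 1
assert mem.read("x", v1) == 2  # a task depending on version 1 sees 2
```

Because both versions of `x` coexist, the two writes need not be serialized against readers of the older version, which is the false-dependency elimination that renaming provides; the version ids encode the true dependencies that the memory system must still enforce.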