טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
Ph.D Thesis
Ph.D StudentPerelman Dmitri
SubjectExploiting Parallelism of Multi-Core Architectures
DepartmentDepartment of Electrical Engineering
Supervisor Professor Idit Keidar
Full Thesis textFull thesis text - English Version


Abstract

The first part of this thesis addresses theoretical and practical issues of transactional memory (TM), a new synchronization abstraction, which allows threads to bundle multiple operations on memory objects into transactions. Similarly to database transactions, TM transactions are executed atomically: either all of the transaction's operations appear to take effect simultaneously (in this case, we say that the transaction commits), or none of transaction's operations are seen (in this case, we say that transaction aborts).

An effective way to reduce the number of aborts in transactional memory is to keep multiple versions of transactional objects. We therefore study inherent properties of STMs that use multiple versions to guarantee successful commits of all read-only transactions. We first show that these STMs cannot be disjoint-access parallel. We then consider the problem of garbage collecting old object versions, and show that no STM can be optimal in the number of previous versions kept. Moreover, we show that precise garbage collecting of useless versions is impossible in STMs implemented with invisible reads. As an example, we present a theoretical sample STM algorithm that uses visible reads and efficiently removes object versions once they become useless.

We refer to the practical implications of excessive aborts and multi-versioning by developing Selective Multi-Versioning (SMV), a multi-versioned STM that reduces the number of forceful aborts, especially those of long read-only transactions. SMV efficiently deals with old versions while still allowing invisible read-only transactions by relying on automatic garbage collection. It achieves x7 throughput improvement over a single-version STM and more than a two-fold improvement over an STM keeping a constant number of versions per object. Moreover, we show that the memory consumption of algorithms keeping a constant number of versions per object might grow exponentially, while SMV operates successfully even in systems with stringent memory constraints.

Another important aspect of multi-core programming is the problem of efficient data exchange among threads. We present a highly-scalable non-blocking producer-consumer task pool, designed with a special emphasis on lightweight synchronization and data locality. The core building block of our pool is SALSA, Scalable and Low Synchronization Algorithm for a single-consumer container with task stealing support. SALSA uses a novel approach for coordination among consumers, without strong atomic operations or memory barriers in the fast path. It invokes only two CAS operations during a chunk steal. Our evaluation demonstrates that a pool built using SALSA containers scales linearly with the number of threads and significantly outperforms other FIFO and non-FIFO alternatives.