Technion - Israel Institute of Technology - Graduate School
Ph.D. Thesis
Ph.D. Student: Yavits Leonid
Subject: Analysis and Optimization of Parallel Computing Architectures and In-Memory Computing
Department: Department of Electrical Engineering
Supervisor: Professor Ran Ginosar
Full Thesis Text: English Version


Abstract

The present work is divided into two parts. The first part is dedicated to the analysis and optimization of parallel and manycore architectures in a variety of aspects, from formulating new corollaries to Amdahl's law to applying state-of-the-art optimization methods to enhance multicore architectures. The second part of my work is devoted to developing a new massively parallel processing architecture based on associative processing and in-memory computing.

In my first work, I analyze the effects of data synchronization and inter-core communication on multicore speedup and scalability. I find that workloads with high inter-core communication requirements should be executed on a smaller number of cores, and that applications with high sequential-to-parallel synchronization requirements may be better executed by the sequential core. I formulate a corollary to Amdahl's law that reflects the effects of data synchronization and inter-core communication.
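As a schematic illustration (in illustrative notation, not the exact formulation derived in the thesis), classic Amdahl's law for a workload with parallelizable fraction f running on n cores can be augmented with synchronization and communication overhead terms:

    S(n) = \frac{1}{(1 - f) + \frac{f}{n}}
    \quad\longrightarrow\quad
    S(n) = \frac{1}{(1 - f) + \frac{f}{n} + t_{sync}(n) + t_{comm}(n)}

Because t_{sync} and t_{comm} typically grow with the number of cores, the added overhead eventually outweighs the gain from parallel execution, which is why communication-heavy workloads are better run on fewer cores.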

Also in the first part of my research, I develop a closed-form analytical framework for optimizing the multicore cache hierarchy and optimally allocating area among hierarchy levels under a variety of constrained chip resources. I further extend this framework to three-dimensional (3D) cache design, where the question of optimal partitioning into multiple silicon layers is also addressed.
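Schematically, the kind of constrained problem such a framework addresses is minimizing the average memory access time of a k-level hierarchy over the per-level cache capacities, under a total area budget; assuming, for illustration only, the common power-law miss-rate model:

    \min_{S_1,\dots,S_k}\; t_1 + \sum_{i=2}^{k}\Big(\prod_{j<i} m_j(S_j)\Big)\, t_i + \Big(\prod_{j=1}^{k} m_j(S_j)\Big)\, t_{mem}
    \quad \text{s.t.} \quad \sum_{i=1}^{k} \mathrm{area}(S_i) \le A,
    \qquad m_j(S_j) = m_{0,j}\,(S_j / S_{0,j})^{-\alpha_j},

where S_i and t_i are the capacity and access time of level i, m_j is its local miss ratio, and A is the chip area available for caches. A closed-form framework solves such problems analytically rather than by exhaustive design-space search.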

Lastly, I study the influence of temperature on the performance and scalability of 3D CMPs from the perspective of Amdahl's law. I find that a 3D CMP may reach its thermal limit before reaching its maximum power. I show that a high level of parallelism may lead to high peak temperatures even in small-scale 3D CMPs, limiting 3D CMP scalability and calling for different, in-memory computing architectures.
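For intuition, under a simple first-order thermal model (an illustrative assumption, not the detailed model used in the thesis), the junction temperature of a 3D stack is roughly

    T_j \approx T_{amb} + R_{th} \cdot P \le T_{max}
    \quad\Longrightarrow\quad
    P \le \frac{T_{max} - T_{amb}}{R_{th}},

and since the effective thermal resistance R_{th} grows as more silicon layers are stacked between the active devices and the heat sink, the thermally sustainable power can fall below the nominal power budget; the chip then becomes temperature-limited before it becomes power-limited.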

In the second part of my work, I present a novel computer architecture that resolves the synchronization issue by in-memory computing, combining data storage with massively parallel processing. In this architecture, the last-level cache and the SIMD accelerator are replaced by an Associative Processor (AP), which combines data storage and data processing, providing parallel computational capability and data memory at the same time. Comparative analysis shows that this architecture may outperform a conventional architecture comprising a SIMD coprocessor and a shared last-level cache while consuming less power.
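To illustrate how an AP performs computation, the following is a minimal software sketch of the general bit-serial, word-parallel associative technique (an illustration of the concept, not the circuits or microcode of the thesis design): vector addition is carried out by applying full-adder truth-table passes, each a compare operation that tags matching CAM rows followed by a write to all tagged rows.

    # Minimal software sketch of bit-serial, word-parallel associative addition.
    # Each CAM row holds operand bits a, b, a running carry c, and the sum s;
    # computation proceeds one bit position at a time, applying full-adder
    # truth-table passes (compare, then write) to all rows "in parallel".
    NUM_BITS = 8

    def ap_vector_add(a_vec, b_vec):
        """Add two equal-length integer vectors the way an AP would."""
        rows = [{"a": a, "b": b, "s": 0, "c": 0} for a, b in zip(a_vec, b_vec)]

        # Full-adder truth table: (a_bit, b_bit, carry_in) -> (sum_bit, carry_out)
        truth_table = {
            (0, 0, 0): (0, 0), (0, 0, 1): (1, 0),
            (0, 1, 0): (1, 0), (0, 1, 1): (0, 1),
            (1, 0, 0): (1, 0), (1, 0, 1): (0, 1),
            (1, 1, 0): (0, 1), (1, 1, 1): (1, 1),
        }

        for bit in range(NUM_BITS):
            # Snapshot the match keys so a write in one pass cannot affect
            # matching in the remaining passes of the same bit position.
            keys = [((r["a"] >> bit) & 1, (r["b"] >> bit) & 1, r["c"]) for r in rows]
            for pattern, (s_bit, c_out) in truth_table.items():
                # Compare phase: tag every row whose key matches the pattern.
                # Write phase: update all tagged rows.
                for r, key in zip(rows, keys):
                    if key == pattern:
                        r["s"] |= s_bit << bit
                        r["c"] = c_out
        return [r["s"] for r in rows]

    print(ap_vector_add([3, 10, 200], [5, 7, 55]))   # -> [8, 17, 255]

In an actual AP, the compare and write in each pass are applied to all rows simultaneously, so the run time depends on the operand width (the number of passes) rather than on the vector length, which is what makes the approach massively parallel.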

My next work is dedicated to implementing sparse matrix multiplication (SpMM) on the AP. Four SpMM algorithms are explored, combining AP and baseline CPU processing to varying degrees. The AP is found to be especially efficient in binary sparse matrix multiplication, and is shown to be more power efficient than a variety of conventional solutions.
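For reference, in binary SpMM each output entry is an OR of ANDs over matching indices; the following minimal sketch (plain Python on a set-of-columns representation, not one of the four AP/CPU algorithms explored in the thesis) shows the operation itself:

    # Minimal sketch of binary sparse matrix multiplication (Boolean semiring).
    # Matrices are given as dicts mapping a row index to the set of column
    # indices that hold a 1.
    def binary_spmm(a_rows, b_rows):
        c_rows = {}
        for i, a_cols in a_rows.items():
            out = set()
            for k in a_cols:                      # nonzeros of row i of A
                out |= b_rows.get(k, set())       # OR in row k of B
            if out:
                c_rows[i] = out
        return c_rows

    # Example: C = A * B over {AND, OR}.
    A = {0: {0, 2}, 1: {1}}                       # 2x3 binary matrix
    B = {0: {1}, 1: {0}, 2: {0, 1}}               # 3x2 binary matrix
    print(binary_spmm(A, B))                      # -> {0: {0, 1}, 1: {0}}

Such index matching and merging maps naturally onto associative search and parallel write, which suggests why binary SpMM is a particularly good fit for the AP.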

As CMOS feature scaling slows down, conventional memories such as CAM experience scalability problems. In another work, I propose and investigate an AP based on resistive CAM, the Resistive AP. I show that resistive memory technology potentially allows scaling the AP from a few million to a few hundred million processing units on a single silicon die.