Technion - Israel Institute of Technology - Graduate School
Ph.D. Thesis
Ph.D. Student: Kaplan Roman
Subject: In-Memory Accelerator Architectures for Bioinformatics and Machine Learning
Department: Department of Electrical Engineering
Supervisor: Professor Ran Ginosar
Full Thesis Text: English Version


Abstract

Current computer architectures separate processing units from memory. As emerging applications such as bioinformatics and machine learning demand ever more memory bandwidth, this separation creates the notorious von Neumann bottleneck.


In my research, I propose two new processing-in-memory architectures that overcome the von Neumann bottleneck and improve performance by orders of magnitude. I show how architectures that combine memory and processing at the bitcell level, using emerging memory technologies, enable true processing-in-memory. Energy efficiency is also shown to improve significantly, owing to the reduction in data transfers between processing units and memory.

The first architecture I propose is based on a Resistive Content Addressable Memory (ReCAM) and uses associative processing to perform any arithmetic or logic operation. I show implementations of several key algorithms from bioinformatics and machine learning on the ReCAM, together with performance and energy evaluations and comparisons to existing solutions.
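
To illustrate the general idea of associative processing, the following is a minimal software sketch, not the ReCAM implementation described in the thesis. It assumes a content addressable memory can be modeled as rows of bits supporting two primitives: a compare that tags every row matching a key, and a write that updates every tagged row. The function names, the column layout, and the use of two alternating carry columns are hypothetical choices made for this example. Addition is performed by stepping through the truth table of a one-bit full adder for each bit position.

```python
from itertools import product

def compare(rows, key):
    """Tag every row whose bits match `key` ({column: bit}).
    In a CAM, all rows would be checked at the same time."""
    return [all(row[col] == bit for col, bit in key.items()) for row in rows]

def write(rows, tags, values):
    """Write `values` ({column: bit}) into every tagged row."""
    for row, tagged in zip(rows, tags):
        if tagged:
            for col, bit in values.items():
                row[col] = bit

def associative_add(a_vals, b_vals, width):
    """Add a_vals[k] + b_vals[k] (mod 2**width) in every row k."""
    # Hypothetical layout: A bits, B bits, S (sum) bits, then two carry columns.
    A, B, S, C = 0, width, 2 * width, 3 * width
    rows = []
    for a, b in zip(a_vals, b_vals):
        row = [0] * (3 * width + 2)
        for i in range(width):
            row[A + i] = (a >> i) & 1
            row[B + i] = (b >> i) & 1
        rows.append(row)

    for i in range(width):
        # Alternate carry-in and carry-out columns so that a write in this
        # bit position can never change what a later compare matches.
        cin, cout = C + (i % 2), C + ((i + 1) % 2)
        for a, b, c in product((0, 1), repeat=3):  # full-adder truth table
            total = a + b + c
            tags = compare(rows, {A + i: a, B + i: b, cin: c})
            write(rows, tags, {S + i: total & 1, cout: total >> 1})

    return [sum(row[S + i] << i for i in range(width)) for row in rows]

sums = associative_add([3, 5, 15], [4, 7, 1], width=4)
print(sums)
```

In this sketch the number of compare and write passes is 8 per bit position, independent of how many rows the memory holds. This is why the approach suits memories in which every row can be compared simultaneously.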


The second architecture I propose targets a challenging and computationally intensive problem in bioinformatics: DNA long read mapping. A novel Resistive Approximate Similarity Search Accelerator, RASSA, exploits charge distribution and parallel in-memory processing to reflect a mismatch count between DNA sequences. RASSA is a massively parallel in-memory processor that compares a long read against a reference sequence and maps it onto that reference simultaneously. The key performance breakthrough of RASSA comes from applying the similarity search to the entire reference in parallel, achieving a computational complexity of O(n) instead of O(mn). I show that the RASSA implementation of DNA long read pre-alignment outperforms state-of-the-art solutions by up to two orders of magnitude with comparable accuracy.
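
The mismatch-count idea can be sketched in software as follows. This is an illustrative sequential model only: it does not represent charge distribution or any other aspect of the RASSA hardware, and the function names and the `max_mismatches` threshold are assumptions introduced for this example. For every offset in the reference, the sketch counts the positions at which the read and the reference differ, then keeps the offsets whose count is within the threshold.

```python
def mismatch_counts(reference, read):
    """Return, for each offset in `reference`, the number of positions
    at which `read` differs from the reference window at that offset."""
    m = len(read)
    return [
        sum(r != q for r, q in zip(reference[i:i + m], read))
        for i in range(len(reference) - m + 1)
    ]

def candidate_positions(reference, read, max_mismatches):
    """Offsets whose mismatch count does not exceed `max_mismatches`
    (a hypothetical threshold for this illustration)."""
    counts = mismatch_counts(reference, read)
    return [i for i, d in enumerate(counts) if d <= max_mismatches]

reference = "ACGTACGTTGCA"
read = "ACGA"
print(mismatch_counts(reference, read))
print(candidate_positions(reference, read, max_mismatches=1))
```

The sequential loop above does work proportional to the product of the read length and the reference length. An in-memory design that evaluates every offset at once removes the loop over offsets, which is the source of the complexity reduction the abstract describes.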