|M.Sc Student||Haj Ali Ameer|
|Subject||Performing Image Processing in Memristive Memory|
|Department||Department of Electrical and Computer Engineering|
|Supervisor||Associate Prof. Shahar Kvatinsky|
In modern von Neumann systems, data is stored in memory but processed in a separate processing unit. Transferring data between these units costs several orders of magnitude more energy and delay than the computation itself. Many emerging architectures have been proposed to address this problem. One promising approach uses memristors, devices capable of both storing data and computing. A previously proposed logic technique called MAGIC executes NOR gates within a crossbar structure: the gate inputs are the logical states initially stored in the input memristors, and the output is the logical state of the output memristor at the end of the computation. Execution therefore takes place entirely within the memory, which makes it possible to build a memristive Memory Processing Unit (mMPU) in which MAGIC NOR is the basis for all data processing.
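As an illustration of how NOR alone can serve as the basis for other logic, the following is a minimal behavioral sketch in Python. It is not the thesis's implementation and models no electrical behavior: each memristor is reduced to a single bit, and the names `magic_nor` and `nor_xor` are hypothetical. The sketch assumes the output cell is initialized to logical 1 before each gate and is switched to 0 if any input holds logical 1.

```python
def magic_nor(row, in_cols, out_col):
    """Behavioral model of one in-row NOR: read the cells at in_cols,
    write the result into the cell at out_col (in place)."""
    row[out_col] = 1                      # assumed: output cell starts at logical 1
    if any(row[c] for c in in_cols):
        row[out_col] = 0                  # any input at logical 1 switches the output to 0
    return row[out_col]


def nor_xor(a, b):
    """XOR of two stored bits built from five NOR steps.
    Cells 0 and 1 hold the inputs; cells 2 to 6 are scratch cells in the same row."""
    row = [a, b, 0, 0, 0, 0, 0]
    magic_nor(row, [0, 1], 2)             # n1 = NOR(a, b)
    magic_nor(row, [0, 2], 3)             # n2 = NOR(a, n1)
    magic_nor(row, [1, 2], 4)             # n3 = NOR(b, n1)
    magic_nor(row, [3, 4], 5)             # n4 = NOR(n2, n3) = XNOR(a, b)
    magic_nor(row, [5], 6)                # single-input NOR acts as NOT
    return row[6]


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, nor_xor(a, b))
```

Both the operands and every intermediate result live in cells of the same row, which mirrors the idea that the computation never leaves the memory array.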
This research deals with supporting MAGIC-based digital image processing algorithms within the mMPU. First, a tool is proposed for analytical modeling of MAGIC and other necessary memory operations inside non-ideal, size-limited memristive memory arrays. The tool allows designers to determine the voltages, isolation schemes, and memristor parameters that guarantee proper functionality within the memory arrays, given non-idealities and constraints such as parasitic resistances, sneak paths, and peak power. Second, four algorithms are proposed for efficient execution of fixed-point multiplication using MAGIC gates. These algorithms achieve better latency and throughput than previous work and significantly reduce the area cost, so they can feasibly be implemented inside the size-limited memory arrays. The fixed-point multiplication algorithms are then used to efficiently perform more complex in-memory operations such as image convolution. Maximizing parallelism by partitioning large images across multiple arrays is also explored. A functional, cycle-accurate simulator was built to evaluate and verify all the proposed algorithms. Deploying these algorithms in the mMPU provides superior performance over state-of-the-art processing-in-memory architectures for data-intensive applications.
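To make the idea of fixed-point multiplication from NOR gates concrete, here is a hedged Python sketch of a plain shift-and-add multiplier in which every logic operation is expressed through a single `nor` function. This is a textbook construction chosen for illustration, not one of the four algorithms proposed in the thesis, and it makes no claim about their latency or area. All function names and the Q4.4 format in the usage example are assumptions.

```python
def nor(*bits):
    """Logical NOR of one or more bits."""
    return 0 if any(bits) else 1


def and_gate(a, b):
    """AND from three NORs: NOR(NOT a, NOT b)."""
    return nor(nor(a), nor(b))


def full_adder(a, b, cin):
    """NOR-only full adder; returns (sum, carry_out)."""
    n1 = nor(a, b)
    n4 = nor(nor(a, n1), nor(b, n1))      # XNOR(a, b)
    n5 = nor(n4, cin)
    s = nor(nor(n4, n5), nor(cin, n5))    # a XOR b XOR cin
    return s, nor(n1, n5)


def multiply(x, y, n):
    """Unsigned n-bit by n-bit product via shift-and-add (2n-bit result)."""
    xb = [(x >> i) & 1 for i in range(n)]  # LSB-first bits of x
    yb = [(y >> i) & 1 for i in range(n)]  # LSB-first bits of y
    acc = [0] * (2 * n)
    for j in range(n):
        carry = 0
        for i in range(n):
            pp = and_gate(xb[i], yb[j])    # partial-product bit
            acc[i + j], carry = full_adder(acc[i + j], pp, carry)
        acc[j + n] = carry                 # this cell has not been written yet
    return sum(bit << k for k, bit in enumerate(acc))


def fixed_mul(x, y, n, frac):
    """Fixed-point product of two values with `frac` fractional bits (truncating)."""
    return multiply(x, y, n) >> frac


if __name__ == "__main__":
    # Q4.4 example (assumed format): 1.5 -> 24, 2.25 -> 36
    print(fixed_mul(24, 36, 8, 4) / 16)
```

An image convolution reduces to sums of such products over each kernel window, so a multiplier of this kind, together with the adder, is the building block that a NOR-only in-memory convolution would repeat per pixel.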