|M.Sc Student||Eliahu Adi|
|Subject||Programmable Processing-in-Memory Memristive Architectures|
|Department||Department of Electrical and Computer Engineering|
|Supervisor||Associate Prof. Shahar Kvatinsky|
The von Neumann architecture, in which the memory and the computation units are separated, demands massive data traffic between the memory and the CPU.
To reduce data movement, new technologies and computer architectures have been explored. The use of memristors, which are devices with both memory and computation capabilities, has been considered for different processing-in-memory (PIM) solutions. Two main PIM approaches have been discussed in the literature. The first approach uses memristive stateful logic for digital PIM systems, and the second one uses memristors for analog computation, e.g., vector-matrix multiplication.
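The two approaches can be illustrated with a minimal behavioral sketch. It assumes an ideal crossbar (no wire resistance, sneak paths, or device variation) and, for the digital case, a NOR-type stateful gate; the gate choice and the function names `stateful_nor` and `crossbar_vmm` are assumptions made for illustration and are not taken from the thesis.

```python
# Behavioral sketch only; an idealized model, not a device-level simulation.

def stateful_nor(cells, in_a, in_b, out):
    """Digital PIM: logic values are stored as memristor states (1 or 0).

    Assumed NOR-type stateful gate: the output cell is first initialized
    to logical 1, then switches to 0 if any input cell holds logical 1.
    """
    cells[out] = 1                      # initialization step
    if cells[in_a] or cells[in_b]:      # conditional switching of the output
        cells[out] = 0
    return cells


def crossbar_vmm(voltages, conductances):
    """Analog PIM: ideal crossbar vector-matrix multiplication.

    Row i is driven with voltages[i]; the cell at (i, j) has conductance
    conductances[i][j]. By Ohm's and Kirchhoff's laws the current on
    column j is I_j = sum_i V_i * G_ij.
    """
    n_rows, n_cols = len(conductances), len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("one input voltage is required per crossbar row")
    return [
        sum(voltages[i] * conductances[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]
```

In the analog case all column currents are produced in one step, which is where the parallelism of memristive vector-matrix multiplication comes from.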
Because memristors are emerging devices that enable new computing paradigms, memristive architectures are still at an early stage of development, and integrating these devices into different computer architectures is difficult. The integration is even more challenging in programmable architectures, which have only recently begun to incorporate memristors, since research has so far focused mainly on circuit design.
In this dissertation, two programmable PIM architectures and the challenges associated with integrating memristors in them are discussed. The first architecture, AbstractPIM, is a general-purpose architecture that uses memristors to perform in-memory computation. Previous efforts to design such an architecture have focused on a specific stateful logic family and on optimizing the execution for a certain target machine. These solutions require recompilation whenever the target machine changes, and they provide no backward compatibility with other target machines. AbstractPIM presents a new compilation concept and flow that enable any function to be executed within the memory, using different stateful logic families and different instruction set architectures (ISAs). The code generation is separated into two independent components: an intermediate representation of the code in a target-independent ISA, followed by microcode generation for a specific target machine. The result is a flexible flow with backward compatibility, which lays the foundations for a PIM compiler.
AbstractPIM is used to explore various logic technologies and ISAs and how they affect each other. In addition, the challenges associated with this approach, such as the increase in execution time, are discussed.
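The two-component flow can be sketched as follows. This is a toy illustration under stated assumptions, not the AbstractPIM implementation: the three-instruction IR (`NOT`, `AND`, `OR`), the two single-gate targets (`NOR` and `NAND`), the expansion tables, and the names `TEMPLATES`, `IR_XOR`, `lower`, and `run` are all hypothetical.

```python
# Toy sketch: one target-independent IR program lowered to two hypothetical targets.

# Microcode templates per target: each IR instruction expands into a sequence
# of (out, in_x, in_y) primitive gates. "d", "a", "b" are the instruction's
# operands; "t0" and "t1" are scratch cells.
TEMPLATES = {
    "NOR": {
        "NOT": [("d", "a", "a")],
        "OR":  [("t0", "a", "b"), ("d", "t0", "t0")],
        "AND": [("t0", "a", "a"), ("t1", "b", "b"), ("d", "t0", "t1")],
    },
    "NAND": {
        "NOT": [("d", "a", "a")],
        "AND": [("t0", "a", "b"), ("d", "t0", "t0")],
        "OR":  [("t0", "a", "a"), ("t1", "b", "b"), ("d", "t0", "t1")],
    },
}

# Component 1 output (written by hand here): XOR in the target-independent IR.
IR_XOR = [
    ("OR",  "t0",  "a",  "b"),
    ("AND", "t1",  "a",  "b"),
    ("NOT", "t2",  "t1"),
    ("AND", "out", "t0", "t2"),
]


def lower(ir_program, target):
    """Component 2: expand each IR instruction into microcode for `target`."""
    micro = []
    for n, (op, dst, *srcs) in enumerate(ir_program):
        names = {
            "d": dst, "a": srcs[0], "b": srcs[-1],
            "t0": f"_m{n}_0", "t1": f"_m{n}_1",   # fresh scratch cells
        }
        for out, x, y in TEMPLATES[target][op]:
            micro.append((target, names[out], names[x], names[y]))
    return micro


def run(micro, inputs):
    """Execute microcode on a dictionary of memory cells holding 0 or 1."""
    cells = dict(inputs)
    for gate, out, x, y in micro:
        if gate == "NOR":
            cells[out] = int(not (cells[x] or cells[y]))
        else:  # "NAND"
            cells[out] = int(not (cells[x] and cells[y]))
    return cells
```

In this sketch the IR program never changes when the target does; only the lowering table is swapped. The same four IR instructions expand to nine micro-operations on the NOR target and eight on the NAND target, a small-scale picture of how the abstraction layer can cost execution time.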
The second programmable architecture discussed in this dissertation, multiPULPly, is an application-specific, ultra-low-power neural network accelerator.
Computationally intensive neural network applications often need to run on resource-limited, low-power devices. Numerous memristive hardware accelerators have been developed to speed up neural network applications and reduce their power consumption; however, most target data centers and full-fledged systems. Acceleration in ultra-low-power systems has been only partially addressed.
This dissertation presents multiPULPly, an accelerator that integrates memristive technologies within standard low-power CMOS technology to accelerate multiplication in neural network inference on ultra-low-power systems.
The accelerator was designed for PULP, an open-source microcontroller system that uses low-power RISC-V processors. Memristors were integrated into the accelerator so that power is consumed only when the memory is active, so that a task can be resumed with no context-restoring overhead, and to enable highly parallel analog multiplication. To reduce energy consumption, the accelerator uses dataflows that handle common multiplication scenarios and are tailored to the suggested architecture.