Ph.D. Thesis


Ph.D. Student: Shomron Gil
Subject: Embracing Sparsity and Resiliency for Deep Learning Acceleration
Department: Electrical and Computer Engineering
Supervisor: Professor Emeritus Uri Weiser
Full Thesis Text: English Version


Abstract

Deep neural networks (DNNs) have gained tremendous momentum in recent years, both in academia and industry. They deliver state-of-the-art results in numerous applications, such as image classification, object segmentation, and speech recognition. Yet, DNNs are compute-intensive and may require billions of multiply-and-accumulate (MAC) operations for a single input query. Limited resources, such as those of IoT devices, as well as latency constraints and high input throughput, all drive research and development of efficient computing methods for DNN execution.
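
To give a sense of scale, the MAC count of a single convolutional layer grows with the output resolution, the number of channels, and the kernel size. The following back-of-the-envelope sketch (the layer shape is illustrative and not taken from the thesis) shows that a single layer already reaches the giga-MAC range; a full network stacks many such layers, which is where the billions of MAC operations per query come from.

```python
# Back-of-the-envelope MAC count for a single convolutional layer.
# The layer shape below is illustrative and not taken from the thesis.
def conv_macs(out_h, out_w, out_ch, in_ch, k_h, k_w):
    """MACs = output positions x output channels x input channels x kernel volume."""
    return out_h * out_w * out_ch * in_ch * k_h * k_w

# Example: a 3x3 convolution with 256 input and 256 output channels on a 56x56 output map.
macs = conv_macs(56, 56, 256, 256, 3, 3)
print(f"{macs / 1e9:.2f} GMACs in this single layer")  # ~1.85 GMACs
```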


DNNs are built out of layers that perform an abundance of simple, parallel, systematic, and predictable arithmetic operations between weights and activations. As opposed to conventional general-purpose applications, which are explicitly coded for specific tasks, DNNs learn their task from data during a training process. In our research, we rethink two well-known CPU methods - simultaneous multithreading (SMT) and value prediction - and map them to the new environment introduced by DNNs, leveraging their unique characteristics. First, DNNs are resilient; that is, they can tolerate noise in their parameters (e.g., quantization) or during MAC operations with only a “graceful degradation” in accuracy. Second, DNNs usually comprise many zero-valued activations and weights. Leveraging activation sparsity is particularly challenging, since activation values are input dependent, i.e., zero-valued activations are both dynamic and unstructured.
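
The input dependence of activation sparsity can be seen with a minimal NumPy sketch (a stand-in, not the thesis code): ReLU zeroes out a large fraction of the activations, and the zero pattern changes with every input.

```python
import numpy as np

# Minimal sketch (not the thesis code): ReLU zeroes out a large fraction of
# activations, and the zero pattern changes with every input, i.e., it is
# dynamic and unstructured.
rng = np.random.default_rng(0)

for i in range(3):
    # Stand-in for the pre-activation feature map of one input sample.
    pre_act = rng.normal(size=(64, 14, 14))
    act = np.maximum(pre_act, 0.0)  # ReLU
    sparsity = np.mean(act == 0.0)
    print(f"input {i}: {sparsity:.1%} of activations are zero-valued")
```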


With SMT, we propose the new concept of non-blocking SMT (NB-SMT), in which execution units are shared among several computational flows to avoid MAC operations left idle by zero-valued operands. When a structural hazard occurs on a shared execution unit, we propose to temporarily and locally “squeeze in” the operations by reducing their precision. We present and discuss the path from a data-driven “blocking” SMT design, to the concept of NB-SMT, to a fine-tuned sparsity-aware quantization method in which leading zeros in the value representation are avoided. As for value prediction, we present prediction schemes that leverage the inherent spatial correlation of convolutional neural network (CNN) feature maps to predict zero-valued activations. By speculating which activations will be zero-valued, we can potentially reduce the number of MAC operations required by CNNs.
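
As a rough illustration of the NB-SMT idea (a toy functional model, not the thesis design), consider two computational flows sharing one MAC unit: if either flow holds a zero-valued operand, the other flow proceeds at full precision, and on a structural hazard both operations are squeezed in at reduced precision.

```python
import numpy as np

# Toy functional model of the NB-SMT idea (an illustration, not the thesis design):
# two computational flows share one MAC execution unit.
def quantize(x, bits=4):
    # Crude uniform truncation, standing in for "reduced precision".
    scale = (2 ** (bits - 1)) - 1
    return np.round(np.clip(x, -1.0, 1.0) * scale) / scale

def shared_mac(a0, w0, a1, w1):
    """Return the two products produced by one shared execution unit in one cycle."""
    if a0 == 0.0 or w0 == 0.0:   # flow 0 would be idle -> flow 1 runs at full precision
        return 0.0, a1 * w1
    if a1 == 0.0 or w1 == 0.0:   # flow 1 would be idle -> flow 0 runs at full precision
        return a0 * w0, 0.0
    # Structural hazard: both flows need the unit; squeeze both in at reduced precision.
    return quantize(a0) * quantize(w0), quantize(a1) * quantize(w1)

print(shared_mac(0.0, 0.7, 0.5, -0.3))  # flow 0 idle: (0.0, -0.15)
print(shared_mac(0.6, 0.7, 0.5, -0.3))  # hazard: both products are slightly approximated
```

The zero-value speculation idea can be sketched in a similar, purely illustrative way (not the thesis scheme): real CNN feature maps are spatially correlated, which is emulated below by upsampling coarse noise; half of the activations are computed exactly, and each skipped activation is predicted to be zero whenever its computed neighbor is zero.

```python
import numpy as np

# Minimal sketch of zero-value speculation (illustrative, not the thesis scheme).
# Real CNN feature maps are spatially correlated; here that correlation is emulated
# by upsampling coarse noise before applying ReLU.
rng = np.random.default_rng(0)
coarse = rng.normal(size=(4, 4))
fmap = np.maximum(np.kron(coarse, np.ones((4, 4))) + 0.1 * rng.normal(size=(16, 16)), 0.0)

computed = fmap[::2, ::2]         # activations computed exactly
skipped = fmap[1::2, 1::2]        # activations we would like to skip
predict_zero = (computed == 0.0)  # speculate: the skipped neighbor is zero as well
accuracy = np.mean((skipped == 0.0) == predict_zero)
print(f"zero-prediction accuracy on skipped activations: {accuracy:.1%}")
```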