|M.Sc Student||Baskin Chaim|
|Subject||Streaming Architecture for Large-Scale Quantized Neural Networks on an FPGA Based|
|Department||Department of Electrical Engineering||Supervisor||Professor Avi Mendelson|
Deep neural networks (DNNs) are used by a wide range of applications running on computer architectures from IoT devices to supercomputers. These networks have a huge memory footprint, and their computational and communication demands are equally large. To ease the pressure on resources, research indicates that in many cases a low-precision representation (1-2 bits per parameter) of weights and other parameters can achieve similar accuracy while requiring fewer resources. Quantized values make FPGAs a natural platform for running NNs, since FPGAs are well suited to these primitives: they provide efficient support for bitwise operations and can work with arbitrary-precision representations of numbers.
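To illustrate why bitwise operations matter for 1-bit parameters, the following is a minimal sketch of the commonly used XNOR-popcount formulation of a dot product between two vectors with entries in {-1, +1}. It is a generic illustration, not the implementation described in this work; the helper names `pack_signs` and `binary_dot` are hypothetical.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an integer, one bit per value.

    Bit i is set to 1 when values[i] == +1 and left at 0 when it is -1.
    """
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1, +1} vectors stored as bit masks.

    XNOR marks the positions where both vectors agree (product +1);
    every other position contributes -1, so dot = 2 * matches - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR restricted to the n valid bits
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n


# Illustrative data (assumed, not from the thesis).
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))
result = binary_dot(pack_signs(a), pack_signs(b), len(a))
print(result, reference)
```

On an FPGA the XNOR and the bit count map directly onto lookup tables, which is the kind of primitive the paragraph above refers to; the Python version only shows the arithmetic.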
This work presents a new streaming architecture for running quantized neural networks (QNNs) on FPGAs. The proposed architecture scales out better than alternatives, allowing us to take advantage of systems with multiple FPGAs. We also include support for skip connections, which are used in state-of-the-art NNs, and show that our architecture allows adding those connections almost for free. We propose an abstraction of NN building blocks that makes it easy to build and implement CNN architectures. The proposed technique allowed us to implement an 18-layer ResNet for 224 × 224 image classification that achieves 57.5% top-1 accuracy.
In addition, we implemented a full-sized quantized AlexNet. In contrast to previous works, we use 2-bit activations instead of 1-bit ones, which improves AlexNet's top-1 accuracy from 41.8% to 51.03% for ImageNet classification. Both AlexNet and ResNet can handle 1000-class real-time classification on an FPGA.
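As a rough illustration of what moving from 1-bit to 2-bit activations means, the sketch below applies a generic uniform quantizer to an activation clipped to [0, 1]. The abstract does not state which quantization scheme is used, so the clipping range, the rounding rule, and the function name `quantize_activation` are all assumptions made for this example.

```python
def quantize_activation(x, bits=2):
    """Uniformly quantize an activation to `bits` bits.

    Assumed scheme (not taken from the thesis): clip x to [0, 1], then
    round to the nearest of 2**bits evenly spaced levels.
    Returns (integer code, dequantized value).
    """
    levels = (1 << bits) - 1             # 3 steps for 2 bits, 1 step for 1 bit
    clipped = min(max(x, 0.0), 1.0)
    code = int(clipped * levels + 0.5)   # round half up (value is non-negative)
    return code, code / levels


# With 2 bits there are four codes (0..3); with 1 bit only two (0..1).
for x in (-1.0, 0.1, 0.5, 2.0):
    print(x, quantize_activation(x, bits=2), quantize_activation(x, bits=1))
```

With one bit every activation collapses to one of two values, while two bits keep four distinct levels, at the cost of a wider datapath.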
Our implementation of ResNet-18 consumes 5× less power and is 4× slower on ImageNet than the same NN on the latest Nvidia GPUs. Smaller NNs that fit on a single FPGA run faster than on GPUs for small (32×32) inputs, while consuming up to 20× less energy and power.