Technion - Israel Institute of Technology
Graduate School
M.Sc. Thesis
M.Sc. Student: Zur Yochai
Subject: Differentiable Neural Architecture Search with an Arithmetic Complexity Constraint
Department: Department of Computer Science
Supervisor: Professor Alexander Bronstein


Abstract

Deep learning in general, and convolutional neural networks (CNNs) in particular, have
demonstrated spectacular success on a variety of tasks, becoming a de facto standard in machine learning. Deep learning techniques make it possible to train the parameters of a network completely automatically from the data. However, architectural choices, such as the number of layers and their specific topology, are usually designed by hand.
In fact, the most successful CNN architectures in use today were designed by trial
and error. Neural Architecture Search (NAS) aims at automating the design of neural network architectures for a given task. As part of a larger automated machine learning (AutoML) trend, it promises to alleviate the scarcity of machine learning experts needed to design custom architectures, and to perform this task better than humans.
While CNNs are a very popular tool in the modern machine learning arsenal, they are
typically very demanding computationally at inference time. One of the main ways to alleviate this burden is quantization, which relies on low-precision arithmetic representations of the weights and the activations, leading to more efficient arithmetic operations. The resulting reduction in computational complexity depends on the exact hardware platform, but is typically significant. One of the big questions in designing a quantized neural network is how to allocate bit widths to the different filters so as to achieve an optimal tradeoff between computational complexity and loss of accuracy. Another popular method, which has so far been studied independently, is pruning the number of filters in each layer.
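
To make the idea of low-precision weight representation concrete, the following minimal sketch (in Python/NumPy) uniformly quantizes a weight tensor to a given bit width. The function name and the symmetric scaling scheme are illustrative assumptions rather than the exact quantizer analyzed in this work.

import numpy as np

def uniform_quantize(w, num_bits):
    # Illustrative symmetric uniform quantization of a weight tensor to
    # `num_bits` bits; the exact scheme used in the thesis may differ.
    qmax = 2 ** (num_bits - 1) - 1           # largest representable integer
    scale = np.max(np.abs(w)) / qmax         # map the weight range onto the integer grid
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                          # de-quantized values used for simulation

w = np.random.randn(64, 3, 3, 3)              # e.g., one convolutional filter bank
w4 = uniform_quantize(w, num_bits=4)          # 4-bit weights: cheaper arithmetic, some error
print(np.mean((w - w4) ** 2))                 # quantization error grows as bit width shrinks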

The present study is an attempt to formulate optimal bit allocation and pruning as a NAS problem, searching for the best configuration of allocated bits that satisfies a computational complexity budget while maximizing the model accuracy. While conventional NAS methods apply evolution or reinforcement learning over a discrete and non-differentiable search space, a differentiable search method has recently been proposed.

The method is based on a continuous relaxation of the architecture representation, allowing an efficient search of the architecture using gradient descent. We introduce a differentiable NAS method that finds superior heterogeneous architectures, i.e., CNNs in which each filter can be quantized with a different bit width or each layer can have a different number of filters, and evaluate its performance.
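
To illustrate the flavor of such a continuous relaxation, the following sketch (in PyTorch) mixes several quantized copies of the same convolution weights using softmax-normalized architecture parameters, and penalizes the expected bit width so that the search respects a complexity budget. The class and function names, the straight-through quantizer, and the penalty coefficient are illustrative assumptions, not the exact formulation developed in the thesis.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedBitwidthConv(nn.Module):
    # Illustrative DARTS-style relaxation: one convolution whose weights are
    # quantized to several candidate bit widths, mixed by softmax-ed
    # architecture parameters (alpha).
    def __init__(self, in_ch, out_ch, bit_options=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bit_options = bit_options
        self.alpha = nn.Parameter(torch.zeros(len(bit_options)))

    @staticmethod
    def fake_quantize(w, num_bits):
        # Straight-through uniform quantization: forward uses quantized values,
        # gradients pass through to the full-precision weights.
        qmax = 2 ** (num_bits - 1) - 1
        scale = w.abs().max() / qmax
        q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
        return w + (q - w).detach()

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        out = 0
        for p, b in zip(probs, self.bit_options):
            wq = self.fake_quantize(self.conv.weight, b)
            out = out + p * F.conv2d(x, wq, padding=1)
        return out

    def expected_bits(self):
        # Differentiable proxy for arithmetic complexity: expected bit width.
        probs = F.softmax(self.alpha, dim=0)
        return (probs * torch.tensor(self.bit_options, dtype=probs.dtype)).sum()

# Toy search step: a task loss plus a complexity penalty keeping the expected
# bit width within a budget (the coefficient and the budget are illustrative).
layer = MixedBitwidthConv(3, 16)
x = torch.randn(2, 3, 32, 32)
loss = layer(x).pow(2).mean() + 0.1 * F.relu(layer.expected_bits() - 4.0)
loss.backward()   # gradients flow to both the weights and the architecture parameters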