M.Sc Thesis

M.Sc Student: Shkolnik Moran
Subject: Robust Quantization and Semi-Uniform Quantization: Hardware-Aware Methods for Deep Neural Networks
Department: Department of Electrical and Computer Engineering
Supervisor: Professor Emeritus Uri Weiser


Quantization is a key technique for reducing the size of deep neural networks (DNNs) and accelerating their execution.

Accelerating a quantized network is not trivial since it requires hardware support for the quantization algorithm.

Today, most dedicated hardware accelerators support only uniform quantizers, forcing us to quantize models using uniform quantization methods.
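To fix terminology, a uniform quantizer maps a real-valued tensor onto an evenly spaced grid of integer levels. The following is a minimal sketch (not taken from the thesis) of a symmetric per-tensor uniform quantizer; the function name and the max-based scale choice are illustrative assumptions:

```python
import numpy as np

def uniform_quantize(x, num_bits=8):
    """Symmetric uniform quantizer: round x onto an evenly spaced grid."""
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for 8 bits
    scale = np.abs(x).max() / qmax                 # one scale per tensor (illustrative choice)
    q = np.clip(np.round(x / scale), -qmax, qmax)  # integer codes a fixed-point unit would store
    return q * scale                               # dequantized approximation of x

x = np.random.randn(1024).astype(np.float32)
x_q = uniform_quantize(x, num_bits=4)
```

Because the grid is uniform, the integer codes can be processed with plain fixed-point arithmetic, which is why dedicated accelerators favor this quantizer family.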

Although DNN quantization continues to be heavily researched, two fundamental problems with today's uniform quantization methods remain unresolved.

One well-known problem is the quantized model's inferior accuracy compared to the full-precision model's baseline accuracy.

This significant drop results from the high quantization noise injected by the uniform quantization process.
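The noise injected by rounding grows quickly as the bit-width shrinks. A small sketch, using a Gaussian tensor as a stand-in for typical bell-shaped DNN weights, measures the signal-to-quantization-noise ratio (SQNR) at several bit-widths:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)  # stand-in for a bell-shaped weight tensor

sqnr_db = {}
for bits in (8, 4, 2):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    x_q = np.clip(np.round(x / scale), -qmax, qmax) * scale
    noise = x - x_q
    # SQNR in dB: signal power over quantization-noise power
    sqnr_db[bits] = 10 * np.log10(np.sum(x ** 2) / np.sum(noise ** 2))
    print(f"{bits}-bit SQNR: {sqnr_db[bits]:.1f} dB")
```

At low bit-widths the noise power becomes a substantial fraction of the signal power, which is the mechanism behind the accuracy drop described above.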

The second problem is the quantized model's lack of robustness to typical variations in the quantization process.

Exciting new applications are emerging in which the quantization process is not static and can vary across circumstances and implementations. For such applications, robust models are a crucial requirement.

We show that these two problems depend mainly on two variables: the quantizer type and the network tensors' distribution shape.

Using theoretical arguments and proofs, we show that choosing a uniform quantizer for tensors with a bell-shaped distribution results in sub-optimal accuracy and robustness.
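This mismatch is easy to observe numerically. In the sketch below (an illustration, not the thesis's experiment), the same 4-bit uniform quantizer is applied to a uniformly distributed tensor and to a bell-shaped Gaussian tensor; the bell-shaped input, whose mass is concentrated far inside the quantizer's range, yields a noticeably lower SQNR:

```python
import numpy as np

rng = np.random.default_rng(0)
BITS = 4
QMAX = 2 ** (BITS - 1) - 1

def sqnr_db(x):
    """SQNR (dB) of a symmetric uniform quantizer scaled to the tensor's max."""
    scale = np.abs(x).max() / QMAX
    x_q = np.clip(np.round(x / scale), -QMAX, QMAX) * scale
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_q) ** 2))

sqnr_flat = sqnr_db(rng.uniform(-1.0, 1.0, 100_000))  # matches the uniform grid
sqnr_bell = sqnr_db(rng.standard_normal(100_000))     # bell-shaped, mostly near zero
print(f"uniform input: {sqnr_flat:.1f} dB, Gaussian input: {sqnr_bell:.1f} dB")
```

Intuitively, the step size is dictated by the rare outliers of the bell-shaped tensor, so most values sit on a grid that is far coarser than their magnitude warrants.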

We demonstrate that when one of these factors is held constant, i.e., when a constraint is introduced into the system we wish to optimize, the other factor can be adjusted to obtain an optimal solution to the problems above.

Optimizing performance within a constrained system is a very realistic scenario. Consider the common case of a given hardware accelerator with a given uniform quantizer. In such a case, the quantizer type is the hard constraint, and our goal in this research was to find the tensor distribution shape that optimizes accuracy and robustness.

In other cases, the tensors' distribution is a given, and we can choose the optimal quantizer type.

Our work addresses each of these optimization problems and offers a separate solution for each, assuming only one constraint holds at a time.

Our solutions rely on analytical and rigorous proofs, and their effectiveness is validated on different ImageNet models.