M.Sc Student: Rozenberg Shai
Subject: Improved Detection of Adversarial Attacks via Penetration
Department: Department of Electrical Engineering
Supervisor: Professor Ran El-Yaniv

Full Thesis text
Defending machine learning models from adversarial attacks, in which a maliciously designed algorithm alters an image so that it is misclassified, has become an increasingly pressing issue as deep neural networks are deployed in ever more facets of daily life.

Inspired by the certificate defense approach, we propose a defense against such attacks. We develop the maximal adversarial distortion (MAD) optimization routine for robustifying deep networks, which captures the idea of increasing the separability of class clusters in the embedding space while reducing the network's sensitivity to small distortions. This is achieved by explicitly increasing the margin between clusters in the embedding space while reducing the norm of the Jacobian. Given a deep neural network for an image classification problem, applying MAD optimization yields MadNet, a version of the original network equipped with an adversarial defense mechanism. MAD optimization is intuitive, effective, and scalable, and the resulting MadNet can improve on the original network's accuracy.

We present an extensive empirical study in which we evaluate MAD optimization under several threat models, including ones where the adversary has no knowledge of the defense algorithm being employed (black-box) and ones where the adversary has complete knowledge of the defense algorithm. MadNet improves state-of-the-art adversarial detection performance under all threat models.
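To make the two ingredients of the objective concrete, the sketch below combines a contrastive-style margin term (pushing class clusters apart in the embedding space) with a penalty on the embedding's sensitivity to input perturbations (a cheap proxy for the Jacobian norm). This is a minimal illustration of the general idea only, not the thesis's actual MAD routine; the function and parameter names (`mad_style_loss`, `margin`, `jac_weight`) are hypothetical.

```python
import torch
import torch.nn.functional as F

def mad_style_loss(embed, x, y, margin=1.0, jac_weight=0.1):
    """Illustrative margin-plus-Jacobian objective (not the thesis code).

    embed : differentiable map from inputs to embedding vectors
    x     : input batch, shape (B, ...)
    y     : integer class labels, shape (B,)
    """
    x = x.clone().requires_grad_(True)
    z = embed(x)  # embeddings, shape (B, d)

    # Margin term: pull same-class embeddings together and push
    # different-class pairs at least `margin` apart, a contrastive
    # surrogate for separability of class clusters.
    dists = torch.cdist(z, z)                        # pairwise distances (B, B)
    same = (y[:, None] == y[None, :]).float()
    pos = (dists * same).sum() / same.sum().clamp(min=1)
    neg = (F.relu(margin - dists) * (1 - same)).sum() / (1 - same).sum().clamp(min=1)
    margin_loss = pos + neg

    # Sensitivity term: penalize the gradient of the embedding energy
    # with respect to the input, a cheap stand-in for the Frobenius
    # norm of the input-to-embedding Jacobian.
    g = torch.autograd.grad(z.pow(2).sum(), x, create_graph=True)[0]
    jac_penalty = g.pow(2).sum(dim=tuple(range(1, g.dim()))).mean()

    return margin_loss + jac_weight * jac_penalty
```

In a training loop this loss would simply be minimized alongside (or in place of) the usual classification loss, with `jac_weight` trading off cluster separation against input sensitivity.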