M.Sc Thesis

M.Sc Student | Golan Itay |
---|---|
Subject | Effects of Human-Controlled Hyper-Parameters in Deep Neural Networks |
Department | Department of Electrical and Computer Engineering |
Supervisor | Associate Prof. Daniel Soudry |

Deep neural networks (DNNs) are a family of machine learning models with broad representational capabilities. Despite their ability to overfit the training set, DNN training often leads to solutions that generalize well. This bias towards well-generalizing solutions is not explicitly built into the training scheme. Instead, the training algorithms have an implicit bias towards solutions with some as-yet-unknown property, and these solutions generalize well. Hyper-parameters are among the factors that affect this implicit bias. Before training a deep neural network, one needs to set the values of many hyper-parameters, such as weight decay, momentum, or the choice of loss function. These values have a significant impact on DNN performance; nevertheless, there is no practical mechanism for finding their optimal values. Moreover, some of these hyper-parameters are not independent, and changing one of them often requires adjusting others as well. We study how different hyper-parameters interact with each other and how they affect DNN generalization. We show that when Batch Normalization is used, the norm of the weights only scales an implicit learning rate, which controls the change in the weights' direction. This finding provides an explanation for the contribution of weight decay to generalization. We also show that, given the implicit learning rate values induced by training with weight decay, the same performance can be achieved without using weight decay. The continual learning problem provides another example of the impact of hyper-parameters: for two common continual learning scenarios, we suggest a simple adjustment to the loss function that significantly reduces “catastrophic forgetting”.
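The interaction between the weight norm, the learning rate, and weight decay under Batch Normalization can be illustrated with a toy model. What follows is a minimal sketch, not the thesis' method or experiments. It assumes a hypothetical single linear unit followed by batch normalization with no ε term and no affine parameters, so the loss is exactly invariant to the scale of w. It also assumes full-batch gradient descent on random data. Scale invariance implies ∇L(cw) = ∇L(w)/c, so the step changes the direction of w by roughly η/‖w‖². As a result, a run with (w, η) and a run with (cw, c²η) follow the same direction trajectory. Along the same lines, a run with weight decay λ can be matched direction-for-direction by a run without weight decay that uses the illustrative schedule η_t = η·s_t·s_{t+1}, with s_t = (1 - ηλ)^(-t). The names `bn_loss_and_grad`, `sgd`, and this schedule are assumptions made for the example.

```python
import numpy as np


def bn_loss_and_grad(w, X, y):
    """Hypothetical toy loss: one linear unit, then batch normalization
    (population variance, no eps, no affine parameters), then squared loss.
    Returns (loss, analytic gradient w.r.t. w). The loss is scale-invariant in w."""
    n = X.shape[0]
    z = X @ w                          # pre-activations, shape (n,)
    zc = z - z.mean()                  # centered pre-activations
    sigma = np.sqrt(np.mean(zc ** 2))  # batch standard deviation
    h = zc / sigma                     # normalized output, unchanged by w -> c*w
    loss = 0.5 * np.mean((h - y) ** 2)
    dh = (h - y) / n                   # dL/dh
    # Batch-norm backward pass: dL/dz = (dh - mean(dh) - h * mean(dh * h)) / sigma
    dz = (dh - dh.mean() - h * np.mean(dh * h)) / sigma
    return loss, X.T @ dz


def sgd(w0, X, y, lr, wd=0.0, steps=50):
    """Full-batch gradient descent with optional L2 weight decay `wd`.
    `lr` maps the step index t to the learning rate used at that step."""
    w = w0.copy()
    for t in range(steps):
        _, g = bn_loss_and_grad(w, X, y)
        eta_t = lr(t)
        w = (1.0 - eta_t * wd) * w - eta_t * g
    return w


def direction(w):
    return w / np.linalg.norm(w)


def s(t):
    """Norm-compensation factor s_t = (1 - eta * wd) ** (-t) for the sketch."""
    return (1.0 - eta * wd) ** (-t)


rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = rng.normal(size=64)
w0 = rng.normal(size=5)
eta, c, wd, T = 0.1, 10.0, 0.05, 50

# (1) Scaling w by c and the learning rate by c**2 leaves the direction trajectory unchanged.
w_a = sgd(w0, X, y, lambda t: eta, steps=T)
w_b = sgd(c * w0, X, y, lambda t: c ** 2 * eta, steps=T)

# (2) Weight decay vs. no weight decay with a growing learning-rate schedule.
w_wd = sgd(w0, X, y, lambda t: eta, wd=wd, steps=T)
w_nowd = sgd(w0, X, y, lambda t: eta * s(t) * s(t + 1), steps=T)

# In both pairs the directions should agree up to floating-point rounding.
print(np.allclose(direction(w_a), direction(w_b)))
print(np.allclose(direction(w_wd), direction(w_nowd)))
print(np.linalg.norm(w_wd), np.linalg.norm(w_nowd))  # the norms themselves differ
```

By induction on t, the second pair satisfies w_nowd = s_T · w_wd exactly. This means the weight-decayed run differs from the plain run only in its norm, and hence only in the implicit learning rate seen by the direction. This is a property of the toy scale-invariant loss. Real networks contain parameters that are not scale-invariant, such as biases and the final layer.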