|Ph.D Student||Hoffer Elad|
|Subject||Deep Learning: Rethinking Common Practices|
|Department||Department of Electrical Engineering|
|Supervisors||Dr. Daniel Soudry, Professor Nir Ailon|
Deep learning has revolutionized the way machine learning is used in various domains, substantially improving on previous approaches in areas such as computer vision, speech recognition, and control. Deep models aim to learn hierarchical representations of data, replacing engineered features with neural network structures trained end-to-end.
We will examine and shed new light on several common practices and beliefs in Deep Learning:
• Learning explicit versus implicit features: we will suggest a network and objective function capable of learning features explicitly by metric embedding using deep networks. This model will also be shown to be useful in unsupervised and semi-supervised settings.
• The effect of batch size on generalization: we will show that, contrary to previous beliefs, the batch size used for training deep models has no inherent negative impact on their generalization capabilities. This investigation will also provide insights into several key properties of these models and their training.
• The purpose of the classifier in the last layer: we will demonstrate that the fully-connected classification layer, which is prevalent in deep models, has only a marginal impact on their performance. Instead, a fixed classifier can be used to improve performance and reduce the parameter redundancy of the model.
• The role of batch-normalization: we will show that batch-normalization, a common building block of deep models, can be interpreted differently than originally thought, with implications for other regularization mechanisms. Our reinterpretation will also suggest new alternatives to batch-norm with several computational advantages.
Both theoretical and empirical arguments will be used to show that several commonly used practices are often misguided, and to show how they can be improved.