Ph.D Thesis

Ph.D Student | Golts Alona |
---|---|
Subject | Deep Energy: Task Driven Training of Deep Neural Networks |
Department | Department of Computer Science |
Supervisor | Prof. Michael Elad |


Deep learning has revolutionized the fields of image processing and computer vision. The key concept is to collect thousands of corresponding images and labels; then, via supervised learning, a deep neural network (DNN) can fulfill one's wildest dreams. In many applications, however, corresponding ground truths are expensive to collect or altogether unavailable. In parallel, rich knowledge and expertise have been acquired over the years, encompassing optimization, variational methods, sparse representations, probabilistic models, and more. This work attempts to fuse the 'old' with the 'new' by introducing energy-based loss functions and adaptive neural network architectures, creating more powerful tools for challenging image-related applications.

In energy-based training, we take well-studied energy functions and reformulate them as losses for DNN training. Instead of using generic supervised loss terms, which require external labels, we minimize a task-specific energy function that depends only on the input image and the network output. Over the course of training, the network prediction approaches the optimal solution of the energy function, without ever invoking its iterative solver. Once training is concluded, the network has learned a universal solution to the given energy, and can evaluate a new image via a fast forward pass. We demonstrate this approach on four applications that share a common difficulty in collecting reliable pairs of image and ground truth data. Our network is not only more efficient, but also more accurate than the ideal solution of the energy function. This suggests an added regularization, stemming from the approximation of the energy using a DNN.
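
The idea of training on an energy rather than on labels can be illustrated with a deliberately small toy, which is not one of the four applications of the thesis. The sketch below assumes a hypothetical quadratic smoothing energy $E(u; x) = \frac{1}{2}\|u - x\|^2 + \frac{\lambda}{2}\|Du\|^2$ on 1D signals, and a single linear layer `W` stands in for the DNN. The names `energy`, `train_energy_network`, `D`, and `lam` are all illustrative assumptions.

```python
import numpy as np

def energy(u, x, D, lam):
    """Assumed toy energy, per column: 0.5*||u - x||^2 + 0.5*lam*||D u||^2."""
    return 0.5 * np.sum((u - x) ** 2, axis=0) + 0.5 * lam * np.sum((D @ u) ** 2, axis=0)

def train_energy_network(X, D, lam, step=0.1, n_iters=500):
    """Fit W so that u = W x minimizes the average energy over the columns of X.

    No ground-truth outputs are used: the loss depends only on the
    input x and the prediction u, as in energy-based training.
    """
    n, N = X.shape
    W = np.zeros((n, n))
    A = np.eye(n) + lam * D.T @ D        # dE/du = A u - x for this energy
    for _ in range(n_iters):
        U = W @ X                        # forward pass on all training signals
        grad_U = A @ U - X               # gradient of the energy w.r.t. the outputs
        W -= step * (grad_U @ X.T) / N   # chain rule through u = W x
    return W

rng = np.random.default_rng(0)
n, N, lam = 16, 256, 1.0
D = np.diff(np.eye(n), axis=0)           # forward-difference operator, shape (n-1, n)
X_train = rng.standard_normal((n, N))    # unlabeled training signals
W = train_energy_network(X_train, D, lam)

# Inference on an unseen signal is a single forward pass.
x_new = rng.standard_normal((n, 1))
u_net = W @ x_new
# Reference minimizer of the same energy, computed by a direct solve.
u_opt = np.linalg.solve(np.eye(n) + lam * D.T @ D, x_new)
gap = float(energy(u_net, x_new, D, lam)[0] - energy(u_opt, x_new, D, lam)[0])
```

In this linear toy the trained map simply reproduces the exact minimizer, so `gap` is essentially zero. It therefore shows only the label-free training loop and the one-pass inference, not the accuracy gain over the energy minimizer reported above.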

In the previously suggested scheme, the network architecture was arbitrary, but it can also be devised in a principled way. The seminal work on LISTA suggests building a cascaded network architecture in which each RNN-like unit is a single iteration of ISTA, an iterative solver for the $L_1$-regularized sparse coding problem. As opposed to ISTA, the matrices are free to be learned over an input distribution of data, allowing a substantial decrease in inference time. While LISTA is extremely efficient, it is limited to a single dictionary/model. We propose reformulating the ISTA iteration such that both the signal and the dictionary are treated as inputs, and additional matrices are free to be learned, making the overall scheme adaptive to varying dictionaries. We show both theoretically and experimentally that our method maintains the efficiency of LISTA while handling permuted, noisy, and completely random dictionaries, on most of which LISTA naturally fails.
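
For reference, the relation between ISTA and a LISTA-style unfolded network can be sketched as follows. This is a minimal illustration of the standard ISTA iteration and of the LISTA parameterization $z_{k+1} = \mathrm{soft}(W_e x + S z_k, \theta)$, initialized so that it coincides with ISTA. It is not the adaptive scheme proposed in the thesis, no learning is performed, and the function names are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, theta):
    """Elementwise shrinkage: the proximal operator of theta * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista(x, D, lam, n_iters):
    """ISTA for min_z 0.5*||x - D z||^2 + lam*||z||_1, starting from z = 0."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the smooth term
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        z = soft_threshold(z + D.T @ (x - D @ z) / L, lam / L)
    return z

def ista_weights(D, lam):
    """Matrices that make one unfolded layer equal to one ISTA iteration."""
    L = np.linalg.norm(D, 2) ** 2
    W_e = D.T / L
    S = np.eye(D.shape[1]) - D.T @ D / L
    return W_e, S, lam / L

def unfolded_forward(x, W_e, S, theta, n_layers):
    """LISTA-style forward pass; in LISTA, W_e, S and theta would be learned."""
    z = np.zeros(S.shape[0])
    for _ in range(n_layers):
        z = soft_threshold(W_e @ x + S @ z, theta)
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))        # random dictionary (assumed sizes)
z_true = np.zeros(50)
z_true[rng.choice(50, size=3, replace=False)] = 1.0
x = D @ z_true                           # signal with a 3-sparse representation

z_ista = ista(x, D, lam=0.1, n_iters=10)
z_unfolded = unfolded_forward(x, *ista_weights(D, 0.1), n_layers=10)
```

With these initial weights the two outputs agree up to floating-point error. Note that `W_e` and `S` are tied to one specific `D`, which is why a network trained in this form is limited to a single dictionary.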