Technion - Israel Institute of Technology
The Graduate School
Ph.D. Thesis
Ph.D. Student: Yonatan Geifman
Subject: Uncertainty Estimation and its Applications in Deep Neural Networks
Department: Department of Computer Science
Supervisor: Professor Ran El-Yaniv


Abstract

Deep neural networks (DNNs) have recently shown great success in many machine learning domains and problems. Applying these models to mission-critical tasks (e.g., medical analysis, autonomous driving), however, still poses many safety challenges due to prediction uncertainty and prediction errors. We begin this dissertation with a discussion of the problem of uncertainty estimation. We propose a simple and effective algorithm to enhance any existing uncertainty estimation method for DNNs. Our algorithm is based on an observation about the learning dynamics of DNNs during training. These dynamics sometimes force networks that are optimized for classification to wrongly estimate the uncertainty of points to which they assign high confidence. Based on a principle similar to early stopping, we reduce this adverse effect by extracting uncertainty estimates from snapshots of the network saved during training, before convergence. In this way we are able to enhance all existing uncertainty estimation methods.
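
The following is a minimal sketch of such a snapshot-based confidence estimate, assuming model checkpoints are saved periodically during training with torch.save(model.state_dict(), path); the PyTorch interface, the checkpoint format, and the use of averaged softmax confidence are illustrative assumptions rather than the exact procedure developed in the thesis.

```python
import torch
import torch.nn.functional as F

def snapshot_confidence(model, checkpoint_paths, x):
    """Average softmax confidence over training snapshots (illustrative sketch).

    checkpoint_paths: paths to weights saved during training, before
    convergence (assumed to be state dicts). The averaged maximal softmax
    score serves as the confidence (negative uncertainty) estimate for
    each input in the batch x.
    """
    probs = []
    for path in checkpoint_paths:
        model.load_state_dict(torch.load(path))
        model.eval()
        with torch.no_grad():
            probs.append(F.softmax(model(x), dim=1))
    # Average class probabilities across snapshots, then take the max class score.
    mean_probs = torch.stack(probs).mean(dim=0)
    return mean_probs.max(dim=1).values
```

The same idea applies to any per-snapshot confidence measure; the point is that estimates taken before convergence can be combined with, or substituted for, those of the final model.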

We then discuss several applications of uncertainty. First, we review selective prediction (also known as prediction with a reject option), for which we derive a tight generalization bound that guarantees the risk over the covered (non-rejected) part of the domain. We construct a selective classifier on top of a pretrained network, focusing only on deriving the selection function. Later on, we propose SelectiveNet, a deep learning architecture that jointly learns both the predictor (classifier or regressor) and the accompanying selection function. We propose a constrained optimization technique that minimizes the risk over the non-rejected instances under a constraint on the rejection rate. The rejection vs. accuracy tradeoff exhibited by SelectiveNet sets the state of the art.
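
One possible way to implement such a constrained objective is sketched below, assuming a network with a prediction head producing f_logits and a selection head producing acceptance scores g_scores in [0, 1]; the quadratic coverage penalty and its coefficient are assumptions made for illustration and need not match the exact formulation used by SelectiveNet.

```python
import torch
import torch.nn.functional as F

def selective_loss(f_logits, g_scores, targets, target_coverage=0.8, lam=32.0):
    """Selective risk with a coverage constraint (illustrative sketch).

    f_logits: outputs of the prediction head (classification logits).
    g_scores: selection head outputs in [0, 1], interpreted as the
    probability of accepting (not rejecting) each example.
    """
    coverage = g_scores.mean()
    # Per-example loss weighted by the selection scores, normalized by coverage.
    per_example = F.cross_entropy(f_logits, targets, reduction="none")
    selective_risk = (per_example * g_scores).mean() / coverage
    # Quadratic penalty when coverage falls below the target rate.
    penalty = lam * torch.clamp(target_coverage - coverage, min=0.0) ** 2
    return selective_risk + penalty
```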

Another striking application of uncertainty is active learning, where it is common to query points to be labeled based on their uncertainty (a.k.a. the uncertainty sampling strategy). One of the problems in active learning for deep networks is architecture and hyperparameter selection. First, we propose the ``long-tail'' setting, in which active learning is performed after passively querying enough labels for architecture and hyperparameter selection. Then, we tackle the problem of architecture selection along the active learning process. We show that incorporating a neural architecture search algorithm during active learning improves the uncertainty estimates and boosts the overall active learning performance. We propose a novel neural architecture search technique that incrementally expands the architectural capacity in accordance with the number of available labeled points. We show how the smaller architectures used during the early stages of the active learning process reduce overfitting and enhance the quality of the query function used to select the next points to be labeled.
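
A minimal sketch of an uncertainty-sampling query step is given below, assuming confidence scores have already been computed for an unlabeled pool (for example with the snapshot-based estimator sketched earlier); the batch size and the use of least-confidence ordering are illustrative choices.

```python
import torch

def uncertainty_sampling_query(confidences, batch_size=100):
    """Select the pool indices with the lowest confidence (highest uncertainty).

    confidences: 1-D tensor of confidence scores for the unlabeled pool,
    e.g. maximal softmax probabilities. Returns the indices of the points
    to be sent for labeling in the next active learning round.
    """
    _, indices = torch.sort(confidences)  # ascending: least confident first
    return indices[:batch_size]
```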