M.Sc. Thesis


M.Sc. Student: Zheltonozhskii Evgenii
Subject: Reducing Supervision in Visual Recognition Tasks
Department: Department of Computer Science
Supervisors: Prof. Avi Mendelson
             Prof. Alexander Bronstein


Abstract

While deep neural networks (DNNs) have shown tremendous success across various computer vision tasks, including image classification, object detection, and semantic segmentation, the requirement for large amounts of high-quality labels obstructs the adoption of DNNs in real-life problems. Lately, researchers have proposed multiple approaches for reducing the requirements on the amount or quality of these labels, or even for working in a fully unsupervised way.

In a series of works, we study different approaches to supervision reduction in visual recognition tasks: self-supervised learning, learning with noisy labels, and semi-supervised learning. For self-supervised learning, we show that dimensionality reduction followed by simple k-means clustering is a very strong baseline for fully unsupervised large-scale classification (ImageNet). We find cases in which clustering performance does not correlate with other evaluation approaches for self-supervised learning. Additionally, we present a framework for learning with noisy labels comprising two stages: self-supervised pre-training and robust fine-tuning. The framework, dubbed “Contrast to Divide” (C2D), significantly outperforms prior art on synthetic (CIFAR-10 and CIFAR-100) and real-life (WebVision) noise, showing state-of-the-art performance with different methods of learning with noisy labels (DivideMix, ELR) and pre-training approaches (SimCLR, BYOL). Furthermore, since self-supervised pre-training is unaffected by label noise, C2D is especially effective in the high-noise regime.
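The clustering baseline mentioned above can be illustrated with a minimal sketch: reduce pre-extracted self-supervised features with PCA, then run plain k-means on the reduced vectors. This is only a schematic, assuming randomly generated vectors in place of real backbone embeddings; the array sizes, component count, and the `kmeans` helper are illustrative, not the thesis implementation.

```python
import numpy as np

# Hypothetical stand-in: random vectors instead of real self-supervised
# embeddings (e.g., features from a pre-trained backbone); sizes are illustrative.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 256))

# Step 1: dimensionality reduction via PCA (computed with an SVD).
centered = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:64].T  # keep the top 64 principal components

# Step 2: simple k-means (Lloyd's algorithm) on the reduced features.
def kmeans(x, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each point to its nearest center
        dists = np.linalg.norm(x[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        # recompute each center as the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return labels

cluster_ids = kmeans(reduced, k=10)
print(cluster_ids.shape)  # one pseudo-class per sample: (500,)
```

The resulting cluster assignments can then be compared against ground-truth classes (e.g., via clustering accuracy or normalized mutual information) to evaluate the self-supervised representation.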