M.Sc. Student: Zheltonozhskii Evgenii
Subject: Reducing Supervision in Visual Recognition Tasks
Department: Computer Science
Supervisors: Prof. Avi Mendelson, Prof. Alexander Bronstein
While deep neural networks (DNNs) have shown tremendous success across various computer vision tasks, including image classification, object detection, and semantic segmentation, the need for large amounts of high-quality labels obstructs their adoption in real-life problems. Recently, researchers have proposed multiple approaches for reducing the amount or quality of labels required, or even for learning in a fully unsupervised way.
In a series of works, we study several approaches to supervision reduction in visual recognition tasks: self-supervised learning, learning with noisy labels, and semi-supervised learning. For self-supervised learning, we show that dimensionality reduction followed by simple k-means clustering is a very strong baseline for fully unsupervised large-scale classification (ImageNet), and we identify cases in which clustering performance does not correlate with other evaluation protocols for self-supervised learning.

Additionally, we present a framework for learning with noisy labels that comprises two stages: self-supervised pre-training followed by robust fine-tuning. The framework, dubbed “Contrast to Divide” (C2D), significantly outperforms prior art on both synthetic (CIFAR-10 and CIFAR-100) and real-life (WebVision) noise, achieving state-of-the-art performance with different learning-with-noisy-labels methods (DivideMix, ELR) and pre-training approaches (SimCLR, BYOL). Furthermore, since self-supervised pre-training is unaffected by label noise, C2D is especially effective in the high-noise regime.
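To make the clustering baseline concrete, the following is a minimal sketch (not the thesis code) using scikit-learn; the file names, embedding dimensionality, and PCA width are illustrative assumptions:

```python
# Minimal sketch (assumed names, not the thesis code) of the unsupervised
# classification baseline: dimensionality reduction followed by k-means
# on frozen self-supervised features.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

# Assumed inputs: embeddings from a self-supervised backbone (e.g., SimCLR)
# and ground-truth labels used only for evaluation.
features = np.load("features.npy")  # shape: (n_samples, embed_dim)
labels = np.load("labels.npy")      # shape: (n_samples,)

# Stage 1: reduce dimensionality; 128 components is an illustrative choice.
reduced = PCA(n_components=128, whiten=True).fit_transform(features)

# Stage 2: plain k-means with k equal to the number of classes
# (k = 1000 for ImageNet).
clusters = KMeans(n_clusters=1000, n_init=10).fit_predict(reduced)

# Compare cluster assignments with the held-out labels.
print("AMI:", adjusted_mutual_info_score(labels, clusters))
```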
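The two-stage structure of C2D can likewise be sketched in PyTorch. This is not the released implementation: the checkpoint path, hyperparameters, and the ELR-style loss below are illustrative assumptions, and in the actual framework either DivideMix or ELR serves as the robust fine-tuning stage.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class ELRLoss(nn.Module):
    """Sketch of an ELR-style robust objective; DivideMix could replace
    this fine-tuning stage in the same two-stage pipeline."""
    def __init__(self, num_examples, num_classes, beta=0.7, lam=3.0):
        super().__init__()
        self.register_buffer("target", torch.zeros(num_examples, num_classes))
        self.beta, self.lam = beta, lam

    def forward(self, index, logits, labels):
        prob = F.softmax(logits, dim=1).clamp(1e-4, 1.0 - 1e-4)
        # Temporal ensemble of past predictions for each example.
        self.target[index] = (
            self.beta * self.target[index] + (1.0 - self.beta) * prob.detach()
        )
        ce = F.cross_entropy(logits, labels)
        # Pulls predictions toward the ensemble, away from noisy labels.
        reg = torch.log(1.0 - (self.target[index] * prob).sum(dim=1)).mean()
        return ce + self.lam * reg

# Stage 1: self-supervised pre-training (SimCLR/BYOL) is assumed to have
# produced this checkpoint; label noise never touches this stage.
model = resnet50(num_classes=100)
state = torch.load("ssl_pretrained.pt")      # illustrative path
model.load_state_dict(state, strict=False)   # classifier head stays random

# Stage 2: robust fine-tuning on the noisy-labeled data.
criterion = ELRLoss(num_examples=50_000, num_classes=100)
optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.9)

def train_step(index, images, noisy_labels):
    optimizer.zero_grad()
    loss = criterion(index, model(images), noisy_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because stage 1 never sees the (possibly noisy) labels, the pre-trained features remain clean, which is why the approach pays off most when the noise level is high.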