
A simple way to learn generally from a large training set: DINO

This post describes a self-supervised learning method: self-distillation with no labels (DINO).

KamWoh Ng
Towards Data Science
9 min read · Apr 6, 2022

While the method (DINO [1]) itself is simple and straightforward, there are some prerequisites to understanding it: 1) supervised learning, 2) self-supervised learning, 3) knowledge distillation, and 4) the vision transformer. If you already know all of these, you can skip ahead to the DINO section.

Supervised Learning

Supervised learning is straightforward: we have a bunch of images and, for each image, a label. We then train a model by telling it which image belongs to which label. In this setting, called image classification, the learning objective is the cross-entropy loss between the one-hot label and the predicted probability distribution. By taking the index of the maximum value of the probability distribution, we obtain the predicted label for an image.

Supervised Learning for Image Classification. Image by Author.
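To make this concrete, here is a minimal sketch of that training step, assuming PyTorch; the model, batch, and hyperparameters are placeholders for illustration, not those used in the DINO paper.

```python
import torch
import torch.nn as nn

num_classes = 1000                        # a predefined set of categories
model = nn.Sequential(                    # stand-in for any image classifier
    nn.Flatten(),
    nn.Linear(3 * 224 * 224, num_classes),
)
criterion = nn.CrossEntropyLoss()         # cross-entropy between one-hot label and prediction
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

images = torch.randn(8, 3, 224, 224)          # dummy batch of 8 images
labels = torch.randint(0, num_classes, (8,))  # one label per image

logits = model(images)                    # (8, num_classes) unnormalized scores
loss = criterion(logits, labels)          # the learning signal is only the label
loss.backward()
optimizer.step()

predicted = logits.argmax(dim=1)          # index of the max value = predicted label
```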

But the problem with supervised learning is that the model reduces the rich visual information contained in an image to a single category selected from a predefined set of a few thousand categories. In other words, the only learning signal is the label prediction.

Self-Supervised Learning



