Paper explained: Masked Autoencoders Are Scalable Vision Learners

How reconstructing masked parts of an image can be beneficial

Leon Sick
Towards Data Science
7 min read · Dec 29, 2021

Autoencoding approaches have a history of success in Natural Language Processing. The BERT model masks words in different parts of a sentence and learns to reconstruct them by predicting which words fill the blanks. Recent work has aimed to transfer this idea to the computer vision domain.
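The transfer to images works on patches instead of words: an image is split into a grid of patches, a large random subset is hidden, and the model must reconstruct the hidden ones. Below is a minimal NumPy sketch of such random patch masking; the function name, shapes, and 75% mask ratio are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def random_mask_patches(patches, mask_ratio=0.75, rng=None):
    """Randomly hide a fraction of image patches, analogous to
    BERT masking words in a sentence (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])   # indices of visible patches
    mask = np.ones(n, dtype=bool)       # True = patch is masked out
    mask[keep_idx] = False
    return patches[keep_idx], mask, keep_idx

# e.g. a 224x224 image split into 14x14 = 196 patches, each flattened to 768 dims
patches = np.random.randn(196, 768)
visible, mask, keep_idx = random_mask_patches(patches, mask_ratio=0.75, rng=0)
print(visible.shape)  # (49, 768): only 25% of patches stay visible
print(mask.sum())     # 147 patches are hidden and must be reconstructed
```

The reconstruction target is then the set of masked patches, just as BERT predicts only the blanked-out words rather than re-emitting the whole sentence.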
