Paper explained: Masked Autoencoders Are Scalable Vision Learners

How reconstructing masked parts of an image can be beneficial

Leon Sick
Towards Data Science
7 min read · Dec 29, 2021

Autoencoding approaches have a history of success in Natural Language Processing. The BERT model masks words in different parts of a sentence and learns to reconstruct them by predicting which words fill the blanks. Recent work has aimed to transfer this idea to the computer vision domain.
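The transfer to images works on patches instead of words: an image is split into a grid of patches, a large random subset is hidden, and the model must reconstruct the hidden ones. Below is a minimal NumPy sketch of such random patch masking; the function name, shapes, and 75% mask ratio are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def random_mask_patches(patches, mask_ratio=0.75, rng=None):
    """Randomly hide a fraction of image patches, analogous to
    BERT masking words in a sentence (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])   # indices of visible patches
    mask = np.ones(n, dtype=bool)       # True = patch is masked out
    mask[keep_idx] = False
    return patches[keep_idx], mask, keep_idx

# e.g. a 224x224 image split into 14x14 = 196 patches, each flattened to 768 dims
patches = np.random.randn(196, 768)
visible, mask, keep_idx = random_mask_patches(patches, mask_ratio=0.75, rng=0)
print(visible.shape)  # (49, 768): only 25% of patches stay visible
print(mask.sum())     # 147 patches are hidden and must be reconstructed
```

The reconstruction target is then the set of masked patches, just as BERT predicts only the blanked-out words rather than re-emitting the whole sentence.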
