A Practical Demonstration of Using Vision Transformers in PyTorch: MNIST Handwritten Digit Recognition

Stan Kriventsov
Towards Data Science
5 min read · Oct 9, 2020

In this article, I give a hands-on example (with code) of how to use the popular PyTorch framework to apply the Vision Transformer to a practical computer vision task. The model was proposed in the paper “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale”, which I reviewed in another post.

The schematic of the Vision Transformer (from https://openreview.net/pdf?id=YicbFdNTTy)
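
To make the first stage of that schematic concrete, here is a minimal sketch in plain Python of how an image can be cut into fixed-size, non-overlapping patches that are then flattened into vectors, which is the input the Transformer operates on. The helper name `image_to_patches` and the patch size of 7 are assumptions chosen for illustration (7 divides the 28x28 MNIST image size evenly); they are not taken from the paper or from any library.

```python
def image_to_patches(image, patch_size):
    """Split a 2D image (a list of rows) into flattened, non-overlapping patches.

    Hypothetical helper for illustration only; not part of PyTorch.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    # Walk over the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch_size x patch_size block into a single vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A dummy 28x28 grayscale "image" with the same size as an MNIST digit.
# Each pixel value simply encodes its position so the patches are easy to inspect.
image = [[row * 28 + col for col in range(28)] for row in range(28)]

# Patch size 7 is an assumption for this sketch.
patches = image_to_patches(image, patch_size=7)
print(len(patches), len(patches[0]))  # 16 49
```

With these assumed settings, a single digit becomes a sequence of 16 patch vectors of length 49. In the full model, each vector would then be linearly projected and combined with a position embedding before entering the Transformer encoder.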

Software/ML Engineer at Google. Founder of Deep Learning Reviews: https://www.dl.reviews. Former pro chess and poker player.