A Practical Demonstration of Using Vision Transformers in PyTorch: MNIST Handwritten Digit Recognition

Published in Towards Data Science

Oct 9, 2020

In this article, I give a hands-on example (with code) of how to use the popular PyTorch framework to apply the Vision Transformer to a practical computer vision task: recognizing MNIST handwritten digits. The Vision Transformer was proposed in the paper “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale”, which I reviewed in another post.
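The idea in the paper's title is to cut an image into fixed-size patches and treat each flattened patch as a "word" (token) for a Transformer. The sketch below shows that step in PyTorch. It is a minimal illustration under stated assumptions, not the code from this article: the names `image_to_patch_tokens` and `PatchEmbedding` are hypothetical, and the 7x7 patch size and 64-dimensional embedding are our own choices (a 28x28 MNIST image cannot be split evenly into 16x16 patches).

```python
import torch
from torch import nn


def image_to_patch_tokens(images: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Hypothetical helper: split (B, C, H, W) images into non-overlapping patches.

    Returns a tensor of shape (B, num_patches, C * patch_size * patch_size),
    with patches ordered row by row, left to right.
    """
    b, c, h, w = images.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    # unfold H, then W: (B, C, H/p, W/p, p, p)
    patches = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    # move channels next to the pixel dims: (B, H/p, W/p, C, p, p)
    patches = patches.permute(0, 2, 3, 1, 4, 5)
    # flatten each patch into one token vector
    return patches.reshape(b, -1, c * patch_size * patch_size)


class PatchEmbedding(nn.Module):
    """Hypothetical module: patches -> linear projection + [class] token + positions.

    Defaults assume 28x28 single-channel MNIST images and 7x7 patches (our assumption).
    """

    def __init__(self, image_size=28, patch_size=7, channels=1, dim=64):
        super().__init__()
        self.patch_size = patch_size
        num_patches = (image_size // patch_size) ** 2
        self.proj = nn.Linear(channels * patch_size * patch_size, dim)
        # learnable classification token prepended to the sequence
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # learnable position embedding for the class token plus every patch
        self.pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, dim))

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        tokens = self.proj(image_to_patch_tokens(images, self.patch_size))
        cls = self.cls_token.expand(images.shape[0], -1, -1)
        return torch.cat([cls, tokens], dim=1) + self.pos_embedding


if __name__ == "__main__":
    batch = torch.randn(8, 1, 28, 28)  # stand-in for a batch of MNIST images
    print(image_to_patch_tokens(batch, 7).shape)  # torch.Size([8, 16, 49])
    print(PatchEmbedding()(batch).shape)  # torch.Size([8, 17, 64])
```

The resulting sequence of 17 tokens (one class token plus 16 patch tokens) is what a standard Transformer encoder would then process; the final state of the class token is typically fed to a classification head.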

Written by Stan Kriventsov