Self Attention and Transformers

From Attention to Self Attention to Transformers

Mahendran Venkatachalam
Towards Data Science
7 min read · Jul 11, 2019

This is a continuation of an earlier post, “Introduction to Attention”, where we saw some of the key challenges addressed by the attention architecture introduced there (and referred to in Fig 1 below).
