Self Attention and Transformers

From Attention to Self Attention to Transformers

Mahendran Venkatachalam
Towards Data Science
7 min read · Jul 11, 2019

This is a continuation of an earlier post, “Introduction to Attention”, where we saw some of the key challenges addressed by the attention architecture introduced there (and referred to in Fig 1 below).
