Self Attention and Transformers
From Attention to Self Attention to Transformers
Jul 11, 2019
This is a continuation of an earlier post, “Introduction to Attention”, where we saw some of the key challenges addressed by the attention architecture introduced there (and referred to in Fig 1 below).