LLMs and Transformers from Scratch: the Decoder
Exploring the Transformer’s Decoder Architecture: Masked Multi-Head Attention, Encoder-Decoder Attention, and Practical Implementation
13 min read · Jan 10, 2024
This post was co-authored with Rafael Nardi.
Introduction
In this article, we delve into the decoder component of the transformer architecture, focusing on how it differs from and resembles the encoder. The decoder’s defining feature is its loop-like, iterative nature…
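To make this loop-like behavior concrete, here is a minimal sketch of how a decoder is typically invoked autoregressively at inference time. This is an illustration only, not the article's actual implementation: the `decoder` callable, `encoder_output`, and the token-ID arguments are hypothetical placeholders.

```python
# Minimal sketch of greedy autoregressive decoding (hypothetical decoder interface).
# At each step the decoder re-reads everything generated so far and predicts the
# next token, stopping once an end-of-sequence token is produced.

def greedy_decode(decoder, encoder_output, bos_id, eos_id, max_len=50):
    generated = [bos_id]  # start the output sequence with the <bos> token
    for _ in range(max_len):
        # decoder is assumed to return per-position logits: (seq_len, vocab_size)
        logits = decoder(generated, encoder_output)
        last_step = logits[-1]  # distribution over the next token
        next_id = max(range(len(last_step)), key=lambda i: last_step[i])
        generated.append(next_id)
        if next_id == eos_id:  # stop once <eos> is emitted
            break
    return generated
```

Each pass through the loop feeds the growing output sequence back into the decoder, which is what gives it its iterative character compared with the encoder's single parallel pass over the input.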