LLMs and Transformers from Scratch: The Decoder

Exploring the Transformer’s Decoder Architecture: Masked Multi-Head Attention, Encoder-Decoder Attention, and Practical Implementation

Luís Roque
Towards Data Science
13 min read · Jan 10, 2024

This post was co-authored with Rafael Nardi.

Introduction

In this article, we delve into the decoder component of the transformer architecture, examining where it differs from and where it resembles the encoder. The decoder's distinguishing feature is its loop-like, iterative nature: unlike the encoder, which processes its entire input in a single pass, the decoder generates the output sequence one token at a time, feeding each newly produced token back in as input for the next step.
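To make the masked self-attention named in the subtitle concrete, here is a minimal PyTorch sketch of a causal mask applied inside scaled dot-product attention. It illustrates the general technique rather than this article's own implementation, and the helper names (`causal_mask`, `masked_attention`) are ours, chosen for illustration:

```python
import torch
import torch.nn.functional as F

def causal_mask(seq_len: int) -> torch.Tensor:
    # Lower-triangular boolean mask: position i may attend
    # only to positions j <= i (no peeking at future tokens).
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def masked_attention(q, k, v):
    # Scaled dot-product attention with a causal mask, the core of the
    # decoder's masked self-attention sub-layer (illustrative sketch).
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5
    mask = causal_mask(q.size(-2)).to(q.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Example: one sequence of 4 tokens with model dimension 8.
x = torch.randn(1, 4, 8)
out = masked_attention(x, x, x)
print(out.shape)  # torch.Size([1, 4, 8])
```

Masking future positions with negative infinity before the softmax zeroes out their attention weights, which is what makes training-time parallelism compatible with the decoder's token-by-token generation at inference time.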

Founder & Partner @ ZAAI | VP Data & AI @ Marley Spoon | AI Advisor @ CableLabs | Ph.D. Researcher AI @ LIACC | Cofounder & ex-CEO @ HUUB (acquired by Maersk)