LLMs and Transformers from Scratch: The Decoder

Exploring the Transformer’s Decoder Architecture: Masked Multi-Head Attention, Encoder-Decoder Attention, and Practical Implementation

Luís Roque
Towards Data Science
13 min read · Jan 10, 2024

This post was co-authored with Rafael Nardi.

Introduction

In this article, we delve into the decoder component of the transformer architecture, examining where it differs from and where it resembles the encoder. The decoder's distinguishing feature is its loop-like, iterative nature: unlike the encoder, which processes its entire input in a single pass, the decoder generates the output sequence one token at a time, feeding each newly produced token back in as input for the next step.
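To make the masked self-attention named in the subtitle concrete, here is a minimal PyTorch sketch of a causal mask applied inside scaled dot-product attention. It illustrates the general technique rather than this article's own implementation, and the helper names (`causal_mask`, `masked_attention`) are ours, chosen for illustration:

```python
import torch
import torch.nn.functional as F

def causal_mask(seq_len: int) -> torch.Tensor:
    # Lower-triangular boolean mask: position i may attend
    # only to positions j <= i (no peeking at future tokens).
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def masked_attention(q, k, v):
    # Scaled dot-product attention with a causal mask, the core of the
    # decoder's masked self-attention sub-layer (illustrative sketch).
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5
    mask = causal_mask(q.size(-2)).to(q.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Example: one sequence of 4 tokens with model dimension 8.
x = torch.randn(1, 4, 8)
out = masked_attention(x, x, x)
print(out.shape)  # torch.Size([1, 4, 8])
```

Masking future positions with negative infinity before the softmax zeroes out their attention weights, which is what makes training-time parallelism compatible with the decoder's token-by-token generation at inference time.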

Founder & Partner @ ZAAI | VP Data & AI @ Marley Spoon | AI Advisor @ CableLabs | Ph.D. Researcher AI @ LIACC | Cofounder & ex-CEO @ HUUB (acquired by Maersk)