A Visual Guide to Mamba and State Space Models

An alternative to Transformers for language modeling

Maarten Grootendorst
Towards Data Science
21 min read · Feb 22, 2024


The Transformer architecture has been a major component in the success of Large Language Models (LLMs). It underlies nearly all LLMs in use today, from open-source models like Mistral to closed-source models like ChatGPT.

To further improve LLMs, new architectures are being developed that might even outperform the Transformer. One such architecture is Mamba, a State Space Model.
