A Visual Guide to Mamba and State Space Models
An alternative to Transformers for language modeling
Published Feb 22, 2024 · 21 min read
The Transformer architecture has been a major driver of the success of Large Language Models (LLMs). It underlies nearly all LLMs in use today, from open-source models like Mistral to closed-source models like ChatGPT.
To further improve LLMs, new architectures are being developed that might even outperform the Transformer. One of these is Mamba, a State Space Model.