Pierre Lienhart in Towards Data Science
The AQLM Quantization Algorithm, Explained
In this blog post, we cover the AQLM quantization algorithm, which sets a new state of the art for compressing LLMs down to 2 bits!
13 min read · Mar 13, 2024
Pierre Lienhart
LLM Inference Series: 5. Dissecting model performance
In this post, we look deeper into the different types of bottleneck that affect model latency and explain what arithmetic intensity is.
14 min read · Feb 2, 2024
Pierre Lienhart
LLM Inference Series: 4. KV caching, a deeper look
In this post, we look at how large the KV cache, a common optimization for LLM inference, can grow, and at common mitigation strategies.
18 min read · Jan 15, 2024
Pierre Lienhart
LLM Inference Series: 3. KV caching unveiled
In this post, we introduce the KV caching optimization for LLM inference, where it comes from, and what it changes.
11 min read · Dec 22, 2023
Pierre Lienhart
LLM Inference Series: 2. The two-phase process behind LLMs’ responses
After a quick reminder on the Transformer architecture, this post covers the text generation algorithm used by Transformer decoder models.
4 min read · Dec 22, 2023
Pierre Lienhart
LLM Inference Series: 1. Introduction
In this post, I introduce the outline of this deep dive series about the specifics and challenges of hosting LLMs for inference.
3 min read · Dec 22, 2023