Run Llama 2 70B on Your GPU with ExLlamaV2

Finding the optimal mixed-precision quantization for your hardware

Benjamin Marie
Towards Data Science
7 min read · Sep 29, 2023


Image by the author — Made with an illustration from Pixabay

The largest model in the Llama 2 family has 70 billion parameters. In fp16, each parameter weighs 2 bytes, so loading Llama 2 70B requires 140 GB of memory (70 billion * 2 bytes).
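This back-of-the-envelope arithmetic can be sketched in a few lines of Python (counting only the weights, not activations or the KV cache):

```python
# Memory needed just to load the fp16 weights of Llama 2 70B.
num_parameters = 70_000_000_000
bytes_per_param = 2  # fp16 = 16 bits = 2 bytes per parameter

memory_gb = num_parameters * bytes_per_param / 1e9
print(f"{memory_gb:.0f} GB")  # 140 GB
```

The same formula gives the footprint for other precisions: at 4 bits per parameter (0.5 bytes), the weights shrink to roughly 35 GB.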


Ph.D., research scientist in NLP/AI. Medium "Top Writer" in AI and Technology. Exclusive articles and all my AI notebooks at https://kaitchup.substack.com/