Run Llama 2 70B on Your GPU with ExLlamaV2

Finding the optimal mixed-precision quantization for your hardware

Benjamin Marie
Towards Data Science
7 min read · Sep 29, 2023


Image by the author — Made with an illustration from Pixabay

The largest model in the Llama 2 family has 70 billion parameters. In fp16, each parameter weighs 2 bytes, so loading Llama 2 70B requires 140 GB of memory (70 billion * 2 bytes).
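This back-of-the-envelope arithmetic can be sketched in a few lines of Python (counting only the weights, not activations or the KV cache):

```python
# Memory needed just to load the fp16 weights of Llama 2 70B.
num_parameters = 70_000_000_000
bytes_per_param = 2  # fp16 = 16 bits = 2 bytes per parameter

memory_gb = num_parameters * bytes_per_param / 1e9
print(f"{memory_gb:.0f} GB")  # 140 GB
```

The same formula gives the footprint for other precisions: at 4 bits per parameter (0.5 bytes), the weights shrink to roughly 35 GB.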


Ph.D., research scientist in NLP/AI. Medium "Top Writer" in AI and Technology. Exclusive articles and all my AI notebooks at https://kaitchup.substack.com/