Run Llama 2 70B on Your GPU with ExLlamaV2
Finding the optimal mixed-precision quantization for your hardware
7 min read · Sep 29, 2023
The largest and most capable model of the Llama 2 family has 70 billion parameters. Each fp16 parameter occupies 2 bytes, so loading Llama 2 70B requires 140 GB of memory (70 billion × 2 bytes).
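The arithmetic above can be sketched as a small helper. This is a rough estimate that only counts the weights themselves (inference also needs memory for activations and the KV cache); the function name is illustrative, not from any library.

```python
def model_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate memory needed to hold a model's weights.

    Assumes every parameter uses the same precision; fp16 = 2 bytes.
    Weights only: activations and KV cache are not included.
    """
    return num_params * bytes_per_param / 1e9

# Llama 2 70B in fp16: 70e9 params * 2 bytes = 140 GB
print(model_memory_gb(70e9))  # 140.0
```

Dropping to a lower precision shrinks this linearly: the same model quantized to 4 bits per parameter (`bytes_per_param=0.5`) needs about 35 GB.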