GPTQ or bitsandbytes: Which Quantization Method to Use for LLMs — Examples with Llama 2

Large language model quantization for affordable fine-tuning and inference on your computer

Benjamin Marie
Towards Data Science
7 min read · Aug 25, 2023

Image by the author — Made with an illustration from Pixabay

As large language models (LLMs) have grown larger, with more and more parameters, new techniques have been proposed to reduce their memory usage.
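
To give a concrete sense of what this looks like in practice, here is a minimal sketch of loading a Llama 2 checkpoint in 4-bit with bitsandbytes through Hugging Face Transformers. The checkpoint name and prompt are illustrative examples, and the exact arguments depend on your versions of transformers, bitsandbytes, and accelerate.

```python
# Minimal sketch: 4-bit loading with bitsandbytes via Transformers.
# The model name and prompt below are illustrative examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-2-7b-hf"  # example checkpoint (gated; requires access)

# NF4 quantization with float16 compute, a common setup for memory-efficient inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,  # weights are quantized on the fly at load time
    device_map="auto",
)

inputs = tokenizer("Quantization reduces memory usage by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```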
