Author: Benjamin Marie
- Very accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU (9 min read)
- One model, two adapters (9 min read)
- But it will depend on your GPU (5 min read)
- Fixing Faulty Gradient Accumulation: Understanding the Issue and Its Resolution. Years of suboptimal model training? (11 min read)
- Finding the right trade-off between memory efficiency, accuracy, and speed (7 min read)
- Understanding how much memory you need to serve a VLM (8 min read)
- Fast and accurate GGUF models for your CPU (8 min read)
- How pruning, knowledge distillation, and 4-bit quantization can make advanced AI models more accessible and… (10 min read)
- What you can do with only 2×24 GB GPUs and a lot of CPU RAM (9 min read)