Inference
How LLMs Work: Pre-Training to Post-Training, Neural Networks, Hallucinations, and Inference
Large Language Models. With the recent explosion of interest in large language models (LLMs), they often seem almost…
9 min read
Implementing Speculative and Contrastive Decoding
8 min read
Getting your AI task to distinguish between Hard and Easy problems
12 min read
Comparing Llama 3 serving performance on vLLM, LMDeploy, MLC-LLM, TensorRT-LLM, and TGI
12 min read
Unlock Apple GPU Power for LLM Inference with MLX
16 min read
Beyond offline training and testing predictions
7 min read
A closer look at the latest open-source library from DeepSpeed
6 min read