Monosemanticity
Understanding the mechanistic interpretability research problem and reverse-engineering these large language models
12 min read
Using Monosemanticity to understand the concepts a Large Language Model learned
13 min read
Research paper in pills: “Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet”
11 min read
From prompt engineering to activation engineering for more controllable and safer LLMs
13 min read