Caveman Skill and the Brevity Research: 65-75% Token Reduction That Improves LLM Accuracy
Caveman is a Claude Code skill that instructs the model to drop articles, filler, hedging, and pleasantries — cutting 65-75% of output tokens with no loss of technical accuracy. The underlying research (Hakim, March 2026) tested 31 models on 1,485 problems and found that on 7.7% of benchmark problems, larger models underperform smaller ones by 28.4 percentage points due to 'spontaneous scale-dependent verbosity' — verbose reasoning paths that degrade accuracy. Forcing brevity reverses this.
Caveman is a Claude Code skill by Julius Brussee that instructs the model to communicate without articles, filler words, hedging phrases, and pleasantries. The result: approximately 65-75% reduction in output tokens across normal software engineering tasks with no loss of technical accuracy. The skill trended #1 on GitHub with roughly 5,000 stars.

## How It Works

The skill offers three intensity levels:

- **lite**: drops filler but maintains readability
- **full**: telegraphic style, strips all non-essential words
- **ultra**: extreme compression

Example: "I'll now read the configuration file to understand the current settings and then make the necessary changes" becomes "read config, change settings."

## The Research Foundation

The more significant finding comes from a March 2026 arXiv paper, **"Brevity Constraints Reverse Performance Hierarchies in Language Models"** by Hakim (2604.00025), which tested 31 models from 0.5B to 405B parameters on 1,485 benchmark problems.

Key finding: on 7.7% of benchmark problems, **larger models actually underperform smaller ones by 28.4 percentage points**. The cause is "spontaneous scale-dependent verbosity": larger models generate longer, more elaborate reasoning chains that introduce errors, contradictions, and tangential exploration. The verbose paths sound confident but are less accurate than concise ones.

When brevity constraints were applied (forcing shorter responses), the performance hierarchy normalized: larger models regained their expected advantage. This suggests verbosity is not merely a cosmetic issue but actively degrades reasoning quality in some contexts.

## Implications

For AI coding assistants, shorter outputs are not just faster to read; they may also be more accurate. The finding challenges the assumption that longer chain-of-thought reasoning always improves results.
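On the mechanics side: Claude Code skills are distributed as markdown instruction files (a `SKILL.md` with YAML frontmatter). A brevity skill of this kind might be specified roughly as follows; this is a hypothetical sketch for illustration, not the actual contents of the Caveman repository.

```markdown
---
name: caveman
description: Respond in compressed telegraphic style when the user asks for terse output.
---

# Caveman

Drop articles, filler words, hedging phrases, and pleasantries from all responses.

Intensity levels:
- lite: drop filler, keep readability
- full: telegraphic style, strip all non-essential words
- ultra: extreme compression

Example: "I'll now read the configuration file" -> "read config"
```

Because the skill is pure instruction text, it changes how the model generates output rather than post-processing it, which is why it can also shape the reasoning path, not just the final wording.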
In practice, forcing an LLM to be concise eliminates the tangential reasoning paths that introduce errors, while preserving the core analytical steps that produce correct answers.
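To make the compression concrete, the word-dropping described above can be approximated mechanically. The toy filter below strips articles and common filler from a sentence; the word lists are illustrative assumptions, not taken from the skill, and the real effect comes from instructing the model rather than filtering its output.

```python
# Naive sketch of "caveman"-style compression. Illustrative only: the actual
# skill instructs the model at generation time; it does not post-process text.
FILLER = {
    "a", "an", "the",             # articles
    "now", "just", "really",      # filler
    "i", "i'll", "we", "please",  # first person / pleasantries
    "to", "and", "then",          # connectives
}

def cavemanize(text: str) -> str:
    """Drop articles and filler words, keeping content words in order."""
    kept = [w for w in text.split() if w.lower().strip(",.") not in FILLER]
    return " ".join(kept)

verbose = "I'll now read the configuration file and then make the necessary changes"
print(cavemanize(verbose))  # -> "read configuration file make necessary changes"
```

Even this crude filter halves the word count of the example sentence; the skill's reported 65-75% token reduction comes from the model additionally restructuring its phrasing, which no simple filter can replicate.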