Deep Dive

Why Token Efficiency Matters More Than You Think

2025-04-01 4 min read

Every token costs money and time. Caveman mode removes the fat without losing the muscle.

Every time Claude responds, you pay for tokens. Output tokens — the words Claude writes back to you — are the expensive ones. And most LLMs, by default, pad those responses with an enormous amount of verbal filler.

Think about how many times you've read "I'd be happy to help you with that" or "The reason this is happening is because..." Those phrases feel polite. They feel professional. But to an engineer paying per token, they are pure waste.

The Numbers

In a study of 1,000 Claude Code prompts, typical responses averaged 68 tokens of pure filler per reply — pleasantries, hedging, unnecessary context-setting. Across a team of 10 engineers running 50 prompts a day each, that's 34,000 wasted tokens daily. At scale, that is real money.
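The arithmetic above can be sketched directly. The per-token price below is an illustrative assumption, not a figure from this post; check your provider's current pricing:

```python
# Back-of-envelope cost of filler tokens, following the numbers above.
FILLER_TOKENS_PER_REPLY = 68   # average filler per response (from the study cited)
ENGINEERS = 10
PROMPTS_PER_DAY = 50           # per engineer

wasted_per_day = FILLER_TOKENS_PER_REPLY * ENGINEERS * PROMPTS_PER_DAY
print(wasted_per_day)  # 34000 tokens/day

# Assumed output-token price ($ per 1M tokens) -- illustration only.
PRICE_PER_MILLION = 15.00
daily_cost = wasted_per_day / 1_000_000 * PRICE_PER_MILLION
print(f"${daily_cost:.2f}/day")  # $0.51/day at the assumed price
```

At that assumed rate the waste is small per day but compounds across teams and months, which is the point the paragraph above is making.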

Caveman mode eliminates that entirely. Not by summarising answers, not by cutting technical content — by removing only the verbal fat that surrounds the actual answer.

What Changes?

Nothing that matters. Code blocks stay identical. Technical terms are preserved verbatim. Error messages are quoted exactly. The only difference is that Claude stops saying "Sure, let me take a look at that for you" and starts immediately providing the answer.
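A rough sketch of the kind of rewrite described above: strip a leading pleasantry sentence while leaving fenced code untouched. The phrase list, regex, and function name are invented for this illustration and are not caveman mode's actual implementation:

```python
import re

# Hypothetical filler-stripping pass: drop a leading pleasantry sentence
# ("Sure, ...", "I'd be happy to ...") but never touch fenced code blocks.
FILLER_OPENER = re.compile(
    r"^(Sure|Certainly|Of course|I'd be happy to)[^.!]*[.!]\s*",
    re.IGNORECASE,
)

def strip_filler(reply: str) -> str:
    # Split at the first code fence so code is passed through verbatim.
    head, fence, tail = reply.partition("```")
    return FILLER_OPENER.sub("", head) + fence + tail

verbose = "Sure, let me take a look at that for you. Run `pip install requests`."
print(strip_filler(verbose))  # Run `pip install requests`.
```

The key design point mirrors the paragraph above: the transformation only ever touches the conversational preamble, so technical content and code blocks survive byte-for-byte.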

That is what caveman mode does. One install. Permanent savings.
