Token Conservation — caption.fyi

Token~~maxxing~~ saving

Like Drew Barrymore in 50 First Dates, LLMs don’t actually remember your conversation. Every time you hit send, you are really sending the entire transcript: past messages, attachments, and all. So the first message is cheap, the tenth message pays for the first nine, and so on. And if you happen to include a bunch of useless documents at the beginning, you pay that tax on every turn. Because each turn re-sends everything before it, the total cost of a conversation grows roughly with the square of its length. The simulator below visualizes this:

The transcript tax

[Interactive simulator. Each row is a completed question and answer from the model. The bright yellow block is your last message; the green block at the end is the model reply. Drag the slider (turns 1 to 20), or press play to see what happens. Controls: attachment (None, .docx 58k, 5× .docx 290k, .md 2.9k) and prompt caching. Readouts: total input billed, total output billed, cost, and cumulative total cost over 20 turns. At turn 1 of 20: 2,560 tokens of input billed, 420 tokens of output billed, cost $0.05.]

Illustration uses Claude Fable pricing: $10 per million input tokens and $50 per million output tokens. It assumes ~160-token questions, ~420-token answers, and a 2,400-token system prompt. Cache reads bill at roughly 10% of the input price; cache writes at 125%.
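The arithmetic behind the illustration can be sketched in a few lines of Python. This is a minimal sketch under the caption's stated assumptions with prompt caching off, not the simulator's actual code; the function names are hypothetical, and treating an attachment label such as "58k" as a token count is an assumption.

```python
# Figures taken from the illustration's caption; swap in your own pricing.
INPUT_PRICE = 10 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 50 / 1_000_000  # dollars per output token
SYSTEM_TOKENS = 2_400
QUESTION_TOKENS = 160
ANSWER_TOKENS = 420


def input_tokens_for_turn(turn: int, attachment_tokens: int = 0) -> int:
    """Input tokens sent on a 1-indexed turn when nothing is cached.

    Each turn re-sends the system prompt, the attachment, every earlier
    question/answer pair, and the new question.
    """
    history = (turn - 1) * (QUESTION_TOKENS + ANSWER_TOKENS)
    return SYSTEM_TOKENS + attachment_tokens + history + QUESTION_TOKENS


def cumulative_cost(turns: int, attachment_tokens: int = 0) -> float:
    """Total dollars billed for a conversation of the given length."""
    total_in = sum(
        input_tokens_for_turn(t, attachment_tokens) for t in range(1, turns + 1)
    )
    total_out = turns * ANSWER_TOKENS
    return total_in * INPUT_PRICE + total_out * OUTPUT_PRICE


if __name__ == "__main__":
    for attachment in (0, 2_900, 58_000):  # assumed token counts
        print(attachment, round(cumulative_cost(20, attachment), 2))
```

Under these assumptions, turn 1 sends 2,560 input tokens, matching the simulator's readout, and twenty turns with no attachment come to about $2.03. A 58,000-token attachment raises the same twenty turns to about $13.63, because it is re-sent on every one of them.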

§1 Convert office documents to Markdown before attaching

An Office file is XML inside a zip archive. Attach one raw and the markup, styling, and revision cruft can tokenize to roughly twenty times the size of the text you actually care about. Flip the attachments in the simulator from .docx to .md and watch the cost savings.

And because the transcript re-sends on every turn, you don’t pay the bloat once — you pay it on every message for the life of the thread.
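As a rough illustration of how little of a .docx is the text itself, the sketch below opens the archive with Python's standard library and keeps only the paragraph text. It is a minimal sketch, not a full converter: the function name `docx_to_text` is hypothetical, it emits plain paragraphs (which are valid Markdown) rather than headings or tables, and it ignores headers, footers, and footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path_or_file) -> str:
    """Return the body text of a .docx, with a blank line between paragraphs."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W + "p"):
        # Only <w:t> runs are collected; deleted tracked-change text lives
        # in <w:delText> and is therefore skipped.
        text = "".join(node.text or "" for node in para.iter(W + "t"))
        if text.strip():
            paragraphs.append(text)
    return "\n\n".join(paragraphs)
```

For documents where headings, lists, or tables matter, a dedicated converter such as pandoc is likely a better fit than this sketch.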

§2 Use burners

Sweeping forty documents for one clause? Do it in a thread you intend to throw away, ideally on a cheaper, faster model (Haiku is priced at 10% of Fable). Every search result you pull into your main thread joins the transcript and is re-sent with every later message. In a burner thread, the bulk stays behind; only the finding comes home.

Searching from your main thread

Forty documents’ worth of results ride on every message, forever.

Burner thread + summary

The sweep happens off to the side; two paragraphs come home. The main thread stays light.

Carry back two paragraphs, not forty documents.
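The difference between the two approaches can be put into rough numbers. The sketch below is a back-of-the-envelope comparison of input cost only, ignoring caching and output tokens; the function names and the example token counts are hypothetical, and it assumes "10% of Fable" refers to price.

```python
FABLE_INPUT = 10 / 1_000_000      # dollars per input token, from the illustration
HAIKU_INPUT = FABLE_INPUT * 0.10  # assumption: "10% of Fable" means price


def main_thread_cost(result_tokens: int, turns: int) -> float:
    """Results join the main transcript and are sent on each of `turns` messages."""
    return result_tokens * turns * FABLE_INPUT


def burner_cost(result_tokens: int, summary_tokens: int, turns: int) -> float:
    """Results are read once in the burner; only the summary rides along afterwards."""
    sweep = result_tokens * HAIKU_INPUT
    carried = summary_tokens * turns * FABLE_INPUT
    return sweep + carried


if __name__ == "__main__":
    results = 40 * 2_900  # hypothetical: forty documents of 2,900 tokens each
    print(round(main_thread_cost(results, turns=10), 2))
    print(round(burner_cost(results, summary_tokens=300, turns=10), 3))
```

With these made-up inputs, ten main-thread messages carrying the full results cost about $11.60 in input, while the burner route costs about $0.15.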

§3 Reply while the cache is warm

Providers cache your transcript for a short window. Cache hits are not free, but they typically bill at about 10% of the normal input price. A cache entry usually lives about five minutes from its last use, and every message resets the clock.

Keep the exchange moving while you’re actively working and the transcript tax drops to ten cents on the dollar. Wander off for an hour and the next message re-pays the full transcript, plus a small premium to re-cache it. Cached turns are also faster — the model skips re-reading what it has already processed — so a prompt reply saves money and latency at once.
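The warm-versus-cold difference for a single message can be sketched as follows. This is a simplified model, not any provider's actual billing logic: it uses the illustration's rates (cache reads at 10% of the input price, cache writes at 125%), assumes a five-minute lifetime, and assumes the new message is itself written to the cache. The function name is hypothetical.

```python
INPUT_PRICE = 10 / 1_000_000  # dollars per input token, from the illustration
CACHE_READ = 0.10             # multiplier for tokens served from cache
CACHE_WRITE = 1.25            # multiplier for tokens written to cache
CACHE_TTL_SECONDS = 5 * 60    # assumed lifetime since last use


def turn_input_cost(transcript_tokens: int, new_tokens: int, idle_seconds: float) -> float:
    """Input cost of one message, given how long the thread sat idle."""
    if idle_seconds <= CACHE_TTL_SECONDS:
        # Warm: the existing transcript is billed as a cache read.
        prefix = transcript_tokens * CACHE_READ
    else:
        # Cold: the entry expired, so the whole transcript is re-cached.
        prefix = transcript_tokens * CACHE_WRITE
    return (prefix + new_tokens * CACHE_WRITE) * INPUT_PRICE


if __name__ == "__main__":
    print(round(turn_input_cost(60_000, 160, idle_seconds=90), 3))    # warm
    print(round(turn_input_cost(60_000, 160, idle_seconds=3600), 3))  # cold
```

For a hypothetical 60,000-token transcript, replying within the window costs about $0.06 in input under this model; replying after an hour costs about $0.75.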