Tokenmaxxing saving
Like Drew Barrymore in 50 First Dates, LLMs don’t actually remember your conversation. Every time you hit send, you are really sending the entire transcript: past messages, attachments, and all. So the first message is cheap, the tenth message pays again for the first nine, and so on. If you happen to include a pile of useless documents at the beginning, you pay that tax on every turn. Because each turn re-sends everything before it, the total cost of a conversation grows roughly with the square of its length. The figure below visualizes this:
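Where the square comes from, in a simplified model that assumes every turn is the same size: a system prompt of $S$ tokens, questions of $q$ tokens, and answers of $a$ tokens. Turn $k$ re-sends the system prompt, all $k-1$ earlier exchanges, and the new question, so over $n$ turns:

$$
\text{input}_k = S + k\,q + (k-1)\,a,
\qquad
\sum_{k=1}^{n} \text{input}_k = nS + q\,\frac{n(n+1)}{2} + a\,\frac{n(n-1)}{2}
$$

Output adds only $na$ tokens, so the $n^2$ terms in the input sum dominate as the thread grows. With the figure's assumptions ($S = 2400$, $q = 160$, $a = 420$, $n = 20$) the input side comes to 161,400 tokens, against 8,400 output tokens.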
The transcript tax
Each row is one completed question and answer. The bright yellow block is your latest message; the green block at the end is the model’s reply.
Cumulative total cost over 20 turns
Illustration uses Claude Fable pricing: $10 per million input tokens and $50 per million output tokens. It assumes ~160-token questions, ~420-token answers, and a 2,400-token system prompt. Cache reads bill at roughly 10% of the input price; cache writes at 125%.
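A minimal Python sketch of the figure’s arithmetic, using the caption’s prices and token counts. The cache model is a simplifying assumption, not any provider’s exact billing: each turn reads the previous turn’s prompt from cache and writes only the new tokens (the last answer plus the new question). `conversation_cost` and the constant names are illustrative.

```python
"""Sketch of the transcript tax using the figure's stated assumptions.

Simplified cache model (an assumption, not any provider's exact billing):
each turn reads the previous turn's prompt from cache and writes only the
new tokens (last answer + new question) at the cache-write rate.
"""

INPUT_PER_M = 10.0   # $ per million input tokens (figure's illustrative pricing)
OUTPUT_PER_M = 50.0  # $ per million output tokens
CACHE_READ = 0.10    # cache reads bill at ~10% of the input price
CACHE_WRITE = 1.25   # cache writes bill at 125% of the input price

SYSTEM, QUESTION, ANSWER = 2_400, 160, 420  # tokens per system prompt / question / answer


def conversation_cost(turns: int, cached: bool) -> float:
    """Cumulative dollar cost of a thread that is `turns` exchanges long."""
    total = 0.0
    prev_prompt = 0  # tokens sent as input on the previous turn
    for k in range(1, turns + 1):
        # Turn k re-sends the system prompt, k-1 earlier exchanges, and the new question.
        prompt = SYSTEM + k * QUESTION + (k - 1) * ANSWER
        if cached:
            new_tokens = prompt - prev_prompt
            input_cost = (prev_prompt * CACHE_READ + new_tokens * CACHE_WRITE) * INPUT_PER_M
        else:
            input_cost = prompt * INPUT_PER_M
        total += (input_cost + ANSWER * OUTPUT_PER_M) / 1_000_000
        prev_prompt = prompt
    return total


if __name__ == "__main__":
    for cached in (False, True):
        print(f"cached={cached}: ${conversation_cost(20, cached):.2f}")
```

Run as a script, it prints $2.03 for the uncached thread and $0.74 for the cached one. The output tokens cost $0.42 either way; caching only shrinks the input side, which is where the quadratic growth lives.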
§1 Convert Office documents to Markdown before attaching
A modern Office file (.docx, .xlsx, .pptx) is XML inside a zip archive. Attach one raw and the markup, styling, and revision cruft can tokenize to roughly twenty times the size of the text you actually care about. Flip the attachments in the simulator from .docx to .md and watch the cost fall.
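One way to do the conversion, sketched with only Python’s standard library: open the .docx as a zip, parse `word/document.xml`, and keep each paragraph’s text. `docx_to_markdown` is a hypothetical helper, and mapping Word’s built-in `HeadingN` style IDs to Markdown headings is an assumption that holds for default English templates but not every document. A dedicated converter such as pandoc (`pandoc input.docx -t markdown -o output.md`) handles tables, lists, and formatting far better.

```python
"""Hypothetical helper: extract a .docx's text so you attach prose, not XML.

Standard library only. Keeps paragraph text and maps Word's built-in
"Heading1", "Heading2", ... style IDs to Markdown headings (an assumption:
custom or localized templates may use other IDs). Table cells come out as
plain paragraphs; lists, images, and formatting are dropped.
"""
import sys
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_markdown(path: str) -> str:
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    blocks = []
    for para in root.iter(f"{W}p"):
        # A paragraph's text is split across runs; join all <w:t> pieces.
        text = "".join(t.text or "" for t in para.iter(f"{W}t"))
        if not text.strip():
            continue  # skip empty paragraphs
        style = para.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(f"{W}val", "") if style is not None else ""
        if style_id.startswith("Heading") and style_id[7:].isdigit():
            level = min(int(style_id[7:]), 6)  # Markdown stops at six levels
            text = "#" * level + " " + text
        blocks.append(text)
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(docx_to_markdown(sys.argv[1]))
```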
And because the transcript re-sends on every turn, you don’t pay the bloat once — you pay it on every message for the life of the thread.
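To put a number on that, here is a back-of-envelope sketch. The 3,000-token Markdown size is hypothetical; the 20× factor is the rough ratio quoted above; the price is the figure’s $10 per million input tokens; caching is ignored. `attachment_cost` is an illustrative name.

```python
"""Back-of-envelope: what an attachment costs over a thread's life.

Hypothetical numbers: a document that is 3,000 tokens as Markdown and ~20x
that as raw .docx, attached on turn 1 of a 20-turn thread, priced at the
figure's $10 per million input tokens, with no caching.
"""

INPUT_PER_M = 10.0  # $ per million input tokens


def attachment_cost(tokens: int, attached_on: int, turns: int) -> float:
    """Dollars spent re-sending `tokens` on every turn from `attached_on` through `turns`."""
    resends = turns - attached_on + 1
    return tokens * resends * INPUT_PER_M / 1_000_000


if __name__ == "__main__":
    md_tokens = 3_000
    docx_tokens = md_tokens * 20
    for label, tokens in (("markdown", md_tokens), (".docx", docx_tokens)):
        print(f"{label}: ${attachment_cost(tokens, attached_on=1, turns=20):.2f}")
```

The Markdown copy costs $0.60 across 20 turns; the raw .docx costs $12.00 for the same words. Caching would lower both figures, but because cost scales linearly with tokens, not the ratio between them.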
§2 Use burners
Sweeping forty documents for one clause? Do it in a thread you intend to throw away, ideally on a cheaper, faster model (Haiku costs about 10% of Fable’s per-token price). Every search result you pull into your main thread joins the transcript and rides along on every later message. In a burner thread, the bulk stays behind; only the finding comes home.
Carry back two paragraphs, not forty documents.
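The same arithmetic shows why the burner pays off. Every number here is hypothetical except the prices: 40 documents of about 2,000 tokens each and a 300-token finding (roughly two paragraphs), brought into a 20-turn main thread on turn 5. The burner model is priced at 10% of the main model, per the comparison above. Output tokens, the burner’s own follow-up turns, and caching are ignored; `carried` and `compare` are illustrative names.

```python
"""Sketch: sweep documents in the main thread vs. a throwaway "burner" thread.

All token counts are hypothetical: 40 documents of ~2,000 tokens, a
300-token finding, entering a 20-turn main thread on turn 5. The burner
model is priced at 10% of the main model. Output and caching are ignored.
"""

MAIN_INPUT_PER_M = 10.0                      # $ per million input tokens, main model
BURNER_INPUT_PER_M = MAIN_INPUT_PER_M * 0.10  # cheaper burner model


def carried(tokens: int, from_turn: int, turns: int, per_m: float) -> float:
    """Dollars to re-send `tokens` on each turn from `from_turn` through `turns`."""
    return tokens * (turns - from_turn + 1) * per_m / 1_000_000


def compare(docs_tokens: int, finding_tokens: int, found_on: int, turns: int) -> tuple[float, float]:
    # Main thread: the documents ride along on every remaining turn.
    in_main = carried(docs_tokens, found_on, turns, MAIN_INPUT_PER_M)
    # Burner: read the documents once on the cheaper model, carry only the finding home.
    burner = carried(docs_tokens, 1, 1, BURNER_INPUT_PER_M) + carried(
        finding_tokens, found_on, turns, MAIN_INPUT_PER_M
    )
    return in_main, burner


if __name__ == "__main__":
    main_cost, burner_cost = compare(40 * 2_000, 300, found_on=5, turns=20)
    print(f"in main thread: ${main_cost:.2f}, via burner: ${burner_cost:.2f}")
```

With these inputs the main-thread sweep costs $12.80 in re-sent input, against about $0.13 for the burner route.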
§3 Reply while the cache is warm
Providers cache your transcript for a short window. Cache hits are not free, but they are typically billed at about 10% of the normal input price. A cache entry usually lives about five minutes from its last use, and every message resets the clock.
Keep the exchange moving while you’re actively working and the transcript tax drops to ten cents on the dollar. Wander off for an hour and the next message re-pays the full transcript, plus a small premium (roughly 25% at the rates above) to re-cache it. Cached turns are also faster, because the provider skips reprocessing the part of the prompt it has already seen, so a prompt reply saves money and latency at once.
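A sketch of the warm-cache rule: a turn hits the cache if it arrives within the TTL of the previous request, and each request resets the clock. The five-minute TTL and the read/write multipliers are the article’s rough figures, used here as assumptions; real providers differ. `prefix_costs` is a hypothetical helper that prices only the re-sent transcript, not the new message or the reply.

```python
"""Sketch: which turns hit a warm cache, assuming a 5-minute TTL refreshed on use.

TTL and prices are illustrative assumptions ($10/M input, reads at 10%,
writes at 125%). Send times are seconds since the thread started.
"""

TTL_SECONDS = 5 * 60
INPUT_PER_M = 10.0
CACHE_READ, CACHE_WRITE = 0.10, 1.25


def prefix_costs(send_times: list[float], prefix_tokens: list[int]) -> list[float]:
    """Dollar cost of re-sending each turn's transcript prefix (new tokens ignored)."""
    costs = []
    last_use = None
    for t, tokens in zip(send_times, prefix_tokens):
        # Warm only if the previous request was within the TTL.
        warm = last_use is not None and t - last_use <= TTL_SECONDS
        rate = CACHE_READ if warm else CACHE_WRITE  # cold turns re-write the cache
        costs.append(tokens * rate * INPUT_PER_M / 1_000_000)
        last_use = t  # every request resets the clock
    return costs


if __name__ == "__main__":
    # Hypothetical 20,000-token transcript: reply after 2 minutes, then after an hour away.
    for cost in prefix_costs([0, 120, 120 + 3600], [20_000] * 3):
        print(f"${cost:.2f}")
```

The first request pays the cache write ($0.25), the two-minute reply re-reads the transcript for about $0.02, and the reply after an hour pays the full write rate again.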