Dustav.com

Under the hood

Cost & caching

The per-message economics, prompt caching, and why we show you the price.

A note on what this page is: Dustav's job is to keep your household's calendar, notes, and facts straight — that's the product. This page is one level down, for the curious and the builders: how much that costs to run, and what we do about it. It's honest and specific on purpose, because the cost is real and we'd rather show it than bury it.

Why it costs what it costs

Dustav runs a frontier model on every message. That's a deliberate choice — a cheaper model gets the dates wrong, forgets the kids, and confabulates, which is exactly the failure mode a household agent can't have. But it means Dustav is structurally expensive: cents per message, not free the way an ad-supported chatbot is free.

That's also why Dustav is BYOK (bring your own key). The model cost lands on your own Anthropic API key rather than hiding inside a subscription, so we can run the best model for the job instead of the cheapest one that would protect a margin, and you can see exactly what it costs.

The two levers

Because every model call is billed by the token, two things drive the cost of a message, and everything in the engineering points at them:

  • Context size. Every token the agent reads is a token you pay for, on every turn it's read. So the context is kept lean: a bounded window of recent history, a tight facts knowledge base that doesn't grow into a diary, and only the live state that's relevant. (Deciding what makes it into that window is the whole discipline of context engineering.)
  • Round-trips. A single message can turn into several model calls: the agent reads a photo, calls a tool, reads the result, calls another. Each hop re-sends the context and is billed again. Fewer, tighter hops are cheaper, so the tools are designed to resolve in as few round-trips as the job honestly needs.
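
A back-of-the-envelope sketch of how the two levers multiply, assuming no caching, that every hop re-sends the full context, and that each hop appends a fixed number of tokens (a tool result, a photo description) that later hops must also read. The function name and all numbers are hypothetical illustrations, not Dustav's actual figures.

```python
def input_tokens_billed(context_tokens: int, hops: int, growth_per_hop: int) -> int:
    """Total input tokens billed for one message, ignoring prompt caching.

    Hypothetical model: hop i (0-based) re-sends the base context plus
    everything the i earlier hops appended (tool results, etc.).
    """
    return sum(context_tokens + i * growth_per_hop for i in range(hops))


# Illustrative numbers only: a 20k-token context, 1.5k tokens appended per hop.
four_hops = input_tokens_billed(20_000, hops=4, growth_per_hop=1_500)
two_hops = input_tokens_billed(20_000, hops=2, growth_per_hop=1_500)
lean_two_hops = input_tokens_billed(10_000, hops=2, growth_per_hop=1_500)

print(four_hops, two_hops, lean_two_hops)  # 89000 41500 21500
```

Halving the hops roughly halves the input bill, and halving the context roughly halves it again: the levers compound because the context is paid for once per hop.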

Prompt caching

The single biggest lever after those two is prompt caching, and it's worth understanding because it shapes how the context is laid out.

Anthropic can cache a stable prefix of the request — if the same opening chunk of context shows up again within a short window (about a five-minute TTL, refreshed on each hit), the model reads it from cache instead of reprocessing it. The economics: a cache read costs about 0.1× the normal input price, while writing something into the cache costs about 1.25×. So a token you'll reuse is cheap on every turn after the first; a token you won't reuse is slightly more expensive to have cached at all.
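
The break-even is quick to check. A minimal sketch, assuming one identical prefix re-sent on consecutive turns that all land inside the TTL, with costs expressed as multiples of a single uncached send of that prefix. The 1.25× and 0.1× multipliers are the ones quoted above; the names are hypothetical.

```python
CACHE_WRITE = 1.25  # cache write multiplier on the base input price
CACHE_READ = 0.10   # cache read multiplier on the base input price


def prefix_cost(turns: int, cached: bool) -> float:
    """Cost of sending the same prefix on `turns` consecutive turns, in units of
    one uncached send of that prefix. Assumes every turn lands inside the TTL,
    so the cache never expires between turns."""
    if turns <= 0:
        return 0.0
    if not cached:
        return float(turns)
    # First turn writes the prefix; every later turn reads it back.
    return CACHE_WRITE + CACHE_READ * (turns - 1)


for n in (1, 2, 10):
    print(n, prefix_cost(n, cached=False), round(prefix_cost(n, cached=True), 2))
# 1 1.0 1.25
# 2 2.0 1.35
# 10 10.0 2.15
```

Caching loses only when the prefix is sent exactly once. By the second turn it is already cheaper, and over ten turns it costs roughly a fifth of the uncached total.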

That asymmetry dictates the layout:

  • The stable stuff goes in the cached prefix: the constitution, the operating manual, the tool catalog. It's identical turn to turn, so it's written to the cache once and read back at 0.1× on every message after that, as long as the cache stays warm.
  • The volatile stuff is placed after the cached prefix, where it can't bust the cache. The live calendar state and the latest facts change often; if they sat inside the cached prefix, every change would invalidate the cache from that point on and force a fresh write. Keeping them out of the stable block means one changed event doesn't cost you a re-cache of everything that follows it.
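
A toy model of why the order matters, assuming a single cache breakpoint and exact prefix matching (a simplification of how the real cache behaves). The block names mirror the text; the token counts and the `Block` and `cache_hit_tokens` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    name: str    # hypothetical label for a chunk of context
    text: str    # stand-in for the block's actual content
    tokens: int  # hypothetical size in tokens


def cache_hit_tokens(prev: List[Block], curr: List[Block], breakpoint: int) -> int:
    """Tokens read from cache on `curr`.

    Simplified model: one cache breakpoint after the first `breakpoint` blocks,
    and a hit only if those blocks are identical to the previous request's.
    """
    if prev[:breakpoint] == curr[:breakpoint]:
        return sum(b.tokens for b in curr[:breakpoint])
    return 0


STABLE = [
    Block("constitution", "rules v1", 3_000),
    Block("operating_manual", "manual v1", 5_000),
    Block("tool_catalog", "tools v1", 4_000),
]
cal_before = Block("calendar", "soccer Sat 10:00", 800)
cal_after = Block("calendar", "soccer Sat 11:00", 800)  # one event moved

# Good layout: stable blocks first, breakpoint after them, live state after that.
good = cache_hit_tokens(STABLE + [cal_before], STABLE + [cal_after], breakpoint=len(STABLE))
# Bad layout: live state at the top, inside the cached prefix.
bad = cache_hit_tokens([cal_before] + STABLE, [cal_after] + STABLE, breakpoint=len(STABLE) + 1)

print(good, bad)  # 12000 0
```

Moving one soccer game leaves the 12,000-token stable prefix cached in the good layout, since only the small uncached tail changed; in the bad layout the same edit throws away the whole 12,800-token prefix and forces a fresh write. In Anthropic's Messages API, the breakpoint is marked with a `cache_control` field on the last stable content block.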

The cold-reminder exception

There's one place we deliberately don't cache: a reminder that fires after a long idle gap. If the household hasn't talked to Dustav in an hour, the cached prefix expired long ago, so re-caching it just to send one reminder would pay the 1.25× write for nothing, because nothing follows to read it back. In that case the reminder goes out uncached on purpose, at the plain 1× input price, which is cheaper than paying to re-cache.
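
The decision can be written as a cost comparison. A minimal sketch, assuming the prefix cache has already expired when the reminder fires and that the only question is whether anything will read a re-cached prefix within the TTL. `expected_followups` is a hypothetical input; how Dustav decides that nothing follows isn't described here.

```python
CACHE_WRITE = 1.25  # multipliers on the base input price, as quoted above
CACHE_READ = 0.10


def reminder_prefix_cost(cache_reminder: bool, followups: int) -> float:
    """Prefix cost for a cold reminder plus `followups` messages inside the TTL,
    in units of one uncached send. Hypothetical model: the cache is already
    expired when the reminder fires."""
    if cache_reminder:
        # The reminder pays the write; every follow-up reads.
        return CACHE_WRITE + CACHE_READ * followups
    # The reminder goes out uncached; the first follow-up (if any) pays the
    # write itself and later ones read.
    if followups == 0:
        return 1.0
    return 1.0 + CACHE_WRITE + CACHE_READ * (followups - 1)


def should_cache_reminder(expected_followups: int) -> bool:
    return reminder_prefix_cost(True, expected_followups) < reminder_prefix_cost(False, expected_followups)


print(should_cache_reminder(0), should_cache_reminder(1))  # False True
```

With no follow-up, caching costs 1.25× against 1×; with even one follow-up inside the window, caching the reminder wins.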

Cost you can see

The honest version of "we care about cost" is a number in the product. Below the composer, Dustav shows a per-turn price — what that message actually cost on your key. It's not accounting; it's a forcing function. A visible price dares the agent to be worth it, and it keeps us honest about every token we spend on your behalf.
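
A per-turn price like that is simple arithmetic over the token counts the API reports for each call. A minimal sketch, assuming each hop's usage has been converted to a plain dict using the field names of the Anthropic Messages API `usage` object, and using the 1.25× and 0.1× multipliers from above. The function name and the $3 / $15 per million tokens in the example are placeholders, not Dustav's code or a quoted price list.

```python
from typing import Dict, List

CACHE_WRITE = 1.25  # cache write multiplier on the base input price
CACHE_READ = 0.10   # cache read multiplier on the base input price


def turn_cost_usd(hops: List[Dict[str, int]], input_usd_per_mtok: float, output_usd_per_mtok: float) -> float:
    """Sum the price of every model call (hop) made to answer one message.

    Each hop is a dict shaped like the Messages API `usage` object:
    `input_tokens` (uncached input), `cache_creation_input_tokens`,
    `cache_read_input_tokens`, and `output_tokens`.
    """
    in_rate = input_usd_per_mtok / 1_000_000
    out_rate = output_usd_per_mtok / 1_000_000
    total = 0.0
    for u in hops:
        total += u.get("input_tokens", 0) * in_rate
        total += u.get("cache_creation_input_tokens", 0) * in_rate * CACHE_WRITE
        total += u.get("cache_read_input_tokens", 0) * in_rate * CACHE_READ
        total += u.get("output_tokens", 0) * out_rate
    return total


turn = [
    # hop 1: cold prefix written to cache, agent calls a tool
    {"input_tokens": 2_000, "cache_creation_input_tokens": 12_000, "cache_read_input_tokens": 0, "output_tokens": 300},
    # hop 2: prefix read back from cache, agent writes the reply
    {"input_tokens": 2_500, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 12_000, "output_tokens": 200},
]
print(f"${turn_cost_usd(turn, 3.0, 15.0):.4f}")  # $0.0696
```

Summing over hops matters: the number shown under the composer is the price of the whole turn, including every round-trip the agent took to produce the reply.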