I killed my own marketplace

I designed a per-skill app store, built it, and deleted it — because a skill has to earn its six cents.

For a stretch of this project, the business model was a marketplace. I had Claude Code build a real one: a per-skill app store where your pal could "install" new abilities — a recipe keeper, a coaching-plan writer, a lesson planner — each its own little product with its own tools, its own playbook, its own page. Skills as the unit of value, a long tail of them, an afternoon of writing to ship each one.

Then I deleted it.

Not because it didn't work — it worked. I deleted it because of a number, and the number is the most useful thing I've learned building an AI product. So this essay is about the number, and about the discipline it forces.

The number

Every message your pal sends costs money. Not metaphorically — a real, countable amount, billed per token, that I can watch tick up in a dashboard. For a product running on a frontier model with a quality bar, that number lands somewhere around a few cents to a dime per message once you amortize everything in (the model, a second model that checks the first one, the context you re-send every turn).

A few cents a message doesn't sound like much until you put it next to the thing every normal person compares you to: the AI chat they already use for free. That's the competitive reality. The product has to be worth a price the user can feel, against a default that appears to cost nothing.

So cost-per-message isn't an accounting detail you optimize at the end. It's a design constraint that sits at the center of every feature decision, the same way latency sits at the center of a database. Once I started treating it that way, things I'd built started looking different.

Earn the round-trip

Here's the test I landed on, and it's the whole essay in one line: a feature that spends an LLM round-trip has to buy something worth a round-trip with it.

A round-trip — sending the conversation to the model and getting an answer back — is the expensive unit. It's where the cents go. So the question for any capability is: what judgment, what synthesis, what thinking does this round-trip purchase? If the answer is "it appends an item to a list," you've spent a dime of frontier-model inference to do an array operation a database does for free.

That's not hypothetical. One of my marketplace skills was a grocery list. The pal could add items, check them off, remove them. People liked it. But every "add milk" routed a full conversational turn through the model — re-priming the context, reasoning about intent, calling a tool — to perform a splice. Roughly six cents to put one word in an array. The pal earns its keep on prose and judgment and remembering the names of the people in your life. It does not earn its keep on list CRUD. So the grocery skill went in the ground.

The calendar, by contrast, stays. When you tell your pal "move the dentist thing to after school on Thursday," the round-trip is doing real work: parsing a fuzzy human reference, finding the right event among several, handling the ambiguity, getting the timezone right. That's judgment. That earns it.

Why the whole marketplace failed the test

Once "earn the round-trip" was the lens, the marketplace stopped making sense as a structure. The pitch for a skill store is volume — a hundred little skills, each a small thing. But "small thing" and "earns a frontier-model round-trip" are in tension. Most of the long tail I imagined was, on inspection, list-CRUD wearing a costume. The skills that genuinely earned their inference (the ones with real judgment in them) weren't a long tail at all — they were a short list of capabilities a good assistant should just have.

So I collapsed it. No marketplace, no per-skill store, no install flow. The pal comes with the capabilities that earn their place — a calendar, a library for the things you make together, web access — and the business model became the honest one: a subscription. Free for a run of messages so you can feel what it is, then you pay to keep talking. The pal never sees the paywall, never sells you anything; the commerce is the platform's job, not the relationship's.

Tearing it out deleted more code than building it did. That's usually the sign you got something right.

The forensics habit

The deeper lesson isn't "kill marketplaces." It's that you cannot reason about agent cost from intuition — you have to measure it, and the measurements are surprising.

I'll give you the one that surprised me most, because it's the kind of thing you'd never guess. I had a feature that added a tool to the pal's toolset conditionally, partway through certain turns. Seemed harmless — a tool definition is small. But model requests render in a fixed order: tools first, then the system prompt, then the conversation. And caching is a prefix match — change anything early and everything after it has to be re-processed at full freight. By injecting a tool mid-stream, I was invalidating the cache on the entire prompt prefix behind it — tens of thousands of tokens — and paying to re-write all of it. A tiny conditional tool was quietly the most expensive thing in the turn. (There's a whole essay in that one; it's the next one.)

You only find things like that by building the instrument. So I built it: a per-turn readout of what the pal did, which tools it called, and what it actually cost — surfaced to me in an admin view, and a gentler version surfaced to the user. That second part turned out to matter more than I expected, which is its own essay too. The short version: showing people the cost of an agent is the fastest way I've found to make them understand what an agent is.

What this discipline is really for

It would be easy to read all this as penny-pinching. It isn't. A product that's structurally expensive — frontier model, a second model checking it, the whole transparency apparatus — has exactly one defense, and it's not being cheap. It's being worth it. The cost discipline isn't there to make the product cheap; it's there to make sure every cent that is spent is buying judgment a user can feel, and not quietly evaporating on an array splice or a cache you re-wrote for no reason.

Kill the features that don't earn their round-trip. Spend lavishly on the ones that do. Measure everything, because the expensive thing is never where you think it is.

And be willing to delete the marketplace you were proud of. The number doesn't care that you were proud of it.