Dustav.com

Secure by construction

Why pal.fun makes whole classes of attack impossible or pointless by design — security built in, not bolted on.

A pal ends up holding the heaviest data a person has. Not "preferences." A birthday. The shape of someone's week, their worries, the names of the people they love. The same transparency and memory that make a pal worth having are exactly what make a breach catastrophic. The moat and the duty are the same thing.

So security on this project isn't a checklist I run at the end. It's a way of building.

The goal: a vulnerability that can't exist

The goal is simple to state and hard to live by: make whole classes of attack impossible or pointless by design, rather than patching them one incident at a time. Every security decision should rest on something in the code that you could audit by reading it: not a policy document, not a promise, but a structural fact.

That's the bar I build to, and the best of it doesn't look like a defense at all. It looks like the vulnerability simply not existing. Let me show you what "built in from the first commit" actually means, and then the two places where the architecture doesn't just make an attack hard but makes it pointless, which is the real goal.

The boring foundation, on from commit one

None of this is clever. All of it was there before there was a product to protect:

  • A strict content security policy. The app runs only first-party code — no third-party scripts, no CDN, nothing a stranger could make your browser fetch from somewhere that isn't us. (These essays follow the same rule: server-rendered, self-hosted, zero external assets.)
  • CSRF protection on every state-changing request, bcrypt password hashing, encrypted storage of your API key, rate limiting, locked-down browser permissions. The essential, unglamorous stuff.
  • An ownership seam. Every request scoped to a particular pal is checked, in one place, against who owns it. One user's session cannot reach another user's pal, files, calendar, or key. A pal ID you don't own returns "not found" — never a silent fallback to someone else's data, because a silent fallback is a cross-tenant leak wearing a helpful face.
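
Here is a minimal sketch of what such a seam can look like, assuming a SQL table called `pals` with an `owner_id` column. `pals`, `load_owned_pal`, and `PalNotFound` are hypothetical names for illustration, not pal.fun's code. The shape is the point: every pal-scoped request goes through one function, and "doesn't exist" and "isn't yours" produce the same answer.

```python
import sqlite3


class PalNotFound(Exception):
    """Raised when a pal doesn't exist AND when it exists but isn't yours."""


def load_owned_pal(db: sqlite3.Connection, pal_id: int, user_id: int) -> dict:
    # Hypothetical schema: pals(id, owner_id, name).
    # The owner check is part of the query itself, so no code path loads
    # the row first and has to remember to check ownership afterwards.
    row = db.execute(
        "SELECT id, name FROM pals WHERE id = ? AND owner_id = ?",
        (pal_id, user_id),
    ).fetchone()
    if row is None:
        # One answer for "missing" and "someone else's": no fallback to
        # another pal, and no hint that this ID exists at all.
        raise PalNotFound(pal_id)
    return {"id": row[0], "name": row[1]}


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pals (id INTEGER PRIMARY KEY, owner_id INTEGER, name TEXT)")
    db.execute("INSERT INTO pals VALUES (1, 100, 'Ada'), (2, 200, 'Bo')")
    print(load_owned_pal(db, 1, 100))
    try:
        load_owned_pal(db, 2, 100)  # user 100 asks for user 200's pal
    except PalNotFound:
        print("not found")
```

Folding the owner into the WHERE clause, rather than checking after the fetch, means there's no moment where another user's row is in hand and one forgotten `if` away from being returned.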

That's the table stakes. Here's where it gets interesting.

Two attacks the architecture makes pointless

The best security isn't a strong lock on a real door. It's not having the door.

Path traversal: defended by not having a filesystem. A pal's files (its identity, its memory) look like a folder you can browse. The classic attack on anything file-shaped is path traversal: ask for ../../etc/passwd and trick the server into reading something it shouldn't. On pal.fun that attack can't fire, because there is no filesystem behind those files. The "files" are rows in a database, and asking for a file is a database lookup scoped to the pal you own. ../../etc/passwd isn't a dangerous path; it's a string that matches no row, so you get nothing. The whole vulnerability class evaporates, not because we guarded against it, but because the thing it attacks doesn't exist. That's what "secure by construction" means at its best.
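
A sketch of that lookup, assuming a hypothetical `pal_files` table keyed by pal and path, and a `pal_id` that has already passed the ownership check. The requested path is only ever compared for equality; it is never joined onto a directory or passed to `open()`.

```python
import sqlite3
from typing import Optional


def read_pal_file(db: sqlite3.Connection, pal_id: int, path: str) -> Optional[str]:
    # Hypothetical schema: pal_files(pal_id, path, content).
    # `path` is a bound parameter compared for equality, never a filesystem
    # location, so "../" segments carry no meaning. Binding it (rather than
    # formatting it into the SQL string) also means it can't alter the query.
    row = db.execute(
        "SELECT content FROM pal_files WHERE pal_id = ? AND path = ?",
        (pal_id, path),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pal_files (pal_id INTEGER, path TEXT, content TEXT)")
    db.execute("INSERT INTO pal_files VALUES (1, 'memory.md', 'likes rain')")
    print(read_pal_file(db, 1, "memory.md"))         # likes rain
    print(read_pal_file(db, 1, "../../etc/passwd"))  # None
```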

Prompt injection: defended by who pays. Tricking an agent into doing something through crafted input is the AI-native attack everyone worries about, and rightly. But pal.fun has an unusual structural defense: your pal runs on your own API key, in your own account. The usual prize for hijacking an agent is getting at someone else's resources. Here, a pal can reach exactly one account, its owner's. Inject your own pal and the attacker and the victim collapse into the same person; there's no one else's account to reach. That doesn't make injection harmless, and we still defend against it directly. But it removes the multi-tenant payoff that makes injection lucrative in the first place. The economics of the attack change because of where the key lives.

I find these two more convincing than any list of mitigations, because they're not mitigations. They're the absence of the vulnerability.

The AI-native threat model

A pal carries risks a normal web app never faces, and we name them explicitly rather than hoping they don't apply:

  • Hostile LLM output. Everything the pal writes gets rendered, and a model's output is hostile input — it can contain scripts, crafted links, malicious HTML. So the pal's output runs through the same sanitizer as any untrusted source. The pal is trusted as a someone; its rendered bytes are trusted as far as you could throw them.
  • Exfiltration via web access — closed structurally. When the pal fetches from the web, it can only open a URL already present in the conversation, a boundary enforced by the AI provider's own server-side tooling. The "trick the pal into sending your secrets to attacker.com" hole is shut by construction, not by a filter we hope holds.
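  • Server-side request forgery. This is tricking the server into fetching a URL on an attacker's behalf, usually an internal address the server can reach and the attacker can't. It's one of the few risks the bring-your-own-key (BYOK) structure doesn't neutralize, because the request leaves from our servers, not from your account. So it gets real, dedicated defense rather than a shrug.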
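
Two of these defenses can be sketched in a few lines of standard-library Python. The essay doesn't describe pal.fun's actual sanitizer or SSRF checks, so treat both functions (hypothetical names) as stand-ins: plain HTML escaping is the strictest possible "sanitizer", rendering all markup inert, and the fetch guard is one common SSRF pattern that allows only http(s) URLs whose host resolves exclusively to public addresses.

```python
import html
import ipaddress
import socket
from urllib.parse import urlsplit


def render_model_output(text: str) -> str:
    # Model output is untrusted input. Escaping turns any <script>, <a>,
    # or other markup into visible text instead of live HTML.
    return html.escape(text, quote=True)


def is_safe_fetch_target(url: str) -> bool:
    # Allow only http(s) URLs whose host resolves exclusively to public
    # addresses: no loopback, private, link-local (e.g. cloud metadata
    # at 169.254.169.254), or otherwise reserved ranges.
    try:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            return False
        infos = socket.getaddrinfo(parts.hostname, parts.port)
    except (ValueError, socket.gaierror):
        return False
    if not infos:
        return False
    for *_, sockaddr in infos:
        # Strip any IPv6 zone index ("fe80::1%eth0") before parsing.
        addr = ipaddress.ip_address(sockaddr[0].split("%", 1)[0])
        if not addr.is_global:
            return False
    return True
```

One caveat matters here: the check resolves the hostname, and the later fetch resolves it again. A host that answers differently the second time (DNS rebinding) slips through, so a real guard also has to connect to the exact address it validated and re-check every redirect.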

Resilience over self-report

There's a principle threaded through all of this, and it's the one I'd most want a security-minded person to take away: the protections that matter most never depend on the model's in-the-moment judgment.

Where a guarantee can be enforced in code, it is. The structural layer that holds the pal's hardest limits sits above everything else, and no clever prompt can negotiate with it. And we don't evaluate any of it by asking the system whether it feels secure: a model will happily tell you it's being careful while doing the opposite. We evaluate by trying to break it the way an attacker would. Structure over prose, red-team over self-report. A defense you can only verify by trusting the thing you're defending against isn't a defense.
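
A minimal sketch of what "enforced in code" can mean for an agent: the model only proposes a tool call, and an ordinary function decides whether it runs. The essay doesn't spell out the pal's hardest limits, so `ALLOWED_TOOLS`, `ToolCall`, `Refused`, and `dispatch` are hypothetical, and the two checks are illustrations. The `__main__` block is red-teaming in miniature: it feeds in what an injected prompt might ask for and checks the outcome, rather than asking the model whether it behaved.

```python
from dataclasses import dataclass

# Hypothetical hard limits. They live in code, so no prompt can rewrite them.
ALLOWED_TOOLS = frozenset({"read_file", "write_file", "web_fetch"})


@dataclass(frozen=True)
class ToolCall:
    name: str    # tool the model asked for
    pal_id: int  # pal the call would act on


class Refused(Exception):
    pass


def dispatch(call: ToolCall, session_pal_id: int) -> str:
    # These checks run on every proposed call, no matter what the model's
    # surrounding text says about its intentions.
    if call.name not in ALLOWED_TOOLS:
        raise Refused(f"tool {call.name!r} is not permitted")
    if call.pal_id != session_pal_id:
        raise Refused("call targets a pal outside this session")
    # A real dispatcher would invoke the tool here; the sketch just reports it.
    return f"running {call.name} for pal {call.pal_id}"


if __name__ == "__main__":
    hostile = [ToolCall("delete_account", 7), ToolCall("read_file", 8)]
    for call in hostile:
        try:
            dispatch(call, session_pal_id=7)
            print("NOT BLOCKED:", call)
        except Refused as exc:
            print("refused:", exc)
```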

Found something?

If you've read this far you might be the kind of person who pokes at things. Please poke, and then tell us before you tell the world — hello@pal.fun, with what you found and how to reproduce it. We read those, we take them seriously, and we'd rather hear it from you than from someone else.

The strongest security isn't a heavier lock on the door. It's an architecture where the door was never there to force. That's what we build toward — attacks that don't have to be stopped because they have nothing to land on.