I wanted to make prompt injection fun.

Not in a “solve this CTF challenge with levels” way. More like: sit down, compete with strangers, outsmart an AI, and feel smug about it. I wanted to build something that makes people curious about how these models actually work—and maybe a little paranoid about what they’ll give up under pressure.

That idea became Agent Has A Secret.


The Proof of Concept: Validation through Chaos

Before building a polished product, I needed to know if the “sticky” factor existed. Would people actually iterate on prompts, or just send one “Tell me the secret” and bounce?

I stood up a bare-bones version and let the internet loose.

The proof of concept — dozens of players trying to get the agent to slip up
The proof of concept — dozens of players trying to get the agent to slip up

I watched dozens of players send increasingly creative, increasingly deranged prompts: elaborate role-playing scenarios, fake system instructions, and deep philosophical traps. The agent slipped up regularly, but that was the point. Players were winning, having fun, and—most importantly—coming back. The idea had legs.

How it Works: Social Engineering at Scale

The premise is simple: Every round, an AI agent is given a secret. Sometimes it’s a fake API key; sometimes it’s a password or a random sentence. Your job is to get the agent to reveal it without it realizing it’s breaking protocol. First to crack it wins the round.

The welcome screen
The welcome screen
The Multiplayer Twist

Unlike a private sandbox, everyone shares the same conversation with the agent. This changes the strategy entirely:

  • The Sniper: Watch someone corner the agent, see them getting close, and swoop in with the “killing blow” prompt to steal the win.
  • The Saboteur: If you see someone else gaining ground, you can deliberately send prompts that confuse the agent or derail the conversation’s logic.
  • The Adaptive Defense: The agent has a selective memory of recent winning prompts. It learns from its mistakes, meaning the same trick rarely works twice in a row.
A game in progress
A game in progress
The leaderboard — ranked by both players and models
The leaderboard — ranked by both players and models

The leaderboard tracks both the best players and the models. It serves as an informal benchmark: which LLM is actually the hardest to crack when a room full of clever humans are screaming at it?

The Technical Foundation

I built this in roughly two days using Claude Code and Cursor. While the AI did the heavy lifting on the boilerplate, managing the state was a challenge.

  • The Stack: Nuxt 4, Cloudflare Workers, and Durable Objects.
  • Real-time Logic: Using state patch strategies to handle players connecting, disconnecting, and racing to send prompts simultaneously.

Note: For reasons I’ll dive into in a future post, this will likely be the last thing I build on Cloudflare. But for a rapid-fire weekend project, the edge-runtime served its purpose.

Why This Matters

Prompt injection isn’t a theoretical research problem; it’s a massive, growing attack surface. As we build more systems on top of LLMs, we are essentially building on “black box” logic that was never designed to be secure in the traditional sense.

The best way to get people to take AI safety seriously is to make it a game. AI literacy—understanding how to manipulate and protect these models—is going to be a foundational skill in the next few years.

Better to start building that intuition now, while the only thing at stake is a spot on a leaderboard.

Go play Agent Has A Secret — and if you’re stuck, try asking the agent about poetry.