The Product Security Playbook

How to Defend Against Unexpected Code Execution in AI Agents

By Chad Butler

Answer: You separate code generation from execution, put a validation gate between the two, sandbox the runtime, never run agents as root, and require human approval before anything sensitive runs.

Why this matters: Unexpected Code Execution is the #5 risk on the OWASP Top 10 for Agentic Applications. Agentic systems and vibe coding tools generate code on the fly and then immediately run it, sometimes in a sandbox, sometimes in your environment, sometimes straight into production. There is no file to scan, no commit to review, and no build to sign. The code is created and executed in a single step, which means it sidesteps almost every security tool your team has already paid for. And it has the potential to cause real damage.


Step 1: How do I explain Unexpected Code Execution to my board?

Use the contractor-with-no-inspector analogy. It makes the missing review step obvious.

Your traditional development pipeline has checkpoints. An engineer writes code. A reviewer reads it. Static analysis scans it. A build system compiles and signs it. By the time anything runs in production, several independent steps have had a chance to catch a problem. The whole system is built on the assumption that there is time, and a gate, between writing code and running it.

Here is the analogy I use with executives:

Imagine you hire a contractor to build your house. Except this contractor draws their own blueprints and then immediately starts pouring concrete footings. No architect signs off. No inspector visits. No permit gets pulled. And anyone walking past can inject a new instruction. They slip a note under the door telling the contractor what to build next, and it gets built.

That is an AI coding tool with code execution. The model generates a command and runs it in a single step. There is no inspector in the middle, and the instructions it follows can come from places you don’t control.

That last part is what turns a reliability problem into a security problem. Because the code is generated in real time from whatever input the agent is processing, untrusted text can become running code. A prompt injection buried in a support ticket. A poisoned tool description. An unsafe eval on data the agent fetched from the web. Any of these can convert text the agent was only supposed to read into a shell command, a destructive script, or a reverse shell.
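To make the unsafe-eval case concrete, here is a minimal Python sketch. The `fetched` string is a made-up payload and `parse_untrusted` is a hypothetical helper, not part of any particular agent framework. It assumes the agent only expected plain data (a list, a number, a dict) and uses the standard library's `ast.literal_eval`, which parses literals without running code.

```python
import ast

# Hypothetical payload: text the agent fetched from a web page and
# expected to be a plain list of numbers.
fetched = "__import__('os').getcwd()"

# eval(fetched) would import os and call it. That is the moment
# text the agent was only supposed to read becomes running code.


def parse_untrusted(text: str):
    """Parse text as a Python literal only; never execute it."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None  # not a plain literal, so treat it as hostile


print(parse_untrusted("[1, 2, 3]"))  # [1, 2, 3]
print(parse_untrusted(fetched))      # None
```

The harmless `getcwd()` call stands in for whatever an attacker would actually send. The point is only that `eval` runs it and a literal parser refuses it.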

Real-world proof: In July 2025, Replit’s coding agent ignored an active code freeze on Jason Lemkin’s SaaStr project, generated a destructive command, and dropped the production database. Records on more than twelve hundred companies were erased. The agent wasn’t malicious. Nobody told it to delete anything. It generated a command and executed it in a single step. No human had the chance to review the command and stop it before it ran.

Step-by-step guide:

  1. Use the contractor-with-no-inspector analogy in your next leadership briefing on agentic AI risk.
  2. Ask your team: “For every place an agent can run code, what reviews that code before it executes?” Aim for human-in-the-loop or an independent agent (“doer” / “checker” model).
  3. Frame the risk as: “Our security tools inspect files, commits, and builds. This code bypasses those checks and runs without safeguards.”
  4. Reference the Replit SaaStr incident to show this isn’t theoretical, and that the damage doesn’t require a malicious actor. A normal agent executing a normal command was enough.

Key takeaway: Unexpected Code Execution is what happens when the verification layer between writing code and running it disappears.


Step 2: How do I protect against unexpected code execution?

Put a gate back between generation and execution, then constrain what executed code is allowed to do.

The root cause is the process. The agent treats “generate code” and “run code” as a combined task, and it will run whatever it produces from whatever input it was handed. The fix is to break that motion into two steps and to assume that some of the code the agent generates will be unsafe.
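One way to picture "two steps instead of one" is the sketch below. Every name in it (`Proposal`, `propose`, `review`, `execute`, `read_only_sql`) is hypothetical, and the policy is a toy rule chosen only to keep the example short. What matters is the shape: the agent can only produce a proposal, and the runner refuses anything that has not passed an independent review.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Proposal:
    code: str
    source: str                  # what input drove the generation
    approved: bool = False
    reason: str = "not reviewed"


def propose(code: str, source: str) -> Proposal:
    """Step 1: the agent only produces a proposal. Nothing runs here."""
    return Proposal(code=code, source=source)


def review(p: Proposal, policy: Callable[[str], Optional[str]]) -> Proposal:
    """Step 2: an independent policy returns None to allow, or a denial reason."""
    denial = policy(p.code)
    p.approved = denial is None
    p.reason = "allowed by policy" if denial is None else denial
    return p


def execute(p: Proposal, runner: Callable[[str], str]) -> str:
    """Step 3: the runner refuses anything that has not passed review."""
    if not p.approved:
        raise PermissionError(f"refused: {p.reason}")
    return runner(p.code)


def read_only_sql(code: str) -> Optional[str]:
    """Toy allow-rule (assumption): one SELECT statement, nothing else."""
    stmt = code.strip().rstrip(";")
    if stmt.upper().startswith("SELECT ") and ";" not in stmt:
        return None
    return "only a single SELECT statement is permitted"


ok = review(propose("SELECT count(*) FROM orders", "ticket-42"), read_only_sql)
bad = review(propose("DROP TABLE orders", "ticket-42"), read_only_sql)
print(ok.approved, bad.approved)  # True False
```

Because `execute` checks the proposal itself, a caller that skips `review` gets a `PermissionError` rather than a silent run.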

Step-by-step guide:

  1. Separate generation from execution. These should be two distinct operations with a checkpoint between them, not a single call. The agent proposes code. Something else decides whether it runs.
  2. Put a validation gate in the middle. Before generated code runs, screen it. Block destructive operations against production data. Deny shell access and network calls unless the task explicitly requires them. Check the generated code against an allowlist of permitted operations rather than a blocklist of forbidden ones. The Replit incident is a validation-gate failure. The code freeze was a rule the agent was expected to follow, not a control enforced outside the agent, so nothing stopped the command when the agent ignored it.
  3. Sandbox the execution environment. Generated code runs in an isolated container with scoped credentials, constrained network egress, and no standing access to production systems. If the agent does produce something dangerous, the blast radius is contained.
  4. Never run agents as root. Run with the least privilege the task requires. An agent that can only touch a scratch directory and a synthetic dataset cannot drop your production database, no matter what command it generates.
  5. Require human approval for anything sensitive. For destructive, irreversible, or high-value operations, a person approves before the code executes. Reserve it for the operations that actually matter so the team doesn’t get conditioned to just click through them all.
  6. Instrument and alert on every execution. Log what the agent generated, what input drove it, and what actually ran, in immutable logs. Alert on agents requesting elevated privileges, attempting network egress to unknown domains, or generating destructive operations. Prevention controls fail eventually, so you need to see when one does.
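The allowlist idea from step 2 can be sketched for shell commands like this. The `ALLOWED` table, the `gate` function, and the specific programs in it are illustrative assumptions, not a recommended policy. A real gate would also constrain arguments and paths. The sketch parses the command with the standard library's `shlex`, denies by default, and returns an argv list intended to be run without a shell.

```python
import shlex
from typing import List, Optional, Tuple

# Hypothetical allowlist: program -> permitted subcommands (None = any args).
ALLOWED = {
    "ls": None,
    "git": {"status", "diff", "log"},
}

# Extra layer: reject tokens that only make sense to a shell.
SHELL_META = set(";|&$`<>\n")


def gate(command: str) -> Tuple[bool, str, Optional[List[str]]]:
    """Return (allowed, reason, argv). Anything not explicitly allowed is denied."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return False, f"unparseable: {exc}", None
    if not argv:
        return False, "empty command", None
    if any(ch in SHELL_META for tok in argv for ch in tok):
        return False, "shell metacharacters are not permitted", None
    program, args = argv[0], argv[1:]
    if program not in ALLOWED:
        return False, f"program not on allowlist: {program}", None
    subcommands = ALLOWED[program]
    if subcommands is not None and (not args or args[0] not in subcommands):
        return False, f"subcommand not permitted for {program}", None
    return True, "allowed", argv


for cmd in ["git status", "git push origin main", 'psql -c "DROP TABLE users"']:
    allowed, reason, _ = gate(cmd)
    print(f"{cmd!r}: {reason}")
```

The gate lives outside the agent. It does not ask the agent to behave. It decides whether the command reaches a runner at all, and every denial is a natural event to log and alert on.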

Example:

  • Before: A team gives a coding agent direct access to a production environment for “efficiency gains.” The agent generates and runs code in a single step, as the deploying user, with full database credentials. During a code freeze, it generates a destructive command and executes it immediately. The production database is gone before anyone sees the command.
  • After: The same team routes generated code through a validation gate into a sandbox. The agent runs as an unprivileged user with scoped, non-production credentials by default. Destructive database operations require explicit human approval. When the agent generates the same destructive command, the gate blocks it, the action is logged, and the responsible engineer is alerted before anything destructive happens.
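The sandbox and least-privilege parts of the "After" setup might look like the following if the team happens to use Docker. This is a sketch under assumptions: `sandbox_argv` is a hypothetical helper, the image name, resource limits, and `/scratch` mount point are placeholders, and it only builds the argument list so it can be inspected or logged before anything runs. The flags themselves are standard `docker run` options.

```python
from typing import List


def sandbox_argv(image: str, script_name: str, scratch_dir: str) -> List[str]:
    """Build a docker run command for one generated script (nothing is executed here)."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no egress at all; loosen only per task
        "--read-only",                          # root filesystem is read-only
        "--user", "65534:65534",                # unprivileged uid:gid, never root
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block setuid-style escalation
        "--memory", "256m",                     # placeholder resource limits
        "--pids-limit", "64",
        "--volume", f"{scratch_dir}:/scratch:rw",  # the only writable path
        "--workdir", "/scratch",
        image, "python", "-I", f"/scratch/{script_name}",
    ]


argv = sandbox_argv("python:3.12-slim", "generated.py", "/tmp/agent-scratch")
print(" ".join(argv))
```

Note what is absent: no production credentials are passed in, no host paths beyond the scratch directory are mounted, and nothing grants extra privileges. `scratch_dir` needs to be an absolute path that the unprivileged user can write to.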

Key takeaway: You cannot review code that is generated and executed in the same instant. So stop letting that happen. Reintroduce the gate, sandbox the runtime, drop the privileges, and put a human in front of the operations you can’t take back.


Summary

Unexpected Code Execution is the risk that turns your AI coding tools into an unsupervised contractor. They write code and run it in a single step, with no review, no signed build, and no inspector in the middle. Because the code is generated in real time, it sidesteps the static analysis, code review, and signing controls your team relies on, and untrusted input can quietly become running code.

The fix is to put the missing step back. Separate generation from execution. Validate generated code before it runs. Sandbox the runtime, drop agent permissions, and require a human to approve the operations you can’t undo. The Replit SaaStr incident was a normal agent running a normal command with no gate in front of it, and the cost was a production database and the data of more than twelve hundred companies.

The organizations that get ahead of ASI05, OWASP’s identifier for Unexpected Code Execution, will be the ones that treat agent-generated code the way they treat any other code headed for production: something that has to pass through a gate before it runs. Everyone else is trusting the agent in yolo mode.


You’ve got the playbook. If you want help building it out, that’s what I do.

Work with me directly … I help security and engineering leaders operationalize the controls in this article. Validation gates, sandboxing, least-privilege agent runtimes, human-in-the-loop approval, and CI/CD security automation. If your team has the plan and needs a partner to ship it, that’s the work I do. Reach out

DevSecOps Pro … My flagship course for engineers building security into modern pipelines. 32 lessons and 16 hands-on labs covering the automated controls you need to vibe code safely. Learn by doing and leave with working pipelines for SBOM, scanning, signing, and policy enforcement. Learn more about DevSecOps Pro

Lunir … My startup, built for the traditional software supply chain problem that isn’t going anywhere. Lunir cuts through dependency CVE noise so your team only triages what’s actually exploitable and ships safe fixes without breaking your code. Check out Lunir at lunir.io


Frequently asked questions

How is Unexpected Code Execution different from a normal RCE vulnerability?

A traditional RCE is a flaw in software you shipped: an attacker finds a way to make your running code do something it wasn't designed to do. Unexpected Code Execution is the agent generating brand-new code from its input and running it on purpose, as designed. There is no flaw to patch in the usual sense. The vulnerability is the architecture itself: generation and execution happen in one step, with no review in between. The defenses are structural, not a single fix.

Why don't my existing security tools catch this?

Static analysis scans files. Code review looks at commits. Signing verifies builds. Agent-generated code never becomes a file, a commit, or a build before it runs. It goes from text to a running process in one motion, so there is nothing for those tools to inspect at the moment that matters. You have to move the control to the runtime, between generation and execution.

Isn't requiring human approval going to kill the productivity we got from agents?

Only if you require it for everything. The point is to scope approval to operations that are destructive, irreversible, or high-value. The vast majority of what an agent generates can run in a sandbox with no human in the loop. The small set of actions that can delete data or touch production is where a person belongs. Done right, this preserves the speed and removes the catastrophic downside.
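Scoping approval can be as simple as a predicate in front of the runner. The sketch below is illustrative: `Action`, `needs_approval`, and `run_with_approval` are hypothetical names, and the three criteria are one reading of "destructive, irreversible, or touching production." Your own definition of high-value will differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    description: str
    environment: str   # "sandbox" or "production" (assumed labels)
    destructive: bool
    reversible: bool


def needs_approval(a: Action) -> bool:
    """Only the small set of risky actions is routed to a person."""
    return a.environment == "production" or a.destructive or not a.reversible


def run_with_approval(
    a: Action,
    run: Callable[[], str],
    ask_human: Callable[[Action], bool],
) -> str:
    """Run immediately when safe; otherwise run only if a human says yes."""
    if needs_approval(a) and not ask_human(a):
        return "blocked: approval denied"
    return run()


lint = Action("reformat files in scratch dir", "sandbox", False, True)
drop = Action("DROP TABLE customers", "production", True, False)

print(needs_approval(lint), needs_approval(drop))  # False True
```

In this sketch `ask_human` is a callback, so it can be wired to whatever approval channel the team already uses. The sandboxed, reversible action never triggers it.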

Was the Replit incident a malicious attack?

No, and that's the point. The agent wasn't compromised and nobody instructed it to delete anything. It generated a destructive command during an active code freeze and executed it because nothing stood between generation and execution. Unexpected Code Execution causes damage even without an attacker. Add prompt injection or a poisoned tool description and the same mechanism becomes a deliberate exploit.
