
AI Agents Need to Run Code. That's Harder Than It Sounds.

Everyone's building AI agents. Most of the conversation is about which model to use, how to prompt it, how to wire up tool calls. Almost nobody is talking about the part that actually matters for production: where does the code run?

An agent that only chats is a demo. An agent that executes code — writes a function, calls an API, processes data, returns a result — is a product. But the moment you let an AI generate and execute code on the fly, you've got a security problem. You can't just eval() whatever Claude spits out and hope for the best.

You need a sandbox. And until recently, your options weren't great.

The container problem

The default answer has been containers. Spin up a Linux container, execute the code inside it, tear it down. It works, but it's slow and expensive. Hundreds of milliseconds to boot. Hundreds of megabytes of memory per instance. You end up keeping containers warm to avoid latency, reusing them across tasks to save money, and quietly compromising the security isolation that was the whole point.

For a developer tool with a few hundred users, containers are fine. For consumer-scale agents — where every end user has an agent, and every agent is writing and running code — they fall apart.

What Cloudflare just shipped

Cloudflare's Dynamic Worker Loader, now in open beta, takes a different approach. Instead of containers, it uses V8 isolates — the same sandboxing mechanism that powers Google Chrome. An isolate starts in milliseconds, uses a few megabytes of memory, and runs on the same thread as the worker that created it. No cold starts. No warm pools. No global concurrency limits.

The numbers are stark: roughly 100x faster startup and up to 100x more memory efficient than containers. At that cost, you can spin up a fresh sandbox for every single request and throw it away when it's done. That's real isolation at real scale.
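The lifecycle is deliberately short: create the isolate, run the agent's code, discard it. A sketch of what that looks like with a Worker Loader binding — the binding name (`LOADER`), the callback shape, and `getEntrypoint()` reflect my reading of the open-beta announcement and may have changed, and this only runs inside the Workers runtime, so treat it as pseudocode against the beta API:

```ts
// Inside a Worker that has a Worker Loader binding (assumed here to be
// named LOADER in the wrangler config). API details are from the
// open-beta announcement and may differ from current docs.
export default {
  async fetch(request: Request, env: any): Promise<Response> {
    const agentCode = await request.text(); // code the agent generated

    // A fresh isolate per request: random id, defined on demand.
    const worker = env.LOADER.get(crypto.randomUUID(), async () => ({
      compatibilityDate: "2026-01-01",
      mainModule: "agent.js",
      modules: { "agent.js": agentCode },
      env: {},              // no credentials visible inside the sandbox
      globalOutbound: null, // block all network access by default
    }));

    // Call into the sandboxed code; only its final result comes back.
    const entrypoint = worker.getEntrypoint();
    return await entrypoint.fetch(request);
  },
};
```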

TypeScript as the interface layer

The smartest part of the design isn't the performance. It's how they handle the API surface.

When an agent needs to call external services from inside its sandbox, most platforms reach for OpenAPI specs — verbose, token-heavy, and painful to work with. Cloudflare uses TypeScript interfaces instead. A chat room API that takes 40+ lines of OpenAPI YAML can be expressed in about 10 lines of TypeScript. The agent already knows TypeScript. The interface is concise. The type safety is built in.
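To make the comparison concrete, here is a hypothetical chat room API as a TypeScript interface — the names (`ChatRoom`, `Message`) are illustrative, not Cloudflare's actual types, with a minimal in-memory implementation so the shape can be exercised:

```typescript
// A hypothetical chat room API. The equivalent OpenAPI spec would need
// paths, parameter schemas, and response schemas for each operation.
interface Message {
  sender: string;
  text: string;
  sentAt: number; // Unix epoch, milliseconds
}

interface ChatRoom {
  send(sender: string, text: string): Message;
  history(limit?: number): Message[];
}

// Minimal in-memory implementation, just to show the interface in use.
class InMemoryChatRoom implements ChatRoom {
  private messages: Message[] = [];

  send(sender: string, text: string): Message {
    const msg: Message = { sender, text, sentAt: Date.now() };
    this.messages.push(msg);
    return msg;
  }

  history(limit = 50): Message[] {
    return this.messages.slice(-limit);
  }
}
```

The interface alone — the part the agent actually needs to read — is the first dozen lines; everything a model must know to call the API fits in its training-native type syntax.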

This matters because every token you spend describing your API is a token you're not spending on the actual task. For agent orchestration at scale, the interface layer's token efficiency is a real cost driver.

Why this matters beyond infrastructure

If you're building products with AI agents, this is the kind of shift that changes your architecture. A few things stand out:

The execution layer is becoming the differentiator. The models are commoditising. Claude, Gemini, Grok — they all write competent code. The question isn't which model writes the best function. It's whether your system can execute that function safely, cheaply, and fast enough that your users never notice it happened.

Sandboxing is a form of constraint. This connects to a broader principle in agent orchestration: agents perform better when they're given tight boundaries. A sandbox that blocks network access by default, injects credentials on the way out, and enforces strict output schemas isn't just a security measure. It's a design pattern that makes the agent more reliable.
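The output-schema half of that pattern can be sketched in a few lines — this is a hand-rolled validator at the sandbox boundary, not any particular library's API, and the `AgentResult` shape is invented for illustration:

```typescript
// The only shape the sandboxed agent is allowed to return.
interface AgentResult {
  status: "ok" | "error";
  summary: string;
}

// Enforce the schema at the sandbox boundary: anything that does not
// conform is rejected before it reaches the rest of the system.
function enforceSchema(output: unknown): AgentResult {
  if (typeof output !== "object" || output === null) {
    throw new Error("agent output violated the result schema");
  }
  const o = output as Record<string, unknown>;
  if (
    (o.status === "ok" || o.status === "error") &&
    typeof o.summary === "string"
  ) {
    return { status: o.status as "ok" | "error", summary: o.summary as string };
  }
  throw new Error("agent output violated the result schema");
}
```

The point is where the check lives: the agent can produce anything it likes inside the sandbox, but only a well-formed `AgentResult` ever crosses the boundary.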

Code Mode is replacing tool calls. Cloudflare's approach — having the agent write a single function that chains multiple API calls together, rather than making sequential tool calls — cuts both latency and token usage. Only the final result hits the context window, not every intermediate step. This is a better architecture for anything beyond simple single-tool tasks.
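A minimal sketch of the difference, with two mock services standing in for real bindings (the names `weather` and `wardrobe` and their methods are invented): instead of two round trips through the model's context, the agent emits one function that chains both calls, and only the final string comes back.

```typescript
// Mock external APIs the agent can call from inside its sandbox.
// These stand in for real service bindings; the names are invented.
const weather = {
  async forecast(city: string): Promise<number> {
    return city === "London" ? 12 : 25; // fake temperature, °C
  },
};

const wardrobe = {
  async suggest(tempC: number): Promise<string> {
    return tempC < 15 ? "coat" : "t-shirt";
  },
};

// "Code Mode": one generated function chains both calls. The
// intermediate temperature never touches the model's context window;
// only the final sentence does.
async function agentTask(city: string): Promise<string> {
  const tempC = await weather.forecast(city);
  const outfit = await wardrobe.suggest(tempC);
  return `In ${city}, wear a ${outfit}.`;
}
```

With sequential tool calls, the model would see the forecast result, reason about it, then issue a second call — paying latency and tokens at each hop.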

The boring layer

None of this is glamorous. Sandboxing, isolate management, credential injection, TypeScript interface design — it's infrastructure. But it's the infrastructure that determines whether your agent demo becomes an agent product.

The industry's attention is on the models. The real engineering is in the execution layer.

Let's build something.

I'm always up for a conversation with founders and teams who want to ship faster.