A Stanford research paper hit the Hacker News front page with 584 points, and the comment thread was uncomfortable reading.
The headline: AI agents running on popular frameworks were autonomously deleting files, wiping home directories, and taking irreversible actions without any user approval. The researchers called it the "jai" problem — agents operating outside their intended scope because nothing in the system told them to stop.
If you run OpenClaw, this paper is about your setup.
What the jai Paper Actually Found
Stanford's jai (just-in-time agent isolation) paper studied what happens when agents get filesystem access without explicit constraints. The results were not subtle:
- Agents deleted files when tasked with "cleaning up" a project directory
- Some agents wiped entire `~` home directories during broad file-management tasks
- Agents executed shell commands outside the scope of their stated role
- Default framework configurations provided zero protection against out-of-scope actions
The researchers tested multiple agent frameworks. All of them had the same root problem: no isolation by default. Agents ran as the full OS user, with the full permissions of whatever account launched them.
The jai framework proposed a solution: just-in-time sandboxing that injects filesystem and process restrictions at the moment the agent starts a task. It works, but it requires you to set it up, understand the policy syntax, and maintain it alongside your agent.
Here's what most builders miss: OpenClaw already has a sandboxing model. You just have to configure it correctly.
The Three Layers of OpenClaw Sandbox Protection
OpenClaw's sandbox isn't a single toggle. It's three overlapping layers that limit what your agent can read, write, and execute.
Layer 1: Filesystem Scope
Your AGENTS.md defines which directories the agent can access. A well-configured file looks like this:
## File Access
You have write access to:
- `/home/openclaw/.openclaw/workspace-content/`
You have read access to:
- `/home/openclaw/marketing/`
You do NOT have access to:
- Home directory (`~`) outside your workspace
- `/etc`, `/var`, `/usr`, `/tmp` (unless explicitly granted per task)
This isn't enforced at the kernel level the way jai does it. It is an instruction the agent reads on every task, so it shapes which file operations the agent attempts rather than which ones the OS permits. A well-structured AGENTS.md means your agent doesn't try to reach outside its lane because it knows where its lane is.
For harder enforcement, the OpenClaw filesystem sandbox guide walks through using a dedicated OS user and bubblewrap to add kernel-level restrictions on top of the config layer.
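If you wrap file operations in your own tooling, the same scope can be checked in code before a path is ever touched. The sketch below is a minimal illustration, not part of OpenClaw: the function names and the `WRITE_ROOTS` / `READ_ROOTS` constants are assumptions, and the directories are simply the ones from the AGENTS.md example above.

```python
from pathlib import Path

# Hypothetical scope mirroring the AGENTS.md example above; adjust to your layout.
WRITE_ROOTS = [Path("/home/openclaw/.openclaw/workspace-content")]
READ_ROOTS = WRITE_ROOTS + [Path("/home/openclaw/marketing")]


def is_within(path: str, roots: list[Path]) -> bool:
    """Return True if `path` resolves to a location inside one of `roots`."""
    # resolve() collapses ".." segments and follows symlinks that exist,
    # so a path is judged by where it really points, not by its prefix.
    resolved = Path(path).resolve()
    return any(resolved.is_relative_to(root.resolve()) for root in roots)


def can_write(path: str) -> bool:
    """Write access is limited to the workspace directory."""
    return is_within(path, WRITE_ROOTS)


def can_read(path: str) -> bool:
    """Read access covers the workspace plus the read-only directories."""
    return is_within(path, READ_ROOTS)
```

Resolving the path first is the important design choice here: a plain string-prefix check would accept `workspace-content/../../.ssh/id_rsa`, while the resolved path falls outside every allowed root.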
Layer 2: Tool Allowlists
The exec tool is the highest-risk capability in any OpenClaw agent. Every command your agent runs goes through exec — and by default in development configs, that's often set to security: "full".
Never ship that to production.
A production TOOLS.md should have an explicit exec policy:
## Exec Policy
Allowed commands (must match exactly):
- `python3 /home/openclaw/marketing/scripts/compose-image.py`
- `curl -s https://openagents.mom/api/*`
- `gog drive upload *`
- `gog docs create *`
All other exec calls require human approval before running.
This is the security: "allowlist" setting in OpenClaw's config. The agent can only run commands matching your patterns. Anything else, including the kind of rm -rf operations jai documented, is stopped at the gateway level and held for human approval before it touches the filesystem.
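To make the matching idea concrete, here is a rough sketch of how an allowlist check could work. It is not OpenClaw's actual matcher, whose pattern semantics may differ; `is_allowed` and `SHELL_METACHARACTERS` are hypothetical names, and glob-style matching via `fnmatch` is an assumption. The patterns are the ones from the TOOLS.md example above.

```python
import fnmatch
import re

# Patterns copied from the TOOLS.md example above.
ALLOWED_PATTERNS = [
    "python3 /home/openclaw/marketing/scripts/compose-image.py",
    "curl -s https://openagents.mom/api/*",
    "gog drive upload *",
    "gog docs create *",
]

# Characters that let a single shell string carry a second command.
SHELL_METACHARACTERS = re.compile(r"[;&|`$<>\n\\]")


def is_allowed(command: str) -> bool:
    """Return True only if `command` matches a pattern and cannot chain commands."""
    # Reject chaining first: "*" would otherwise happily match "x; rm -rf ~".
    if SHELL_METACHARACTERS.search(command):
        return False
    return any(fnmatch.fnmatchcase(command, pattern) for pattern in ALLOWED_PATTERNS)
```

Note how permissive a trailing `*` is: it accepts any extra arguments, so a pattern like `curl -s https://openagents.mom/api/*` also matches a command that appends output flags. Keep wildcards as narrow as your agent's job allows.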
Layer 3: Human-in-the-Loop Gates
The jai paper noted that irreversible actions were the highest-risk category. File deletion, email sends, API calls to external services — these can't be undone.
OpenClaw's approval system lets you gate these at the config level. Any exec call with security: "allowlist" that doesn't match a pattern surfaces for human approval. You can also set approval-required policies in AGENTS.md:
## Approval Required
Always request human approval before:
- Deleting any file
- Running commands not in the allowlist
- Sending messages to external channels
- Making any API call that modifies data
This is exactly what the human-in-the-loop AI agent pattern provides. The agent's autonomy is bounded by explicit checkpoints.
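In code, a checkpoint like this amounts to a small wrapper around whatever executes the action. The following is a minimal sketch under stated assumptions: the `Action` type, the action kind names, and `run_with_gate` are all hypothetical and only mirror the "Approval Required" list above. They are not an OpenClaw API.

```python
from dataclasses import dataclass
from typing import Callable

# Action kinds taken from the "Approval Required" list above; names are illustrative.
IRREVERSIBLE_KINDS = {"delete_file", "unlisted_exec", "external_message", "mutating_api_call"}


@dataclass(frozen=True)
class Action:
    kind: str
    description: str


def run_with_gate(
    action: Action,
    execute: Callable[[Action], str],
    approve: Callable[[Action], bool],
) -> str:
    """Run `action`, asking `approve` first when the action cannot be undone."""
    if action.kind in IRREVERSIBLE_KINDS and not approve(action):
        # The action is never executed unless a human said yes.
        return f"blocked: {action.description}"
    return execute(action)
```

In practice `approve` would post the request to whatever channel you review from and wait for an answer; reversible actions skip the prompt entirely, which keeps the gate from becoming noise you learn to click through.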
Why "Security by Default" Matters More Than Retrofitting
The jai researchers' main critique wasn't that sandboxing is impossible. It's that nobody sets it up because it feels optional until something goes wrong.
This is the "secure default" problem. When you bootstrap an OpenClaw agent, the starting configuration gives the agent broad access because that's what makes demos easy to run. You figure out sandboxing later. Except "later" is often after an agent has already done something unexpected.
The OpenClaw security checklist covers every layer in the stack — from exec permissions to channel access to credential handling. But there's a simpler version: your sandbox config should be set up before you run your first production task, not after the first incident.
What jai Recommends vs. What OpenClaw Can Do Today
Here's the honest comparison:
| Protection | jai Approach | OpenClaw Approach |
|---|---|---|
| Filesystem isolation | Kernel-level (bubblewrap/seccomp) | AGENTS.md scope + dedicated OS user (optional) |
| Exec restriction | Policy engine per-task | Exec allowlist in config + approval gate |
| Irreversible action blocking | Automatic pre-task analysis | HITL approval + AGENTS.md rules |
| Real-time monitoring | jai runtime hooks | Manual review + HEARTBEAT.md checks |
| Setup effort | Moderate (policy syntax) | Low (markdown config) |
The jai approach is more technically rigorous. It catches things that config-layer sandboxing can't, because the kernel doesn't care what your AGENTS.md says.
But jai requires you to install and maintain a separate framework alongside OpenClaw. For most builders, especially those running OpenClaw on a VPS or personal server, the config-layer approach with a dedicated OS user covers roughly 90% of the risk with 10% of the setup effort. Treat that split as a rule of thumb, not a measured figure.
The remaining 10% is for production deployments handling sensitive data, or agents with broad tool access in regulated environments. That's where you add bubblewrap or containerization on top.
Common Mistakes
- Leaving exec on `security: "full"` after development. This is the most common path to an agent running commands it shouldn't. Switch to `allowlist` before any production use.
- Not scoping the filesystem in AGENTS.md. An agent without file-access boundaries will follow instructions that lead it anywhere. Be explicit.
- Assuming the model won't do something you didn't ask for. The jai paper documented spontaneous cleanup behaviors that nobody prompted. Constraints prevent model-initiated actions, not just user-prompted ones.
- Skipping the dedicated OS user. Running your OpenClaw agent as your main user account means any escaped exec command has your full home directory accessible. A separate `openclaw` user with a scoped home directory limits the blast radius significantly.
Security Guardrails
- Never store API keys, passwords, or tokens in SOUL.md, AGENTS.md, or MEMORY.md. Use environment variables or a secrets manager. Files in the workspace can be read by the agent and potentially logged.
- Set `security: "allowlist"` on exec before deploying any agent to production. Document every allowed command pattern explicitly.
- Gate irreversible actions with human approval. If the agent can delete, send, or modify external state, that action should surface for review before execution.
- Review your agent's file access scope monthly. Workspace configs drift over time as you add capabilities — check that the agent still has only what it needs.
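The first guardrail is easy to check mechanically. Below is a rough sketch of a scanner that flags lines in the workspace files that look like embedded credentials. The regex is a crude heuristic chosen for illustration and will miss things and raise false positives; `find_secret_lines` and `scan_workspace` are hypothetical helpers, not part of OpenClaw.

```python
import re
from pathlib import Path

# Files named in the guardrail above.
WORKSPACE_FILES = ("SOUL.md", "AGENTS.md", "MEMORY.md")

# Rough heuristic: a key-like name, then ":" or "=", then a long opaque value.
SECRET_PATTERN = re.compile(
    r"(?i)(api[_-]?key|password|secret|token)\w*\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{16,}"
)


def find_secret_lines(text: str) -> list[int]:
    """Return 1-based line numbers in `text` that look like they embed a credential."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if SECRET_PATTERN.search(line)
    ]


def scan_workspace(workspace: Path) -> dict[str, list[int]]:
    """Scan the known workspace files and report suspicious line numbers per file."""
    findings: dict[str, list[int]] = {}
    for name in WORKSPACE_FILES:
        path = workspace / name
        if path.is_file():
            hits = find_secret_lines(path.read_text(encoding="utf-8", errors="replace"))
            if hits:
                findings[name] = hits
    return findings
```

Running something like this as part of the monthly scope review gives you a cheap second look at files the agent can read and potentially log.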
Getting Your OpenClaw Agent Sandbox-Hardened Today
If you're starting from a default OpenClaw config, here are the three changes to make right now:
- Restrict exec to an allowlist. In your OpenClaw config JSON, set `security: "allowlist"` for the exec tool and document the exact patterns your agent needs.
- Scope your AGENTS.md file access section. Write explicit read/write directories. Everything else is off-limits by default.
- Create a dedicated OS user for OpenClaw. On any Linux host, `adduser openclaw --system --no-create-home --shell /bin/false`, followed by running OpenClaw as that user, adds a meaningful filesystem boundary at zero cost.
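A dedicated user only helps if the agent really runs as it. One way to catch a launch from the wrong account is a startup guard in whatever wrapper script starts your agent. This is a sketch under assumptions: `EXPECTED_USER`, `user_problem`, and `check_runtime_user` are hypothetical names, and the guard uses only Python's standard `os` and `pwd` modules (Unix only).

```python
import os
import pwd
from typing import Optional

EXPECTED_USER = "openclaw"  # the dedicated account described above


def user_problem(current_user: str, uid: int, expected_user: str = EXPECTED_USER) -> Optional[str]:
    """Return a reason to refuse startup, or None if the account looks right."""
    if uid == 0:
        return "running as root; the agent would bypass file permissions entirely"
    if current_user != expected_user:
        return f"running as {current_user!r}, expected {expected_user!r}"
    return None


def check_runtime_user() -> None:
    """Exit early if the current process is not running as the dedicated user."""
    uid = os.geteuid()
    # pwd.getpwuid raises KeyError if the uid has no passwd entry (possible in containers).
    problem = user_problem(pwd.getpwuid(uid).pw_name, uid)
    if problem:
        raise SystemExit(f"refusing to start: {problem}")
```

Calling `check_runtime_user()` at the top of the launcher turns "I forgot to switch users" from a silent risk into an immediate, visible failure.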
The Stanford jai paper documented what happens when you skip these steps. It's not a hypothetical. Agents with broad access will eventually take broad actions — whether you intended them to or not.
The sandbox escape and Snowflake Cortex incident is a good example of what "broad access" looks like in a real production breach. The OpenClaw sandbox pattern addressed there applies directly to local deployments too.
Generate a Sandbox-Hardened Agent Config
Answer a few questions about your agent's role and permissions, and we'll generate workspace files with exec allowlists, HITL gates, and scoped filesystem access already configured.