You set up an OpenClaw agent last week. It seemed lean. Then the billing notice arrived.
If your agent runs a persistent session, it's carrying its full conversation history in every single API call. A 50-message thread isn't just 50 messages — it's the compounding weight of every system prompt, every file read, every tool response that came before. Token bills climb fast when you're not paying attention to session design.
The good news: most of the waste is fixable with straightforward config changes.
Why Persistent Sessions Get Expensive
OpenClaw agents work inside sessions. Each session maintains a rolling context window that gets sent to the model on every turn. In a continuous, always-hot session, that context grows throughout the day.
The math is brutal. If your agent's baseline context (SOUL.md, AGENTS.md, tool descriptions) is 3,000 tokens, and you're running 50 turns per day, you're not paying for 50 × average-response-size tokens. You're paying for 50 × (3,000 + cumulative-conversation-history) tokens. By turn 50 in an unbounded session, that baseline 3,000 has become 20,000 or more.
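The compounding effect can be sketched with back-of-the-envelope arithmetic. The per-turn growth rate below is an illustrative assumption, not a measured OpenClaw value:

```python
# Back-of-the-envelope cost of an unbounded session vs. a flat baseline.
# GROWTH_PER_TURN is an assumed average; real sessions vary widely.

BASELINE = 3_000        # SOUL.md + AGENTS.md + tool descriptions, in tokens
GROWTH_PER_TURN = 350   # assumed tokens added to history on each turn
TURNS = 50

total_input_tokens = 0
for turn in range(TURNS):
    context = BASELINE + turn * GROWTH_PER_TURN  # history resent this turn
    total_input_tokens += context

flat_cost = BASELINE * TURNS  # what you'd pay if context never grew
print(f"unbounded session: {total_input_tokens:,} input tokens")   # 578,750
print(f"flat baseline:     {flat_cost:,} input tokens")            # 150,000
print(f"overhead factor:   {total_input_tokens / flat_cost:.1f}x") # 3.9x
```

With these assumptions the final turn alone carries a 20,150-token context, and the session as a whole costs nearly four times what a flat baseline would.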
The solution isn't to use the agent less. It's to scope sessions correctly and schedule heavy work through HEARTBEAT instead of leaving sessions hot.
Strategy 1: Schedule with HEARTBEAT, Don't Babysit

The most common token sink is agents waiting around between tasks in an active session. Every heartbeat check, status message, or "anything new?" exchange in a hot session burns tokens.
HEARTBEAT.md is the right tool for recurring tasks. Instead of keeping a session open and polling manually, define the check in your heartbeat config:
```markdown
## Heartbeat Tasks

### Every 30 minutes: Check server health
- Ping the health endpoint at `https://myapp.com/health`
- If status is not 200, send a Telegram alert
- Log result to `memory/health-log.md`

### Every 6 hours: Summarize inbox
- Read the last 20 Gmail messages via `gog`
- Write a 5-point summary to `memory/inbox-digest.md`
- Only alert if something needs a response within 24h
```
The session spins up, runs the task, writes output, and terminates. No idle time. No accumulated context. You pay for the task, not the wait.
This pattern alone can cut token usage by 60–80% for agents that currently run continuous monitoring loops.
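Where that savings estimate comes from can be sketched with assumed numbers (48 checks per day, an assumed 300 tokens of history added per check; neither figure comes from OpenClaw itself):

```python
# Compare a hot monitoring session to short-lived HEARTBEAT runs.
# All figures are illustrative assumptions for the same daily workload.

BASELINE = 3_000       # tokens of fixed context per session
CHECKS_PER_DAY = 48    # one check every 30 minutes
TOKENS_PER_CHECK = 300 # assumed history added by each check

# Hot session: every check resends the whole accumulated history.
hot = sum(BASELINE + i * TOKENS_PER_CHECK for i in range(CHECKS_PER_DAY))

# HEARTBEAT: each check is a fresh session that terminates afterwards.
scheduled = CHECKS_PER_DAY * (BASELINE + TOKENS_PER_CHECK)

savings = 1 - scheduled / hot
print(f"hot session: {hot:,} input tokens/day")    # 482,400
print(f"scheduled:   {scheduled:,} tokens/day")    # 158,400
print(f"savings:     {savings:.0%}")               # 67%
```

The exact percentage depends on how chatty each check is, but any workload where history grows faster than the per-task payload lands in the same range.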
Strategy 2: Scope Sessions to Single Jobs
An agent session that does everything accumulates context for everything. If your DevOps agent monitors logs, answers questions, triages issues, and writes summaries — all in one persistent session — it's dragging the entire history of every task into every new task.
The fix is to define purpose boundaries in AGENTS.md. A narrow session scope means a shorter context:
```markdown
## Session Scope

This agent handles ONE job per session: triage the build failure log and suggest a fix.

Do not carry context from previous sessions.
Do not accept unrelated requests — redirect to the appropriate agent.
At session end, write a one-sentence summary to `memory/triage-log.md` and terminate.
```
If you need multiple jobs covered, use multiple agents with separate workspace folders. Each runs its own tight session with its own bounded context. You get better isolation, cleaner memory, and lower token bills.
Strategy 3: Prune Context Actively
Even well-scoped sessions accumulate cruft. Tool call results, intermediate reasoning steps, and error traces all stay in context until you clear them. In a long-running debugging session, that can mean 5,000 tokens of stack traces sitting in context for every subsequent turn.
Two techniques help:
Explicit summarization checkpoints. Add an instruction to your AGENTS.md that tells the agent to summarize and compact context every N turns:
```markdown
## Context Management

After every 10 turns:
1. Summarize the work done so far into 3-5 bullet points
2. Write the summary to `memory/YYYY-MM-DD-session-summary.md`
3. Treat the summary as the new context anchor — don't carry raw message history further
```
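As a toy sketch of what that checkpoint does to a session's history over time. The `summarize()` function here is a stand-in for a real model call, and the turn count is arbitrary:

```python
# Sketch of a summarize-and-compact checkpoint: every N turns the raw
# history collapses into a short summary "anchor" and growth restarts.

CHECKPOINT_EVERY = 10

def summarize(messages: list[str]) -> str:
    # Placeholder: a real agent would ask the model for 3-5 bullets.
    return f"[summary of {len(messages)} earlier messages]"

def compact(history: list[str]) -> list[str]:
    """Collapse history into one summary line once it hits the checkpoint."""
    if len(history) < CHECKPOINT_EVERY:
        return history
    return [summarize(history)]

history: list[str] = []
for turn in range(25):
    history.append(f"turn {turn}")
    history = compact(history)

# After 25 turns: one summary anchor plus the most recent raw turns,
# instead of 25 messages of accumulated context.
print(history)
```

The key property is that context size is now bounded by the checkpoint interval rather than growing with session length.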
Memory offload before long tasks. Before any task likely to generate verbose output (log analysis, code review, multi-step research), instruct the agent to dump current state to memory and work from a fresh context:
```markdown
Before starting any task over 500 estimated tokens:
- Write current state to `memory/pre-task-YYYY-MM-DD.md`
- Reference that file instead of recalling history inline
```
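A minimal sketch of the offload step, assuming the `memory/` layout above. The function name and return message are illustrative, not part of any OpenClaw API:

```python
# Sketch: dump current state to a dated memory file and carry forward
# only the file reference. Paths and wording are illustrative.
from datetime import date
from pathlib import Path

def offload_state(state: str, memory_dir: str = "memory") -> str:
    """Write state to memory/pre-task-YYYY-MM-DD.md, return the reference."""
    Path(memory_dir).mkdir(exist_ok=True)
    path = Path(memory_dir) / f"pre-task-{date.today().isoformat()}.md"
    path.write_text(state)
    # The session keeps this one line instead of the full state.
    return f"State offloaded to {path}; read it back as needed."

ref = offload_state("current debugging notes: 4,800 tokens of stack traces")
print(ref)
```

From that point on, subsequent turns carry a one-line pointer instead of the verbose state itself.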
Strategy 4: Model-Match Your Tasks
Not every task needs Claude Opus or Sonnet 4.6. If your HEARTBEAT runs a simple health check every 30 minutes — "is the endpoint returning 200, yes or no?" — running it against a large model is overkill.
OpenClaw lets you specify the model at the session or cron job level. Use a smaller, cheaper model for:
- Health checks and status pings
- Simple file reads and writes
- Templated report generation
- Anything with a yes/no or fill-in-the-blank output shape
Reserve larger models for tasks that actually need reasoning: ambiguous debugging, natural language analysis, or anything requiring judgment.
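One way to encode that split is a simple routing table. The tier names and task labels below are illustrative, not OpenClaw configuration keys:

```python
# Sketch of model-matching: route each task type to a model tier.
# Tier names and task labels are illustrative assumptions.

MODEL_TIERS = {
    "small": ["health_check", "file_io", "templated_report", "yes_no"],
    "large": ["ambiguous_debugging", "nl_analysis", "judgment_call"],
}

def pick_model(task_type: str) -> str:
    """Return the first tier that covers the task; default to large."""
    for tier, tasks in MODEL_TIERS.items():
        if task_type in tasks:
            return tier
    return "large"  # unknown work gets the capable (expensive) model

print(pick_model("health_check"))         # small
print(pick_model("ambiguous_debugging"))  # large
```

Defaulting unknown tasks to the larger tier is the safe failure mode: it wastes money on the odd case rather than producing bad answers on it.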
The OpenClaw token cost guide has benchmarks for common task types — use it to calibrate which model tier makes sense before you commit to a pattern.
Strategy 5: Audit Before You Optimize
Before tuning anything, measure. Add a simple logging instruction to AGENTS.md:
```markdown
## Session Logging

At the end of every session, write to `memory/token-log.md`:
- Session start time
- Task name
- Estimated token usage (check last model response metadata if available)
- Session end time
```
After a week, you'll have a clear picture of which tasks are burning the most tokens. Optimizing blind wastes time; optimizing with data takes 30 minutes.
You can also check your provider's usage dashboard directly. OpenRouter and Anthropic both expose per-request token counts. Sort by cost descending — the top 3 offenders are almost always the same culprits: long-running monitoring loops, broad search tasks, or agents with unscoped context.
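The sort-and-audit step itself is trivial once you have per-request records. The record shape and numbers below are made up for illustration, not an actual provider API response:

```python
# Sketch of the audit: rank per-request usage records by token count.
# The record shape and figures are illustrative, not provider data.

requests = [
    {"task": "monitor-loop", "input_tokens": 18_000},
    {"task": "inbox-digest", "input_tokens": 2_400},
    {"task": "broad-search", "input_tokens": 11_500},
    {"task": "health-check", "input_tokens": 300},
    {"task": "triage",       "input_tokens": 5_100},
]

top3 = sorted(requests, key=lambda r: r["input_tokens"], reverse=True)[:3]
for r in top3:
    print(f'{r["task"]}: {r["input_tokens"]:,} tokens')
```

In this made-up sample the monitoring loop dominates, which matches the pattern the usage dashboards tend to show.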
What Not to Cut
A few places where cutting context is a mistake:
SOUL.md and AGENTS.md always stay in context. These files define who your agent is and how it behaves. An agent that can't access its own operating manual will make inconsistent decisions and require more turns to complete tasks — costing more, not less.
Security instructions need full context. If you've written explicit tool restrictions or approval gates in AGENTS.md, don't let those get pruned away. An agent that forgets its security guardrails mid-session is a security risk. See the OpenClaw sandbox security guide for what belongs in a permanent context anchor.
Don't compress memory that hasn't been read. Summarizing session state is fine. Overwriting accumulated memory files before the agent has referenced them is a data loss bug dressed as cost optimization.
Common Mistakes
- Leaving the default session hot all day. Every hour of idle session time is accumulated context you'll pay to send on the next turn.
- Using a flagship model for every task. Health checks, templated writes, and structured data extraction work fine on smaller models at 10–20% of the cost.
- Compressing SOUL.md to save tokens. Your agent's identity and behavior rules must stay intact. Trim verbose comments, not core instructions.
- Summarizing too aggressively. Collapsing too much context too early means the agent loses the information it needs and takes more turns to recover it.
- Not logging before optimizing. Optimizing from intuition rather than usage data leads to cutting the wrong things.
Security Guardrails
- Keep security instructions (tool restrictions, approval gates) in permanent context — never let these get pruned mid-session.
- When scoping to single-job sessions, verify the agent can't accept lateral requests from other contexts that bypass its permission model.
- Don't log sensitive data (API responses, PII, credentials) to memory files you're using as context anchors.
- Review HEARTBEAT tasks after every tuning pass — a misconfigured interval can silently multiply API calls.
Putting It Together
A well-tuned agent costs 60–80% less than a naively configured one. The changes aren't complex:
- Move recurring tasks to HEARTBEAT.md and let sessions terminate
- Scope each session to a single job boundary in AGENTS.md
- Add a context summarization checkpoint every 10 turns
- Use smaller models for structured, low-judgment tasks
- Log token usage for one week before making further changes
If you're starting from scratch, OpenAgents.mom's wizard generates HEARTBEAT.md and AGENTS.md with session scoping and context management instructions already in place. You get cost-aware defaults before you've written a single line yourself.
Build a Cost-Aware Agent From the Start
Your workspace bundle comes with HEARTBEAT.md and AGENTS.md pre-configured for bounded sessions and scheduled tasks — not runaway context. Skip the expensive lesson.