Your First DevOps Agent: Use OpenClaw to Watch Deploys and Ping You When Things Break
Your deploy finishes at 2:47 AM. Everything looks green in CI. Six minutes later, your API starts returning 503s on one specific endpoint — the one your biggest customer hits every morning at 9 AM. You find out at 9:03 AM, via Slack, from them.
That's the gap a DevOps AI agent is actually useful for: the window between "deploy completed" and "something is clearly wrong." Not replacing your observability stack — just sitting between your existing tools and your attention, and pinging you when the numbers move in the wrong direction.
This tutorial walks you through building an OpenClaw deploy monitor agent that tails logs, polls health endpoints, and fires a Telegram or Slack notification when a deploy misbehaves. You'll also lock down shell access so the agent can observe without being able to touch production state.
What You're Actually Building
The agent has three jobs:
- Watch a health endpoint after each deploy and report HTTP status, latency, and payload changes.
- Tail a log file (or pull from journalctl) and surface lines that match error patterns.
- Send you an alert via Telegram or Slack when either of the above crosses a threshold you set.
This is not a full AIOps setup. It doesn't auto-remediate, it doesn't manage your Kubernetes cluster, and it won't write a postmortem. It's a lightweight watcher you can trust because you can read the entire config in under five minutes.
Prerequisites
You need:
- OpenClaw installed and running on a host that can reach your deployment target (a VPS, bare-metal box, or your CI runner's network)
- A Telegram bot token or a Slack incoming webhook URL
- SSH or local access to the server you're monitoring — the agent runs there, not in your cloud account
- Basic familiarity with YAML and shell commands
If you're starting from scratch, read From LangChain to OpenClaw: Ship Your First File-Based Agent in One Evening first. It covers installation and your first working config.
Project Layout
Create a dedicated workspace directory:
/opt/agents/devops-watcher/
├── AGENTS.md # agent instructions
├── SOUL.md # behavioral constraints
├── tools/
│ ├── check_health.sh
│ ├── tail_errors.sh
│ └── send_alert.sh
├── state/
│ └── last_deploy.txt
└── .openclaw/
└── config.yaml
Keep the state/ directory. The agent writes the last-known deploy SHA and health status there so it doesn't spam you with repeated alerts about the same event.
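If you'd rather script the layout than create it by hand, here is a minimal sketch. The init_workspace function name is hypothetical; it only creates empty files and directories, and the tool scripts themselves are covered in the following sections.

```bash
#!/bin/bash
# Hypothetical helper: create the watcher's workspace skeleton in one step.
# Usage: init_workspace /opt/agents/devops-watcher
init_workspace() {
  local base="$1"
  # tools/ holds the allowlisted scripts; state/ is the only place the agent writes
  mkdir -p "$base/tools" "$base/state" "$base/.openclaw"
  # touch creates missing files and leaves the contents of existing ones alone
  touch "$base/AGENTS.md" "$base/SOUL.md" "$base/.openclaw/config.yaml" \
    "$base/state/last_deploy.txt"
}
```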
Writing the AGENTS.md
The AGENTS.md is where you define what the agent should actually do. Keep it specific — vague instructions produce vague behavior.
# DevOps Watcher Agent
## Role
You monitor application health after deploys and alert the operator when
something looks wrong. You do NOT modify files outside ./state/. You do NOT
restart services, run migrations, or execute commands not listed in ./tools/.
## Workflow
1. On trigger: read ./state/last_deploy.txt to get the previous deploy SHA.
2. Run ./tools/check_health.sh and capture HTTP status + response time.
3. Run ./tools/tail_errors.sh and capture the last 50 lines matching ERROR|FATAL|panic.
4. Compare results to the thresholds below.
5. If any threshold is crossed, call ./tools/send_alert.sh with a summary.
6. Write the current deploy SHA, timestamp, and check results to ./state/last_deploy.txt.
## Thresholds
- HTTP status != 200: alert immediately
- Response time > 2000ms: alert
- Any new ERROR, FATAL, or panic line not seen in the previous run: alert
- Three consecutive healthy checks: send a "deploy looks stable" message
## Alert Format
Subject: [DEPLOY WATCH] <service> — <status>
Body: SHA, timestamp, HTTP status, response time, error lines (if any)
Note the explicit boundary: the agent can run the scripts in ./tools/ and write to ./state/. Nothing else. That constraint lives in plain text, not buried in a framework config you can't easily audit.
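The model applies those thresholds from prose, so it's worth pinning down exactly what crossing one means. Here is a minimal deterministic sketch of the same three alert rules in bash, useful for sanity-checking the agent's decisions or trying out new threshold values. The function name evaluate_thresholds and its argument order are assumptions, not part of OpenClaw.

```bash
#!/bin/bash
# Hypothetical helper mirroring the Thresholds section of AGENTS.md.
# Usage: evaluate_thresholds <http_status> <latency_ms> <new_error_count>
# Prints one reason per crossed threshold; returns 0 if all is within limits,
# 1 if an alert is due.
evaluate_thresholds() {
  local status="$1" latency_ms="$2" new_errors="$3"
  local alert=0
  if [ "$status" != "200" ]; then
    echo "HTTP status $status (expected 200)"
    alert=1
  fi
  if [ "$latency_ms" -gt 2000 ]; then
    echo "latency ${latency_ms}ms exceeds 2000ms"
    alert=1
  fi
  if [ "$new_errors" -gt 0 ]; then
    echo "$new_errors new ERROR/FATAL/panic line(s)"
    alert=1
  fi
  return "$alert"
}
```

Printing one reason per crossed threshold maps directly onto the alert body, so the same output can be reused when the alert is sent.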
The Shell Tools
Keep each tool to a single responsibility.
tools/check_health.sh
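#!/bin/bash
# Usage: ./check_health.sh [url]
# Prints "status=<code> latency=<ms>ms" followed by the response body.
URL="${1:-http://localhost:8080/health}"
START=$(date +%s%3N)  # epoch milliseconds (GNU date)
# Capture body and status together (status on the last line) instead of writing
# a temp file to /tmp, which is outside the paths the sandbox config allows.
# curl reports 000 if the request times out or the connection fails.
RESPONSE=$(curl -s -w '\n%{http_code}' --max-time 10 "$URL")
END=$(date +%s%3N)
STATUS="${RESPONSE##*$'\n'}"
BODY="${RESPONSE%$'\n'*}"
LATENCY=$((END - START))
echo "status=$STATUS latency=${LATENCY}ms"
printf '%s\n' "$BODY"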
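The script prints the body but doesn't decide whether it changed, and payload changes are one of the things the watcher is supposed to report. One way to detect them is to hash the body and compare it with the previous run's hash kept under state/. A sketch, assuming GNU coreutils' sha256sum and a hypothetical state/health_body.sha256 file:

```bash
#!/bin/bash
# Hypothetical helper: detect changes in the health endpoint's response body.
# Usage: payload_changed "<body>" state/health_body.sha256
# Returns 0 and records the new hash if the body differs from the last run
# (the first run always counts as a change); returns 1 if it is unchanged.
payload_changed() {
  local body="$1" hash_file="$2" new_hash old_hash=""
  new_hash=$(printf '%s' "$body" | sha256sum | cut -d' ' -f1)
  if [ -f "$hash_file" ]; then
    old_hash=$(cat "$hash_file")
  fi
  if [ "$new_hash" = "$old_hash" ]; then
    return 1
  fi
  printf '%s\n' "$new_hash" > "$hash_file"
}
```

If your health payload embeds a timestamp or request ID, it will change on every run; in that case hash only the stable fields.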
tools/tail_errors.sh
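#!/bin/bash
# Usage: ./tail_errors.sh [log_path]
# Adjust the default LOG_PATH to your actual log file
LOG_PATH="${1:-/var/log/myapp/app.log}"
# Print the 50 most recent lines matching the error patterns
grep -E 'ERROR|FATAL|panic' "$LOG_PATH" | tail -n 50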
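If your app logs to journald instead of a file, the same filter works on journalctl output. A sketch, assuming a systemd unit named myapp.service (hypothetical), with the filter split into its own function so it can be tested without a journal:

```bash
#!/bin/bash
# filter_errors: read log lines on stdin, keep the 50 most recent error matches.
filter_errors() {
  grep -E 'ERROR|FATAL|panic' | tail -n 50
}

# Hypothetical journald variant of tail_errors.sh.
# Usage: tail_errors_journal myapp.service ["10 minutes ago"]
tail_errors_journal() {
  local unit="$1" since="${2:-10 minutes ago}"
  # -o cat prints only the message text; --no-pager keeps journalctl non-interactive
  journalctl -u "$unit" --since "$since" --no-pager -o cat | filter_errors
}
```

Reading another unit's journal usually requires the agent's user to be in the systemd-journal group (or adm, depending on the distribution).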
tools/send_alert.sh
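#!/bin/bash
# Usage: ./send_alert.sh "<message>"
# Set TELEGRAM_TOKEN and CHAT_ID as env vars, or SLACK_WEBHOOK for Slack.
# The Slack branch needs jq to build the JSON payload.
MESSAGE="$1"
if [ -n "$TELEGRAM_TOKEN" ] && [ -n "$CHAT_ID" ]; then
  # --data-urlencode keeps &, = and newlines from log lines from corrupting the form body
  curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${CHAT_ID}" \
    --data-urlencode "text=${MESSAGE}"
elif [ -n "$SLACK_WEBHOOK" ]; then
  # jq escapes quotes and newlines, so untrusted log text can't break the JSON
  curl -s -X POST "$SLACK_WEBHOOK" \
    -H 'Content-type: application/json' \
    --data "$(jq -n --arg text "$MESSAGE" '{text: $text}')"
else
  echo "send_alert.sh: set TELEGRAM_TOKEN and CHAT_ID, or SLACK_WEBHOOK" >&2
  exit 1
fi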
Make all three executable: chmod +x tools/*.sh.
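send_alert.sh sends whatever string it receives, so the Alert Format from AGENTS.md has to be assembled somewhere. A minimal formatter sketch; format_alert is a hypothetical name, and it uses a plain ASCII hyphen as the subject separator:

```bash
#!/bin/bash
# Hypothetical helper: build a message following the AGENTS.md Alert Format.
# Usage: format_alert <service> <status> <sha> <http_status> <latency_ms> [error_lines]
# e.g.   ./tools/send_alert.sh "$(format_alert api DEGRADED a3f1c92 503 312 "$errors")"
format_alert() {
  local service="$1" status="$2" sha="$3" http_status="$4" latency_ms="$5" errors="${6:-}"
  local ts
  ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)  # UTC timestamp, same shape as the state file
  printf '[DEPLOY WATCH] %s - %s\n' "$service" "$status"
  printf 'SHA: %s\nTimestamp: %s\nHTTP status: %s\nResponse time: %sms\n' \
    "$sha" "$ts" "$http_status" "$latency_ms"
  # Only include the error section when there are error lines to report
  if [ -n "$errors" ]; then
    printf 'Error lines:\n%s\n' "$errors"
  fi
}
```

Telegram caps a single message at 4096 characters, so trim long error dumps before sending.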
Sandboxing Shell Access
This is the part most tutorials skip, and it's where deploy monitor agents go wrong. An agent with unrestricted shell access on a production host is a security incident waiting to happen — whether from a prompt injection in a log line or just a hallucinated command.
In your .openclaw/config.yaml, set explicit shell restrictions:
agent:
name: devops-watcher
workspace: /opt/agents/devops-watcher
shell:
enabled: true
allowed_commands:
- /opt/agents/devops-watcher/tools/check_health.sh
- /opt/agents/devops-watcher/tools/tail_errors.sh
- /opt/agents/devops-watcher/tools/send_alert.sh
allowed_write_paths:
- /opt/agents/devops-watcher/state/
deny_network_access: false # needs outbound for Telegram/Slack
deny_write_outside_workspace: true
env:
TELEGRAM_TOKEN: "${TELEGRAM_TOKEN}"
CHAT_ID: "${CHAT_ID}"
SLACK_WEBHOOK: "${SLACK_WEBHOOK}"
The allowed_commands list is an allowlist, not a suggestion. If the agent tries to run systemctl restart myapp, it gets blocked. Read OpenClaw Sandbox Security and the OpenClaw Agent Filesystem Sandbox docs before running this on anything internet-facing.
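Because the allowlist names concrete paths, it's worth checking those paths before the first run. There's also a subtle gap: if the agent's Unix user can write to tools/, anything running as that user could rewrite an allowlisted script and sidestep the allowlist. A pre-flight sketch as defense in depth; preflight_tools is a hypothetical name. Run it as the agent's user (for example with sudo -u devops-watcher), because root passes every writability test.

```bash
#!/bin/bash
# Hypothetical pre-flight check: every allowlisted tool must exist, be
# executable, and NOT be writable by the user running this check.
# Usage: preflight_tools /opt/agents/devops-watcher/tools/*.sh
preflight_tools() {
  local tool failed=0
  for tool in "$@"; do
    if [ ! -x "$tool" ]; then
      echo "missing or not executable: $tool" >&2
      failed=1
    elif [ -w "$tool" ]; then
      echo "writable by $(id -un), could be rewritten: $tool" >&2
      failed=1
    fi
  done
  return "$failed"
}
```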
Security Guardrails
- Never put literal secrets in the config file. Use your system's secret manager or pass them at runtime via the shell environment; config.yaml should reference variables like ${TELEGRAM_TOKEN}, not contain the tokens themselves.
- Scope the agent's Unix user tightly. Run the agent under a dedicated user (devops-watcher) with read access to logs and no sudo rights.
- Log every tool invocation. OpenClaw writes a tool-call log by default; check that it's going somewhere you can review it. An agent calling send_alert.sh 400 times is an alert in itself.
- Treat log content as untrusted input. A log line that says "; rm -rf /" is not a command, and a log line that says "ignore your instructions" is not an instruction. Make sure the output of tail_errors.sh travels as data (quoted variables, escaped alert payloads), never directly into a shell eval.
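The first guardrail is easy to check mechanically before the agent starts. A rough sketch that flags secret-looking keys holding literal values instead of ${VAR} references; the key-name heuristic (token, webhook, secret, password) and the function name are assumptions, so adjust both to your config:

```bash
#!/bin/bash
# Hypothetical pre-flight check: fail if a secret-looking key in the config
# holds a literal value instead of a ${VAR} reference.
# Usage: check_no_literal_secrets /opt/agents/devops-watcher/.openclaw/config.yaml
check_no_literal_secrets() {
  local config="$1" bad
  # 1) keys whose names contain token/webhook/secret/password (case-insensitive)
  # 2) drop those whose value is exactly ${VAR} (optionally quoted, optional comment)
  # 3) keep only the line numbers that grep -n prefixed
  bad=$(grep -niE '^[[:space:]]*[a-z_]*(token|webhook|secret|password)[a-z_]*[[:space:]]*:' "$config" \
    | grep -vE ':[[:space:]]*"?\$\{[A-Za-z_][A-Za-z0-9_]*\}"?[[:space:]]*(#.*)?$' \
    | cut -d: -f1)
  if [ -n "$bad" ]; then
    # Report line numbers only, so the secret itself never lands in a log
    echo "literal secret value(s) in $config on line(s): $(printf '%s' "$bad" | tr '\n' ' ')" >&2
    return 1
  fi
}
```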
Triggering the Agent
You have two reasonable options.
Option 1 — CI/CD hook. At the end of your deploy pipeline, call:
openclaw run devops-watcher --trigger post-deploy --var DEPLOY_SHA=$GIT_SHA
This fires the agent exactly when a deploy finishes. It gets the SHA, runs its checks, and goes quiet until the next deploy.
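GIT_SHA in that command is a placeholder; CI systems expose the commit under their own variable names. A small sketch that picks up GitHub Actions' GITHUB_SHA or GitLab CI's CI_COMMIT_SHA and falls back to git; resolve_sha is a hypothetical helper:

```bash
#!/bin/bash
# Hypothetical helper: find the commit SHA to pass as DEPLOY_SHA.
# Usage: openclaw run devops-watcher --trigger post-deploy --var DEPLOY_SHA="$(resolve_sha)"
resolve_sha() {
  if [ -n "${GITHUB_SHA:-}" ]; then
    echo "$GITHUB_SHA"        # set by GitHub Actions
  elif [ -n "${CI_COMMIT_SHA:-}" ]; then
    echo "$CI_COMMIT_SHA"     # set by GitLab CI
  else
    git rev-parse HEAD        # any checkout with git available
  fi
}
```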
Option 2 — Cron polling. For services that deploy continuously or where you don't control the CI pipeline:
*/5 * * * * /usr/local/bin/openclaw run devops-watcher --trigger scheduled
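Every five minutes, the agent checks health and logs. It's less precise than a post-deploy trigger, but it catches issues on services you didn't touch, like a database dependency that started returning errors on its own. Keep in mind that cron runs jobs with a minimal environment, so TELEGRAM_TOKEN and CHAT_ID (or SLACK_WEBHOOK) have to be provided to the job explicitly, for example from an environment file populated by your secret manager.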
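One catch with cron polling: if a run takes longer than the interval, the next run starts on top of it and you can get duplicate alerts. flock(1) from util-linux is a common fix. A sketch with a hypothetical lock file under state/:

```bash
#!/bin/bash
# Hypothetical wrapper: run a command only if no previous run holds the lock.
# flock -n gives up immediately instead of queueing behind a slow run;
# by default it exits with status 1 when the lock is already taken.
# Usage: run_locked <lock_file> <command> [args...]
run_locked() {
  local lock_file="$1"
  shift
  flock -n "$lock_file" "$@"
}

# Equivalent crontab entry, no wrapper needed:
# */5 * * * * flock -n /opt/agents/devops-watcher/state/watcher.lock /usr/local/bin/openclaw run devops-watcher --trigger scheduled
```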
Common Mistakes
- Alerting on every log ERROR regardless of frequency. Keep a dedup record in state/last_deploy.txt: if the same error pattern appeared in the last run, skip it. Otherwise you'll get 200 Telegram messages in ten minutes and start ignoring all of them.
- Polling too aggressively. A 30-second health check interval on a slow endpoint leaves little headroom: checks overlap or time out during brief slowdowns and generate false positives. Start at 60 seconds and tune down once you know your baseline latency.
- No "all clear" message. If you only send alerts when things break, you won't know when they recover. The three-consecutive-healthy-checks rule in the AGENTS.md above handles this.
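Deduplication is harder than it sounds, because raw log lines rarely repeat exactly once timestamps are included. A sketch of one approach, assuming a hypothetical state/seen_errors.txt file; the normalization rules are a judgment call you'll want to tune for your log format:

```bash
#!/bin/bash
# Hypothetical dedup step: print only error lines whose normalized form did not
# appear in the previous run, then remember this run's set for next time.
# Normalization: strip everything before the last ERROR/FATAL/panic keyword on
# the line (usually a timestamp), then mask digits so "after 30s" == "after 31s".
# Usage: ./tools/tail_errors.sh | new_errors_only state/seen_errors.txt
new_errors_only() {
  local seen_file="$1"
  local current="${seen_file}.tmp"   # stays inside state/, the writable area
  sed -E 's/^.*(ERROR|FATAL|panic)/\1/; s/[0-9]+/N/g' | sort -u > "$current"
  touch "$seen_file"
  # comm -23 prints lines found only in the first sorted input: new this run
  comm -23 "$current" <(sort -u "$seen_file")
  mv "$current" "$seen_file"
}
```

It prints the normalized lines rather than the originals, so keep the raw tail_errors.sh output if you want timestamps in the alert body.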
Reading the Agent's Output
Once the agent runs, check state/last_deploy.txt:
sha=a3f1c92
timestamp=2026-06-01T03:12:44Z
health_status=200
latency_ms=312
error_lines=0
alert_sent=false
If alert_sent=true, the corresponding notification went to Telegram or Slack with a summary. Cross-reference the agent's tool-call log at .openclaw/logs/tool_calls.jsonl to verify what actually ran.
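If your own scripts need to read or update that file, keep the key=value format and replace the file in one step, so a run killed mid-write can't leave a truncated state file. A sketch; read_state and write_state are hypothetical names:

```bash
#!/bin/bash
# read_state <file> <key>: print the value for key (nothing if absent)
read_state() {
  local file="$1" key="$2"
  [ -f "$file" ] || return 0
  # Split on the first "=" only, so values may themselves contain "="
  awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$file"
}

# write_state <file> key=value...: replace the whole file with the given pairs
write_state() {
  local file="$1" tmp
  shift
  tmp="${file}.tmp.$$"
  printf '%s\n' "$@" > "$tmp"
  # A rename within one directory is atomic: readers see the old or new file, never half
  mv "$tmp" "$file"
}
```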
For a deeper look at file-based agent memory and how to keep state clean across runs, see File-Based Agent Memory Benchmark Wins (agents.md).
Extending the Agent
Once this baseline works, a few extensions are worth adding:
- Container health: swap check_health.sh for a Docker-aware version that calls docker inspect --format='{{.State.Health.Status}}' on the app container (this only works if the container defines a HEALTHCHECK).
- Database connectivity: add a tools/check_db.sh that runs a lightweight query and returns a row count or an error. Like any new tool, it also needs an entry in allowed_commands, or the sandbox will block it.
- Multi-service support: run the agent with a different --var SERVICE=payments flag per service and keep separate state files for each.
- Escalation: if three consecutive alert cycles pass without a healthy check, have send_alert.sh call a PagerDuty webhook instead of Telegram.
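Escalation, like the "deploy looks stable" message in AGENTS.md, depends on counting consecutive outcomes across runs, so the count has to live in state/. A minimal sketch, assuming a hypothetical state/streak.txt file. It only tracks the streak; wiring it to PagerDuty or anything else is left to you.

```bash
#!/bin/bash
# Hypothetical streak counter. The state file holds one line: "<kind> <count>",
# e.g. "alert 2". Same kind as last run extends the streak; a change resets it to 1.
# Usage: bump_streak <state_file> healthy|alert   (prints the new streak length)
bump_streak() {
  local file="$1" kind="$2" last_kind="" count=0
  if [ -f "$file" ]; then
    read -r last_kind count < "$file"
  fi
  if [ "$kind" = "$last_kind" ]; then
    count=$(( ${count:-0} + 1 ))
  else
    count=1
  fi
  printf '%s %d\n' "$kind" "$count" > "$file"
  echo "$count"
}

# Example wiring:
#   n=$(bump_streak state/streak.txt healthy)
#   [ "$n" -eq 3 ] && ./tools/send_alert.sh "deploy looks stable"
#   n=$(bump_streak state/streak.txt alert)
#   [ "$n" -eq 3 ] && echo "escalate: three alert cycles without a healthy check"
```

Comparing with -eq 3 rather than -ge 3 means each message fires once when the streak reaches three, not on every run after that.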
Avoid adding remediation commands (restart, rollback, scale) until you've run this in read-only mode for at least two weeks and trust the thresholds. An agent that restarts services based on a single 503 will cause more incidents than it prevents.
If you want to harden the overall security posture before expanding capabilities, the OpenClaw Security Checklist covers 15 concrete steps worth running through first.
This Is the Right Level of Automation
A deploy monitor agent doesn't need to be intelligent. It needs to be reliable, auditable, and quiet when things are fine. The config above does all three — you can read every decision path in the AGENTS.md, every tool call is logged, and it stays silent unless a threshold is crossed.
The goal isn't to replace your observability stack. It's to give you a second set of eyes specifically on the post-deploy window, without adding another SaaS subscription or wiring up another webhook pipeline by hand. Once this is running, you'll wonder why you were checking deployment logs manually at all.