Navigating AI Security Risks: A Practical Guide for Security Teams

OpenAgents.mom · 2026-07-01 · 9 min read

In early 2025, a European financial services firm lost access to a production AI pipeline for 72 hours after an attacker injected malicious instructions into a document the agent was summarizing. The agent dutifully followed those instructions — exfiltrating internal API keys to an external webhook before anyone noticed the traffic anomaly. No custom exploit. No zero-day. Just a poorly constrained agent doing exactly what it was told.

This is the shape of AI security risk in 2026. The attack surface isn't just your infrastructure — it's the reasoning layer sitting on top of it. If your security posture hasn't caught up to that reality, this guide is a starting point.

The Attack Surface Has Changed

Traditional security assumes a relatively static attack surface: servers, endpoints, APIs, credentials. AI agents add a dynamic reasoning layer that can interpret instructions, take autonomous actions, and chain tool calls across your infrastructure.

That reasoning layer cannot be treated as trusted. It accepts natural language as input, which means any text your agent processes (emails, documents, web pages, database rows) is potentially adversarial input. You can't firewall your way out of that without also constraining what the agent is allowed to do.

Prompt Injection: The Threat Your WAF Can't Stop

Prompt injection is the most widely observed AI-specific attack vector right now. An attacker embeds instructions inside data the agent will read — a support ticket, a PDF, a webpage — and the agent treats those instructions as legitimate commands.

Indirect prompt injection is particularly dangerous because the malicious content never touches your application's input layer directly. A document your agent retrieves from an external URL can override its system prompt constraints if you haven't built explicit defenses at the agent level. The memory safety work happening in multi-agent systems addresses part of this, but prompt-level defenses need to be separate and layered.

The mitigation isn't a single fix. It's a combination of output validation, tool-call allowlisting, and strict privilege separation between what the agent can read and what it can act on.
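As a rough illustration of tool-call allowlisting combined with read/act separation, here is a minimal Python sketch. The tool names, the `ToolCall` type, and the `context_tainted` flag are all hypothetical; the idea is only that unlisted tools are denied, and that acting tools stop being auto-approved once untrusted content has entered the agent's context.

```python
from dataclasses import dataclass

# Hypothetical tool names; a real deployment would load these from its own manifest.
READ_TOOLS = {"fetch_document", "search_tickets"}
ACT_TOOLS = {"send_email", "create_ticket"}


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: dict


def authorize(call: ToolCall, context_tainted: bool) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a proposed tool call."""
    if call.name in READ_TOOLS:
        return "allow"
    if call.name in ACT_TOOLS:
        # Once untrusted content has entered the context, acting tools
        # are no longer auto-approved.
        return "needs_approval" if context_tainted else "allow"
    # Anything not explicitly listed is rejected.
    return "deny"
```

In this sketch the caller is responsible for setting `context_tainted` whenever the agent has read external content, which is a deliberately conservative assumption.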

Credential Exposure in Agent Configs

AI agents need to call APIs. That means credentials — and credentials end up somewhere in the agent's configuration or runtime context. If that context leaks (through logs, tool call traces, or a compromised memory store), those credentials travel with it.

The pattern to avoid: embedding raw API keys in prompt context, AGENTS.md files, or any artifact the agent can read and repeat. Use environment variables injected at runtime, not at prompt-build time. For agents running in team environments, use short-lived tokens with the minimum scope required for the task.

This is covered in more depth in our post on keeping secrets out of agent credentials — the principles apply regardless of which runtime you're using.
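One way to keep raw keys out of prompt context is to let the agent see only a placeholder and resolve it at tool-call time. The sketch below assumes a made-up `env:NAME` placeholder convention and a hypothetical `resolve_secrets` helper; it is not tied to any particular agent runtime.

```python
import os

# Hypothetical placeholder convention: the agent only ever sees "env:NAME".
PREFIX = "env:"


def resolve_secrets(args: dict, environ=os.environ) -> dict:
    """Swap 'env:NAME' placeholders for real values just before a tool call."""
    resolved = {}
    for key, value in args.items():
        if isinstance(value, str) and value.startswith(PREFIX):
            name = value[len(PREFIX):]
            if name not in environ:
                raise KeyError(f"secret {name!r} is not set in the environment")
            resolved[key] = environ[name]
        else:
            resolved[key] = value
    return resolved
```

The resolved dictionary should go straight to the tool and never back into the model's context or logs; otherwise the placeholder buys nothing.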

Common Mistakes

Hardcoding credentials in SOUL.md or AGENTS.md. These files are often committed to version control or readable by the agent itself. Use environment references instead.
Granting agents write access to production systems during development. Start read-only, add write permissions only after the agent's behavior is validated in staging.
Assuming model-level safety equals application-level safety. A model that refuses harmful requests in chat can still be manipulated in an agentic context where instructions arrive through tool outputs.
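The first mistake in the list above is cheap to catch automatically. The following sketch scans the text of a config file such as AGENTS.md for strings that look like raw credentials. The two patterns are illustrative assumptions, not a complete ruleset, and a dedicated secret scanner will catch far more.

```python
import re

# Illustrative patterns only; dedicated secret scanners cover far more formats.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(
        r"(?i)\b(api[_-]?key|token|secret|password)\b\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{16,}"
    ),
]


def find_hardcoded_secrets(text: str) -> list[int]:
    """Return 1-based line numbers that look like they contain a raw credential."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append(number)
    return hits
```

A check like this could run as a pre-commit hook or CI step, failing the build when the returned list is non-empty.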

Privilege Escalation Through Tool Chaining

Agents that can call multiple tools in sequence can accidentally — or intentionally — escalate their own privileges. A read-only file tool plus a write-to-S3 tool plus a send-email tool is a data exfiltration pipeline waiting to be triggered.

Principle of least privilege applies here with more force than in traditional systems, because agents make decisions autonomously. You're not reviewing each tool call before it executes. Design your tool manifests so that no single agent session can complete a sensitive multi-step action (read credentials → write to external storage → notify external endpoint) without hitting an explicit human-approval gate.

Some frameworks handle this natively. LangGraph lets you define interrupt points between graph nodes. AutoGen supports human-in-the-loop confirmation steps. If your current setup doesn't have these, that's a gap worth closing before you expand agent scope.
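For setups without a native interrupt mechanism, the approval gate can be approximated in plain Python. This sketch uses hypothetical capability labels (`read_secret`, `write_external`, `notify_external`) and takes a conservative reading of the rule: once a session has read sensitive material, every outbound step needs explicit approval.

```python
# Hypothetical capability labels that a tool manifest might assign to tools.
SENSITIVE_READS = {"read_secret"}
EXTERNAL_ACTIONS = {"write_external", "notify_external"}


class SessionGate:
    """Tracks whether an agent session has touched sensitive data."""

    def __init__(self):
        self.read_sensitive = False

    def check(self, capability: str, approved: bool = False) -> bool:
        """Return True if the call may run, False if it needs human approval."""
        if capability in SENSITIVE_READS:
            self.read_sensitive = True
            return True
        if capability in EXTERNAL_ACTIONS and self.read_sensitive:
            # Outbound actions after a sensitive read require explicit approval.
            return approved
        return True
```

Gating every outbound step, rather than only the last one in a chain, is a design choice made here because the data has already left by the time a notification fires.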

Data Leakage Through Model Context

When you pass sensitive documents to an agent for analysis, those documents enter the model's context window. Depending on your setup, that context may be logged, cached, or stored in a vector database for retrieval.

Ask yourself: where does the context go after the session ends? If your agent runtime logs full context windows to a shared observability platform, any PII, internal financials, or confidential IP in those logs is now exposed to whoever has access to that platform. Map your data flows before you put sensitive content into agent workflows — not after.

For regulated industries, this matters even more. Our post on governance gaps in AI covers how organizations are starting to formalize these data-flow requirements into policy.
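If full context windows must be logged, one partial mitigation is to redact obvious identifiers before the text leaves the agent runtime. The sketch below is an assumption-heavy example: the two regular expressions (email addresses and card-like digit runs) are illustrative only and will miss most real-world PII.

```python
import re

# Illustrative patterns; real PII detection needs more than two regexes.
REDACTIONS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "[CARD]"),
]


def redact(text: str) -> str:
    """Replace matches of each pattern with a fixed label before logging."""
    for pattern, label in REDACTIONS:
        text = pattern.sub(label, text)
    return text
```

Redaction reduces exposure but does not replace mapping the data flow; access controls and retention limits on the log store still matter.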

Supply Chain Risk in the MCP Ecosystem

The MCP (Model Context Protocol) server ecosystem has grown fast. As of mid-2026, there are hundreds of community-published MCP servers covering everything from Notion integrations to database connectors to web scrapers.

Each one of those servers is a dependency you're trusting with your agent's tool access. A compromised or malicious MCP server can return injected instructions as tool output, exfiltrate data passed to it, or silently modify results before returning them to the agent.

Vet MCP servers the same way you'd vet any open-source dependency: check the source, review the code, pin to a specific commit hash, and run in a sandboxed environment. Don't pull from unverified registries in production. Our post on best practices for securing AI agents offers a checklist approach that works across runtimes.
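Commit pinning is easy to verify mechanically. The sketch below assumes a hypothetical config shape in which each server entry carries a `ref` field; actual MCP client configs differ, so treat the field name as a placeholder. It flags any server whose ref is not a full 40-character commit hash.

```python
import re

FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned_servers(servers: dict) -> list[str]:
    """Return names of servers whose 'ref' is not a full 40-character commit hash."""
    return sorted(
        name
        for name, entry in servers.items()
        if not FULL_SHA.fullmatch(str(entry.get("ref", "")))
    )
```

Branch names, tags, and missing refs all count as unpinned here, since each of them can move without the config changing.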

Security Guardrails

Pin MCP server versions in your config. Floating dependencies can silently update and change behavior.
Run tool servers in isolated processes with no ambient credentials. They should receive only what they need for each specific call.
Log all tool call inputs and outputs. You need this for incident reconstruction — and you'll need it sooner than you expect.
Require human approval before any agent action that crosses a trust boundary (writes to external systems, sends communications, modifies production data).
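The logging guardrail above can be sketched as a thin wrapper around each tool. The `audited` helper and the in-memory `sink` list are hypothetical stand-ins; a real deployment would write to an append-only store with restricted access.

```python
import json
import time
from typing import Any, Callable


def audited(tool_name: str, tool: Callable[..., Any], sink: list) -> Callable[..., Any]:
    """Wrap a tool so every call appends one JSON record to `sink`."""

    def wrapper(**kwargs):
        record = {"tool": tool_name, "ts": time.time(), "input": kwargs}
        try:
            result = tool(**kwargs)
            record["output"] = result
            return result
        except Exception as exc:
            # Failed calls are recorded too, then re-raised unchanged.
            record["error"] = repr(exc)
            raise
        finally:
            sink.append(json.dumps(record, default=str))

    return wrapper
```

Because these records contain full inputs and outputs, the log store itself becomes sensitive and should be covered by the same data-flow review as the agent.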

Agentic Persistence and Scheduled Tasks

Agents that run on schedules or respond to triggers can persist long after the person who deployed them has changed roles — or left the organization. An agent running with a former employee's credentials and broad tool access is a security liability nobody is thinking about.

Build agent lifecycle management into your ops process. Document which agents are running, what credentials they use, what they're authorized to do, and who owns them. Treat decommissioning an agent with the same rigor as decommissioning a service account. Automated agents should be tied to service identities, not individual user accounts.
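An agent inventory only helps if something checks it. As a minimal sketch, assuming a hypothetical `AgentRecord` shape and a set of active staff names pulled from an HR or identity system, the following flags agents whose owner has left and agents running under a personal account.

```python
from dataclasses import dataclass


@dataclass
class AgentRecord:
    name: str
    owner: str
    identity: str
    identity_type: str  # "service" or "user"


def lifecycle_findings(agents: list, active_staff: set) -> list:
    """Return (agent name, issue) pairs for agents that need attention."""
    findings = []
    for agent in agents:
        if agent.owner not in active_staff:
            findings.append((agent.name, "owner is no longer active"))
        if agent.identity_type != "service":
            findings.append((agent.name, "runs under a personal user account"))
    return findings
```

Running a check like this on a schedule, and on every offboarding event, is one way to keep orphaned agents from accumulating.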

Evaluating AI Security Posture for Third-Party Integrations

If you're evaluating an AI vendor or platform — whether it's an orchestration layer, a model provider, or a SaaS agent product — ask specific questions rather than accepting generic SOC 2 compliance as the answer.

Some useful questions: Does model inference happen on shared or dedicated infrastructure? Can you opt out of training on your inputs? Where are conversation logs stored and for how long? What happens to your data if you terminate the contract? Does the system support audit logging at the tool-call level?

These questions matter more for AI systems than for traditional SaaS because the data exposure surface is larger: models process unstructured content at scale, which increases the risk of inadvertent data retention or cross-tenant leakage.

Building Toward Continuous AI Security Monitoring

Static security reviews aren't enough for systems that reason dynamically. An agent that passes a security review today can behave differently tomorrow if the underlying model is updated, a tool server changes its behavior, or new data flows into its context.

Continuous monitoring for AI systems should include: behavioral drift detection (is the agent doing things it didn't use to do?), anomalous tool-call patterns (sudden spikes in outbound API calls, writes to unexpected destinations), and regular red-team exercises that include prompt injection attempts against your production agent configs.

Some teams are starting to use agents specifically for this purpose: a watchdog agent monitors the logs of production agents and flags anomalies. The pattern is covered in more depth in our post on building a DevOps watchdog agent, and the same architecture applies to security monitoring use cases.
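Two of the signals mentioned above, call-volume spikes and writes to unexpected destinations, can be sketched with the standard library alone. The threshold of three standard deviations and the function names are assumptions chosen for illustration; production detection would need per-agent baselines and seasonality handling.

```python
from statistics import mean, pstdev


def spike(history: list, current: int, threshold: float = 3.0) -> bool:
    """Flag `current` if it exceeds the historical mean by `threshold` standard deviations."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        # A perfectly flat history: any increase counts as a spike.
        return current > mu
    return (current - mu) / sigma > threshold


def new_destinations(baseline: set, observed: list) -> set:
    """Return destinations seen in this window that are absent from the baseline."""
    return set(observed) - baseline
```

Either signal alone produces false positives; in practice they are more useful as inputs to a review queue than as automatic blockers.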

Building AI Security Into the Development Process

Security reviews that happen after deployment don't work well for AI systems. By the time an agent is in production, its tool access, prompt structure, and data flows are already baked in — retrofitting constraints is harder and more error-prone than building them in from the start.

The shift required is treating agent configuration as code and including it in your existing security review pipeline. SOUL.md and AGENTS.md files, tool manifests, MCP server configs — these should go through the same review gates as infrastructure-as-code. Reviewers should be looking for overly broad permissions, credential exposure, missing approval gates, and data flows that cross trust boundaries without logging.
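A review gate for agent configs can start as a small lint step in CI. The sketch below assumes a hypothetical tool-manifest schema with `scopes`, `crosses_trust_boundary`, `requires_approval`, and `logged` fields; none of these names come from a real framework. It checks for three of the issues reviewers are asked to look for.

```python
def review_manifest(tools: list) -> list:
    """Return human-readable findings for a list of tool manifest entries."""
    findings = []
    for tool in tools:
        name = tool.get("name", "<unnamed>")
        if "*" in tool.get("scopes", []):
            findings.append(f"{name}: wildcard scope")
        crosses = tool.get("crosses_trust_boundary", False)
        if crosses and not tool.get("requires_approval", False):
            findings.append(f"{name}: crosses a trust boundary without an approval gate")
        if crosses and not tool.get("logged", False):
            findings.append(f"{name}: crosses a trust boundary without logging")
    return findings
```

A non-empty result would fail the pipeline in the same way a failed infrastructure-as-code policy check does.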

AI security isn't a separate discipline from the security work you already do. It's the same principles applied to a new attack surface — and the sooner you integrate it into your existing processes, the less expensive it is to get right.

The threat landscape here is moving quickly, and the teams that will handle it best aren't the ones waiting for a comprehensive framework to appear. They're the ones building practical defenses now, iterating as new attack patterns emerge, and treating their agent configs with the same rigor they apply to the rest of their infrastructure.

Enforce Your AI Security Policy at the Agent Config Level

Your security requirements belong in your agent's configuration — not just in a policy document. Use our wizard to generate a structured agent bundle with explicit permission boundaries, credential handling rules, and approval gates built in from day one.
