AI Agent Security Controls — Enterprise Reference 2026

AI Agent Security Controls

Enterprise Reference · 2026 Edition

CRITICAL 12 Security Controls NIST AI RMF · OWASP LLM Top 10 Zero Trust Agents

AI Agent
Security
Controls

AI agents are no longer experimental. They are production infrastructure with production-grade attack surfaces. A single compromised agent can exfiltrate terabytes of data, move laterally across systems, and trigger cascading business failures — before traditional security tools detect a breach. These 12 controls are the operational framework that separates trustworthy agent deployments from catastrophic ones.

88%

of organisations confirmed or suspected AI agent security incidents in 2025 — Gravitee State of AI Agent Security 2026

14.4%

of AI agents go live with full security approval — Gravitee 2026. 81% are past planning but only 14% have governance.

48%

of cybersecurity professionals identify agentic AI as the single most dangerous attack vector — Dark Reading poll 2026

100:1

ratio of non-human identities to humans in some enterprises — AI agents outnumber the people governing them

01 Zero-Trust Identity

02 JIT Tool Access

03 Injection Defense

04 Output Protection

05 Risk Action Control

06 Human-in-the-Loop

07 Sandbox Execution

08 Secure Memory

09 Cross-Agent Isolation

10 Behavioral Monitoring

11 Continuous Red Team

12 Supply Chain

The Security Imperative

Securing AI Agents Is the Defining Cybersecurity Challenge of 2026

Bessemer Venture Partners named it plainly: securing AI agents is the defining cybersecurity challenge of 2026. Gartner projects that 40% of enterprise applications will embed task-specific AI agents this year — up from less than 5% in 2025. But as AI extends into autonomous workflows, cyberthreats are proliferating faster than the defences designed to contain them. In a controlled red-team exercise, McKinsey’s internal AI platform was compromised by an autonomous agent that gained broad system access in under two hours.

The security challenge is architectural, not incremental. AI agents are not another application surface — they are autonomous, high-privilege actors that can reason, act, and chain workflows across systems. Most agents today inherit broad permissions from the systems they connect to, with no zero-trust boundaries governing what they can actually reach. The first documented AI-orchestrated cyber-espionage campaign, disclosed in late 2025, showed a jailbroken agent handling 80–90% of a complex attack chain autonomously — reconnaissance, exploitation, credential theft, and data exfiltration — with humans only guiding critical decisions.

NIST’s February 2026 concept paper on AI agent identity and authorisation confirmed the regulatory direction: every AI agent needs a managed identity with scoped authentication, per-task authorisation, tamper-proof audit logging, and prompt injection controls. These are not emerging best practices — they are the minimum viable security baseline for any enterprise running agents in production. The 12 controls documented here implement this baseline end-to-end.

// Top Threat Vectors

Supply Chain Risks (LLM plugins, MCP integrations) — fake packages that silently copy outbound data to attacker servers

Prompt Injection — indirect attacks through external content succeeding with fewer attempts and broader impact than direct attacks

Excessive Privilege — agents inheriting broad permissions from connected systems; no zero-trust boundaries on what they reach

Cross-Agent Lateral Movement — impersonation and session smuggling between agents that implicitly trust each other

Data Exfiltration — agents processing PII, credentials, and confidential data without output controls or destination allow-lists

Identity Spoofing — shared API keys rather than workload identities; 78% of agents still use shared credentials

12 Security Controls — Complete Operational Reference

Identity Layer

AI Agent Identity & Zero-Trust Authentication

Every AI agent must have a managed, cryptographically unique identity — not a shared API key, not inherited human credentials. Non-human identities now outnumber human identities by up to 100:1 in enterprise environments, yet 78% of teams rely on shared credentials for agent authentication. CyberArk’s 2026 analysis is precise: “Every AI agent is an identity. It needs credentials to access databases, cloud services, and code repositories. The more tasks we give them, the more entitlements they accumulate, making them a prime target.” SPIFFE workload identity — which issues short-lived, cryptographically attested certificates to each agent workload — replaces the static API key model. Each agent authenticates as if it were a new actor on every request. Least-privilege role design scopes each agent to exactly the permissions its specific task requires — nothing more. Full audit logging creates the immutable, tamper-evident record that NIST’s AI RMF requires for accountability.

◆

Unique cryptographic identity per agent workload — SPIFFE SVIDs, not shared API keys or service account credentials

◆

Least-privilege role design — scope agent permissions to the minimum required for each specific task, nothing inherited from human accounts

◆

Full audit logging — every agent action attributed to its identity with reasoning trace, approved by whom, and on whose behalf

JIT

Access Control

Just-In-Time Tool Access

Standing tool access is one of the most exploited attack vectors in agentic AI environments. Agents accumulate access over time, and the risk surface grows with every new integration. Bessemer Venture Partners’ 2026 CISO analysis recommends a “gradual, well-defined plan of the available inputs and outputs of each agent… very narrowly scoped, then incrementally expanded.” JIT access grants tool permissions only at the moment they are required for a specific task, then revokes them automatically when that task completes. Per-task authorisation enforces that each individual agent action is approved for exactly the context in which it is taken. Time-bound credentials set absolute expiry regardless of whether the action completed — preventing credential theft from enabling long-term access. Auto-expiring tokens close the standing access window that attackers exploit between agent invocations. This is the primary blast radius reduction control: if an agent is compromised, the attacker’s window of access is measured in minutes, not months.

◆

Per-task authorisation — each agent action requires explicit approval for the specific context; no blanket capability grants

◆

Time-bound credentials — tool access tokens expire at task completion or at a hard time limit, whichever comes first

◆

Auto-expiring tokens — no standing access between invocations; credentials must be re-requested and re-approved for each session

INJ

Input Security

Prompt Injection Defense

Prompt injection is the attack class NIST and OWASP have placed at the top of every AI security framework. Every external content source is a potential injection vector — emails, documents, web pages, database records, and API responses can all carry adversarial instructions that hijack the agent’s reasoning. Lakera AI’s Q4 2025 analysis revealed that indirect prompt injection attacks — arriving through trusted-looking external content rather than direct user input — succeed with fewer attempts and broader impact than direct attacks. In one documented case, a malicious GitHub issue contained hidden instructions that hijacked an agent and triggered data exfiltration from private repositories. Runtime threat detection intercepts injection attempts in the agent’s input stream before they reach the model. Context boundary validation enforces that instructions from external sources cannot override the system prompt’s operational constraints. Policy-based filtering applies organisation-specific rules that block known injection patterns and flag anomalous instruction sequences for human review.

◆

Runtime threat detection — intercept adversarial instruction patterns in all input streams before they reach the model’s reasoning layer

◆

Context boundary validation — external content cannot override system-level operational constraints or escalate agent permissions

◆

Policy-based filtering — organisation-specific block lists and anomaly detection on instruction sequences from untrusted sources

DLP

Data Security

Output & Data Protection

AI agents process Personally Identifiable Information (PII), authentication credentials, financial records, and confidential business data continuously — and they can inadvertently surface that data in responses, log files, downstream API calls, or external system writes. Without output controls, agents become involuntary data exfiltration channels. The State of Agentic AI Security Report found that security leaders want “default PII redaction” as a non-negotiable baseline — not something configured after deployment. PII and secret scanning intercepts sensitive patterns (social security numbers, credit card numbers, API keys, passwords, internal identifiers) in the agent’s output stream before delivery. Response redaction applies to logs, API responses, and any downstream data writes — ensuring that sensitive data appearing in the agent’s reasoning context does not propagate to systems that have no need for it. Destination allow-lists enforce that the agent can only write to, or share data with, pre-approved endpoints — preventing exfiltration to attacker-controlled destinations.

◆

PII / secret scanning — intercept sensitive data patterns in output streams before they reach logs, APIs, or external systems

◆

Response redaction — apply sensitive data masking to all agent outputs, not just user-facing responses

◆

Destination allow-lists — agents can only write to or transmit data to explicitly pre-approved endpoint destinations

RISK

Decision Control

Risk-Based Action Control

Not all agent actions carry equal risk. An agent retrieving a product description is categorically different from one executing a payment or modifying a database record — yet without a risk-based control framework, both actions receive the same authorisation treatment. Risk-based action control assigns a risk score to every proposed agent action before execution, based on financial impact, data sensitivity classification, and the reversibility of the action. Financial impact scoring tags actions by their potential monetary consequence — a $50 API lookup versus a $50,000 wire transfer require different approval thresholds. Data sensitivity tagging classifies the data the action will touch against the organisation’s data classification schema, triggering elevated scrutiny for regulated or confidential data access. Adaptive approval thresholds dynamically adjust authorisation requirements based on the combined risk score — low-risk actions proceed autonomously while high-risk actions are automatically escalated to human review. This creates “smart automation” that scales autonomy to demonstrated trustworthiness without blanket restrictions.

◆

Financial impact scoring — assign monetary risk values to agent actions; auto-escalate above configurable financial thresholds

◆

Data sensitivity tagging — classify data touched by each action; regulated and confidential data access triggers additional authorisation

◆

Adaptive approval thresholds — dynamically adjust authorisation based on combined risk score; autonomous below threshold, human above it

HITL

Oversight Layer

Human-in-the-Loop Controls

Autonomy without oversight is not efficiency — it is liability. The State of Agentic AI Security Report found that 35% of executives admit they could not immediately “pull the plug” on a rogue agent if required. Human-in-the-loop controls define the mandatory checkpoints where human judgement must precede agent action — not as a fallback when things go wrong, but as a designed architectural requirement for actions that carry meaningful consequences. Deloitte’s 2026 enterprise AI governance analysis confirms that “autonomous systems heighten needs for organisations to define where humans should remain in control.” Payment approvals require a named human approver before any financial transaction above the defined threshold. Destructive action review applies to irreversible operations — data deletion, configuration changes, communications sent — where mistakes cannot be undone automatically. Policy override validation requires human authorisation whenever an agent’s proposed action would require deviating from established operational constraints. The outcome is controlled autonomy: agents act confidently within defined bounds and escalate cleanly when those bounds are reached.

◆

Payment approvals — named human authorisation required before any financial transaction above the configured risk threshold

◆

Destructive action review — mandatory human sign-off before irreversible operations: data deletion, infrastructure changes, communications

◆

Policy override validation — human authorisation required whenever the agent’s proposed action would deviate from established operating constraints

SBOX

Execution Security

Sandbox Execution

Agent code execution is one of the most dangerous capabilities in the agentic AI toolkit. When an agent can write and run arbitrary code, a compromised agent can escape virtually every other security control — installing malware, exfiltrating data through side channels, or modifying system configurations in ways that persist after the agent session ends. Sandbox execution contains the blast radius of code execution to an isolated, disposable environment that cannot affect the host system. E2B and similar sandbox platforms provide this isolation as a primitive: each code execution gets a fresh, ephemeral container with no access to the broader network or file system by default. Isolated runtime environments prevent agent code from touching production systems. Restricted system access limits what file paths, environment variables, and system calls are accessible inside the sandbox. Network segmentation controls which external endpoints the sandbox runtime can reach, preventing both outbound data exfiltration and inbound command-and-control connections from a compromised execution environment.

◆

Isolated runtime environment — each code execution in a fresh, disposable container with no access to host systems or shared state

◆

Restricted system access — limit file paths, environment variables, and system calls available inside sandbox; no production data access

◆

Network segmentation — allow-list only the external endpoints the sandbox runtime needs; block all outbound communications by default

MEM

Memory Security

Secure Memory Management

Agent memory systems present an underestimated attack surface. Long-term memory stores — the vector databases that persist agent experiences and learned context across sessions — can contain PII, authentication tokens, business secrets, and sensitive user data accumulated over thousands of interactions. Memory poisoning, where adversarial data is injected into the memory store to corrupt future agent behaviour, is a documented attack class. Session-scoped recall limits memory access during each agent session to only the context that is relevant and authorised for that session’s task — preventing cross-session data bleed. Encrypted vector storage ensures that memory contents cannot be read by anyone with raw database access — the encryption key is managed separately from the storage layer. Sensitive data expiry enforces automatic deletion policies on memory records containing regulated data — PII, financial information, health data — at configurable retention windows that align with regulatory requirements (GDPR Article 17, HIPAA minimum necessary standards).

◆

Session-scoped recall — limit memory retrieval to context relevant and authorised for the current session’s task; no cross-session data bleed

◆

Encrypted vector storage — memory contents encrypted at rest with keys managed separately from the storage layer; no plaintext data at risk

◆

Sensitive data expiry — automatic deletion policies on memory records containing PII, financial data, or regulated health information

ISOL

Multi-Agent Security

Cross-Agent Isolation

Multi-agent systems introduce a trust model that attackers actively exploit. In a naive multi-agent deployment, a compromised “manager” agent can issue unauthorised commands to “worker” agents that trust it implicitly — enabling lateral movement across the entire agent fleet without any single external attack. Help Net Security’s 2026 enterprise analysis documented agent-to-agent impersonation, session smuggling, and unauthorised capability escalation as active attack patterns in Q4 2025. Signed communication requires every inter-agent message to carry a cryptographic signature that proves the sending agent’s identity and the message’s integrity — unsigned messages are rejected. Permissioned messaging enforces that each agent can only receive instructions from agents it is explicitly authorised to receive instructions from — the trust relationship must be pre-declared, not assumed. Memory separation prevents agents from reading each other’s memory stores — even agents within the same orchestration system cannot access another agent’s session context or long-term episodic memory without explicit cross-agent permission grants.

◆

Signed communication — all inter-agent messages must carry cryptographic proof of sender identity; unsigned messages rejected at receipt

◆

Permissioned messaging — agents can only receive instructions from explicitly pre-authorised senders; implicit trust between agents is prohibited

◆

Memory separation — agents cannot access each other’s session context or episodic memory without explicit cross-agent permission grants

BEH

Runtime Security

Behavioral Monitoring

Static security controls define what agents are permitted to do. Behavioral monitoring catches what they are actually doing — and flags the divergence before it becomes a breach. Real-time behavioural analytics must track agent actions continuously, not in periodic batch scans, because the window between agent compromise and data exfiltration can be measured in minutes. The State of Agentic AI Security Report found that 79% of enterprises have security blind spots where agents invoke tools, touch data, or trigger actions the security team cannot fully observe. Loop detection identifies when an agent has entered a recursive execution pattern — a classic sign of prompt injection or runaway agentic behaviour that will exhaust resources or cause unintended repeated actions. Privilege escalation alerts trigger when an agent attempts to access resources or invoke capabilities beyond its declared scope — either through legitimate-looking API calls or through chained tool invocations that collectively exceed the permitted blast radius. Tool misuse tracking identifies anomalous patterns in tool invocation: unusual call sequences, unexpected data volumes, calls at atypical hours, or tool combinations that suggest external manipulation.

◆

Loop detection — real-time identification of recursive execution patterns that signal prompt injection, runaway behaviour, or resource exhaustion

◆

Privilege escalation alerts — flag any attempt to access resources or capabilities beyond the agent’s declared permission scope

◆

Tool misuse tracking — anomaly detection on tool invocation patterns; flag unusual sequences, volumes, or timing that suggest external manipulation

RED

Adversarial Testing

Continuous Red Teaming

Static security configuration becomes stale the moment it is deployed. The AI threat landscape is evolving faster than annual penetration testing cycles can track — new prompt injection techniques, jailbreak methods, and tool abuse patterns emerge continuously. Anthropic’s documentation on GTG-1002 (the first AI-orchestrated cyber-espionage campaign, November 2025) demonstrated that attackers are applying AI to red teaming at the same speed defenders are applying it to protection. Continuous red teaming runs automated adversarial testing against production agent deployments on a continuous schedule — not just before launch. Prompt attack simulations test whether the current guardrails can withstand the latest documented injection and jailbreak techniques, including indirect attacks through retrieved content. Tool abuse testing attempts to exploit the agent’s tool access to achieve unauthorised outcomes — calling tools in unexpected sequences, passing malformed inputs, or chaining calls that individually appear benign but collectively exceed permitted scope. Safety regression checks verify that every update to the agent’s prompt, tools, or model does not inadvertently remove a previously effective safety control.

◆

Prompt attack simulations — automated adversarial testing against latest documented injection and jailbreak techniques on a continuous schedule

◆

Tool abuse testing — attempt unauthorised outcomes through unexpected tool call sequences and malformed inputs; verify guardrails hold

◆

Safety regression checks — verify every model or prompt update does not remove a previously effective safety control before promotion

CHAIN

Supply Chain

Supply Chain Security

Supply chain attacks targeting AI systems are ranked as the #1 threat category in the State of Agentic AI Security 2026 report. Help Net Security documented a fake npm package that mimicked a legitimate email integration and silently copied outbound messages to an attacker-controlled address — a supply chain attack that required no exploitation of the target agent’s security controls. Model Context Protocol (MCP) vulnerabilities have become a particular concern: Researchers identified tool poisoning, remote code execution flaws, overprivileged access, and supply chain tampering within MCP ecosystems. A GitHub MCP server was exploited via a malicious issue that injected hidden instructions, triggering data exfiltration from private repositories. Signed models and artifacts require cryptographic signatures on every model checkpoint, plugin, and dependency — unsigned artifacts are rejected at the registry level. Dependency integrity checks verify the cryptographic hash of every third-party component against a known-good manifest before it is loaded into the agent runtime. AI SBOM (Software Bill of Materials) tracking maintains a comprehensive, auditable inventory of every model, dataset, plugin, and dependency that makes up the agent’s operational stack — the equivalent of a software SBOM but for AI-specific components.

◆

Signed models & artifacts — cryptographic signatures required on all model checkpoints, plugins, and dependencies; unsigned artifacts rejected

◆

Dependency integrity checks — verify cryptographic hash of every third-party component against known-good manifest before loading

◆

AI SBOM tracking — comprehensive, auditable inventory of every model, dataset, plugin, and dependency in the agent’s operational stack

“Give agents an identity, scope their access, and audit what they do the same way you would any other actor in your environment. A CISO’s first move should be ensuring every agent has a managed identity with scoped authentication — not a shared API key with ‘god-mode’ access. If you can’t answer ‘What can this agent do?’, ‘On whose behalf?’, and ‘Who approved it?’ the same way you can for a human employee, you’re not ready for the autonomy these systems are about to have.”

Mike Gozzo — CISO Practitioner, via Bessemer Venture Partners: Securing AI Agents — The Defining Cybersecurity Challenge of 2026

All 12 Controls — Quick Reference

#	Control	Layer	Primary Threat Addressed	Outcome	Key Standards
01	AI Agent Identity	Identity	Shared credentials, privilege inheritance, unattributable agent actions	Zero-Trust	NIST AI RMF · SPIFFE
02	JIT Tool Access	Access	Standing access accumulation, long-lived credential theft, over-privileged tool grants	Blast Radius↓	OWASP LLM09 · PAM
03	Prompt Injection Defense	Input	Indirect injection via external content, system prompt extraction, goal hijacking	Safe Reasoning	OWASP LLM01 · NIST
04	Output & Data Protection	Output	PII leakage, credential exposure in logs, data exfiltration via agent responses	No Leaks	GDPR · HIPAA · OWASP
05	Risk-Based Action Control	Decision	High-impact actions executed without proportional oversight; flat authorisation model	Smart Auto	ISO 42001 · EU AI Act
06	Human-in-the-Loop	Oversight	Autonomous execution of irreversible actions; inability to halt rogue agents	Ctrl Autonomy	Deloitte AI Gov · EU AI Act
07	Sandbox Execution	Execution	Code execution escaping to host system; malware installation; persistent modification	Containment	E2B · CWE sandbox
08	Secure Memory Management	Memory	Memory poisoning, cross-session data bleed, PII accumulation in vector stores	Ltd Exposure	GDPR Art.17 · Mem0
09	Cross-Agent Isolation	Multi-Agent	Lateral movement via agent impersonation, session smuggling, implicit inter-agent trust	No Lateral	MCP · A2A · OWASP
10	Behavioral Monitoring	Runtime	Blind-spot exploitation, undetected privilege escalation, tool misuse between checks	Early Detect	SIEM/SOAR · Akto
11	Continuous Red Teaming	Adversarial	Stale defences; new attack patterns not covered by point-in-time testing	Evolving Sec	Anthropic GTG-1002 · Lakera
12	Supply Chain Security	Chain	Malicious plugins, poisoned models, fake packages, MCP server tampering	Trusted Stack	SBOM · SLSA · Sigstore

Security Posture Assessment

Build the Controls in Order. No Control Is Optional.

The 12 controls documented here are not a menu from which security teams can select their preferences — they are a complete defence-in-depth architecture where each layer assumes the others exist. Identity without audit logging is unattributable. JIT access without behavioral monitoring creates a window of undetected abuse. Sandbox execution without supply chain integrity gives attackers a clean starting point inside the sandbox. Every control depends on the structural integrity of the controls surrounding it.

The sequence matters for implementation. Start with identity — every other control builds on the ability to attribute agent actions to a specific, manageable identity. Add JIT access and behavioral monitoring before expanding tool access. Implement human-in-the-loop controls before deploying agents in financial or compliance-sensitive workflows. Add continuous red teaming the moment agents are in production — not as a pre-launch checkbox, but as an ongoing operational capability that evolves as threats evolve.

Bessemer Venture Partners is direct about the stakes: the CISOs who close the AI agent security gap deliberately, starting now, will define what enterprise AI looks like for the rest of the decade. 88% of organisations confirmed or suspected AI security incidents in 2025. The 12% that did not are the ones who built these controls first.

// ENTERPRISE AI AGENT SECURITY CHECKLIST — 2026

✓ identity Every agent has SPIFFE workload identity · Least-privilege roles · Full audit trail

✓ jit_access Per-task token grants · Auto-expiry on completion · No standing permissions

⚠ injection Runtime detection deployed · Context boundaries enforced · indirect attacks: review

✓ output_dlp PII scanning active · Response redaction · Destination allow-lists enforced

⚠ risk_ctrl Financial thresholds set · Sensitivity tagging: partial · Adaptive thresholds: pending

✓ hitl Payment approvals: human gate · Destructive actions: review required · Override validation: on

✗ sandbox Code execution isolation NOT configured · CRITICAL — address before next deployment

✓ memory Session-scoped recall · Encrypted storage · Retention policies: GDPR-aligned

⚠ isolation Signed communication: partial · Memory separation: pending full implementation

✓ monitoring Loop detection: active · Escalation alerts: SIEM integrated · Tool misuse tracking: on

⚠ red_team Point-in-time testing done · Continuous automated: not yet deployed

✗ supply_chain AI SBOM: not generated · Artifact signing: not enforced · CRITICAL

Sources: Bessemer Venture Partners — Securing AI Agents: The Defining Cybersecurity Challenge of 2026 · Gravitee — State of AI Agent Security 2026 Report (88% confirmed AI security incidents; 14.4% with full security approval) · Help Net Security — Enterprises Are Racing to Secure Agentic AI Deployments (February 2026) · Akto — State of Agentic AI Security 2025 (79% have security blind spots) · eSecurity Planet — AI Agent Attacks in Q4 2025 Signal New Risks for 2026 · NIST NCCoE — Concept Paper: Accelerating Adoption of Software and AI Agent Identity and Authorization (February 2026) · Obsidian Security — Security for AI Agents: Protecting Intelligent Systems · Microsoft Security Blog — Four Priorities for AI-Powered Identity and Network Access Security in 2026 (January 2026) · Hogan Lovells — Shaping the Future of AI Security: NIST Seeking Input on Agent Identity and Authorization · OWASP LLM Top 10 2025 · OWASP Agentic AI Security Top 10 (late 2025) · CyberArk — Every AI Agent Is an Identity (non-human identity analysis) · Dark Reading — 48% of cybersecurity professionals identify agentic AI as the single most dangerous attack vector · Cisco — State of AI Security 2025 Report (only 34% of enterprises have AI-specific security controls) · Gartner — 40% of enterprise applications will embed task-specific AI agents by 2026 · Deloitte — State of AI in the Enterprise 2026 (only one in five has mature governance for autonomous AI agents) · Seceon — Zero Trust AI Security: The Comprehensive Guide 2026 · Anthropic — GTG-1002 Technical Report: First AI-Orchestrated Cyber-Espionage Campaign (November 2025) · Lakera AI — Q4 2025 Prompt Injection Attack Analysis · ISO/IEC 42001:2023 AI Management System · NIST AI Risk Management Framework 1.0