15 AI Agent Patterns: The Complete Architecture Guide


From a single agent reasoning before each tool call to hierarchical orchestrators managing fleets of specialists — these are the 15 architectural patterns that define how production AI agents are built in 2026. Three tiers. Every pattern you need. No pattern you don’t.

April 2026 · AI Architecture · 25 min read
Tier 1 · Single-Agent
Foundation Patterns
4 patterns · ReAct · Plan-Execute · Reflection · Tool Use
Tier 2 · Multi-Agent
Orchestration Patterns
5 patterns · Orchestrator · Supervisor · Fan-Out · MapReduce · Debate
Tier 3 · Iterative
Feedback Loop Patterns
6 patterns · Hierarchical · Pipeline · Evaluator · Critic · Self-Healing · HITL
The pattern is the architecture.
Frameworks change.
Patterns persist.
57%
of enterprise AI agent failures originate in orchestration design — not individual agent capability — Anthropic analysis, 200+ deployments 2025
43%
of enterprise agent deployments use LangGraph, making it the leading implementation framework for multi-agent patterns as of early 2026
85%
of quality improvement occurs in the first 2 iterations of an Evaluator-Optimizer loop. Beyond 3, gains are marginal and cost doubles — Anthropic 2025
40%
of enterprise applications will incorporate AI agents by 2026, up from <5% in 2025 — Gartner. Pattern choice determines who succeeds.
Why Patterns Matter

Frameworks Change. Patterns Persist.

The AI agent framework landscape has been in near-continuous churn since 2024. LangGraph, CrewAI, OpenAI Agents SDK, Google ADK, PydanticAI — each claims production readiness, each makes different architectural choices, and each will be superseded by something else within two years. The engineers who are building systems that last are not the ones who picked the right framework. They are the ones who mastered the underlying patterns that make agentic systems work.

A pattern is the reusable solution to a recurring design problem. ReAct is not a feature of any particular library — it is an architectural principle that any agent can implement. The Evaluator-Optimizer is not a LangGraph construct — it is a feedback loop that any two LLMs can instantiate. When your framework of choice changes its API or gets deprecated, the pattern survives. When you move from GPT-4o to Claude Sonnet to a local model, the pattern still applies.

The 15 patterns here are organized into three tiers reflecting increasing coordination complexity. Tier 1 patterns operate on a single agent. Tier 2 patterns coordinate multiple agents in parallel or hierarchical arrangements. Tier 3 patterns run iterative loops where output quality or system state determines whether execution continues. Real production systems typically compose two or three patterns within a single workflow — the art is knowing which combination addresses the specific failure mode you are trying to solve.

Tier 1 · Single-Agent
Foundation Patterns
One agent, one model, one context. These four patterns are the building blocks that all more complex architectures extend. Master them before adding agents.
01
ReAct
Reason + Act · interleaved loop
Observation → Reasoning → Action → Observation
ReAct (Reasoning + Acting) interleaves a reasoning trace with each action, so the agent thinks before it acts and reflects on the result before it thinks again. Rather than executing a sequence of tool calls blindly, the agent generates explicit “thought” steps that explain its intention, then executes the action, then observes the result. The cycle repeats until the task is complete or a stopping condition is met. ReAct is the foundational pattern behind most tool-calling agents in production today — it provides traceability (you can read what the agent was thinking), error recovery (a bad tool result surfaces in the reasoning step), and adaptive decision-making (each observation changes the reasoning chain).
Best Use Cases
Research agents that must decide which search queries to run next based on what they have found
Customer support agents navigating multi-system lookups where the path depends on each result
Debugging assistants that read code, run tests, interpret results, and iterate
Trade-off Adds latency per tool call because every action is preceded by a reasoning step. In high-frequency, low-complexity tasks, this overhead is unwarranted. Use ReAct when reasoning quality matters more than throughput.
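The loop above can be sketched in a few lines of Python. This is a minimal illustration, not any framework's API: `call_llm` and the `TOOLS` registry are hypothetical stubs standing in for a real model call and real tools.

```python
# Minimal ReAct sketch. `call_llm` is a stub: a real agent would call a
# model API and parse its thought/action from the response.
def call_llm(history):
    if not any(kind == "observation" for kind, _ in history):
        return {"thought": "I should look up the population.",
                "action": ("search", "Paris population")}
    return {"thought": "I have what I need.", "final": "About 2.1 million"}

TOOLS = {"search": lambda q: "Paris population: ~2.1 million (2024)"}

def react(task, max_steps=5):
    history = [("task", task)]
    for _ in range(max_steps):
        step = call_llm(history)                  # reason
        history.append(("thought", step["thought"]))
        if "final" in step:                       # stopping condition
            return step["final"], history
        name, arg = step["action"]                # act
        history.append(("observation", TOOLS[name](arg)))  # observe
    return None, history
```

The reasoning trace in `history` is what gives ReAct its traceability: every action is preceded by a readable thought, and every observation feeds the next one.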
02
Plan-and-Execute
Upfront planning · sequential execution
Planner → Step 1 → Step 2 → Step N
Plan-and-Execute separates high-level strategic planning from tactical execution. A planner model — typically a high-reasoning model like o3 or Claude Opus — receives the full task and generates a directed acyclic graph (DAG) of subtasks before execution begins. Smaller, faster executor models then run each step sequentially or in parallel. This architectural split enables cost optimisation: expensive reasoning is front-loaded into one high-quality plan; the grunt work of execution runs on cheaper, faster models. Benchmarks show Plan-and-Execute architectures achieving up to 92% task completion with a 3.6× speedup over sequential ReAct for long-horizon tasks. The key failure mode is plan brittleness: if step 3 fails, a naive implementation has no mechanism to re-plan without starting over.
Best Use Cases
Long-horizon tasks with predictable decomposition (software development, report generation)
Multi-step data transformation pipelines with well-defined intermediate outputs
Workflows where cost optimisation requires separating planning from execution models
Trade-off Plans become stale when early steps produce unexpected results. Combine with a Re-Planner component that re-evaluates the plan after each execution step fails or returns significantly unexpected output.
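The planner/executor split can be sketched as two stubbed functions, with a hook where a Re-Planner would intervene. `plan` and `execute` are hypothetical stand-ins: in production the planner would be an expensive reasoning model and the executor a cheaper one.

```python
def plan(task):
    # Stub planner: a strong model would emit a step list (or DAG) here.
    return ["gather data", "analyse data", "write summary"]

def execute(step):
    # Stub executor: a cheaper, faster model would run each step.
    return f"done: {step}"

def plan_and_execute(task):
    results = []
    for step in plan(task):
        out = execute(step)
        if out is None:  # hook: call a Re-Planner here instead of failing
            raise RuntimeError(f"step failed: {step}")
        results.append(out)
    return results
```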
03
Reflection / Self-Critique
Generate → critique → refine
Generate → Critique → Refine → Generate
The Reflection pattern instructs the agent to review its own output against defined criteria and iterate until satisfied. After generating an initial response, the agent enters a self-critique phase — evaluating correctness, completeness, quality, and conformance to requirements — then refines based on that critique. Research demonstrates that reflection can improve performance on coding benchmarks like HumanEval from 80% to 91%. The self-critique can use the same model (cheaper, but potentially echo-chamber effects) or a separate model (more expensive, more independent). Microsoft’s Azure AI documentation uses this pattern for content moderation, where multiple prompts evaluate different aspects with different vote thresholds to balance false positives and negatives. Fujitsu used reflection in compliance and finance workflows to reduce human review load significantly.
Best Use Cases
Code generation where systematic quality criteria (style, correctness, security) can be evaluated programmatically
Document drafting where the agent can check its own output against a rubric
Translation tasks where nuance and tone can be assessed in a structured critique
Trade-off Self-critique using the same model risks blind spots — the model may fail to identify its own errors. Cap iterations at 2–3 (85% of improvement is in the first 2). Beyond that, add escalation to an independent evaluator or human review.
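A capped reflection loop looks like this in outline. `generate` and `critique` are hypothetical stubs; the cap mirrors the finding that most of the gain arrives in the first two rounds.

```python
def generate(task, feedback=None):
    # Stub generator: revises when it receives critique feedback.
    return "draft v2" if feedback else "draft v1"

def critique(output):
    # Stub critic: returns None when the output passes the rubric.
    return None if output == "draft v2" else "tighten the intro"

def reflect(task, max_rounds=2):  # cap per the 85%-in-2-iterations finding
    feedback = None
    for _ in range(max_rounds + 1):
        output = generate(task, feedback)
        feedback = critique(output)
        if feedback is None:
            return output
    return output  # in production: escalate to an independent evaluator
```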
04
Tool Use / Function Calling
Agent selects and invokes external tools
Task → Tool Selection → API / DB / Code → Result
Tool Use is the foundational capability that separates a language model from an AI agent. The agent is given a schema of available tools — functions, APIs, database queries, code executors, web browsers — and decides which tool to invoke, when, and with what parameters. Anthropic’s principle here is critical: expose the minimal tool surface required for the task. Every tool that the agent can call is a surface for hallucinated invocations, adversarial injection, and unintended side effects. Well-designed tool schemas with explicit documentation, parameter constraints, and expected output formats dramatically reduce misuse. MCP (Model Context Protocol) has standardised tool access as a protocol in 2026, allowing a single tool registration to be consumed by any framework.
Best Use Cases
Any agent that needs to ground its outputs in real-world data (databases, search, APIs)
Code execution agents that run and test the code they generate
Integration agents that must create tickets, send emails, update records
Trade-off Tool proliferation is a failure mode. An agent with 50 tools in its schema will mis-select far more often than one with 5. Specialise agents around minimal tool sets; use orchestration patterns to route to the right specialist rather than giving one agent everything.
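A minimal tool registry with parameter validation illustrates the "minimal tool surface" principle: unknown tools are rejected outright, and each call is checked against the schema before execution. The registry shape and `invoke` helper are illustrative, not any library's API.

```python
# Small, explicit tool registry: every entry documents its purpose and
# constrains its parameters, so hallucinated or malformed calls fail fast.
TOOLS = {
    "get_weather": {
        "fn": lambda city: f"Sunny in {city}",
        "params": {"city": str},
        "doc": "Return current weather for a city.",
    },
}

def invoke(name, **kwargs):
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")   # reject hallucinated tools
    spec = TOOLS[name]
    for param, typ in spec["params"].items():
        if not isinstance(kwargs.get(param), typ):
            raise TypeError(f"{name}: bad or missing param {param!r}")
    return spec["fn"](**kwargs)
```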
Tier 2 · Multi-Agent Orchestration
Coordination Patterns
When one agent is not enough. These five patterns define how multiple agents coordinate, communicate, and divide work — each addressing a different class of coordination challenge.
05
Orchestrator-Subagent
Central coordinator · specialist workers
Orchestrator → Subagent A + Subagent B → Result
The Orchestrator-Subagent pattern is the foundational multi-agent architecture. A central orchestrator LLM receives the task, dynamically breaks it into subtasks, delegates each subtask to a specialised worker agent, and synthesises their results. Anthropic’s definition emphasises the orchestrator’s adaptive quality: unlike Plan-and-Execute, the subtasks are not pre-defined — they are determined by the orchestrator based on the specific input. This makes the pattern the right choice when subtasks cannot be predicted from the task description alone. The Fujitsu example is canonical: an orchestrator delegating market research, data analysis, and document creation to three specialists to assemble full sales proposals — reducing production time by 67%. Worker agents maintain focused, minimal tool sets for their domain, reducing hallucination and misuse risks.
Best Use Cases
Complex tasks with unpredictable subtask structure (software that touches multiple files)
Business workflows requiring heterogeneous expertise (research + analysis + writing)
Any workflow where specialised, focused agents outperform generalist agents
Trade-off Orchestrator quality determines system quality. Anthropic’s multi-agent benchmark found orchestrators with explicit routing criteria outperform implicit ones by 31% on task completion. Define routing rules precisely in the orchestrator’s system prompt — never rely on “use the right agent.”
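A stripped-down sketch of delegate-and-synthesise, with the specialists stubbed as plain functions. The decomposition here is hard-coded for illustration; a real orchestrator LLM would derive the subtasks dynamically, guided by explicit routing criteria in its system prompt.

```python
# Hypothetical specialists: each would be a focused agent with its own
# minimal tool set in a real system.
SPECIALISTS = {
    "research": lambda t: f"findings on {t}",
    "analysis": lambda t: f"analysis of {t}",
    "writing":  lambda t: f"draft about {t}",
}

def orchestrate(task):
    # Stub decomposition: a real orchestrator derives these dynamically.
    subtasks = [("research", task), ("analysis", task), ("writing", task)]
    results = [SPECIALISTS[role](sub) for role, sub in subtasks]
    return " | ".join(results)  # synthesis step
```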
06
Supervisor
Routes · monitors · enforces quality gates
Supervisor → Route → Specialist → Evaluate → Accept / Retry
The Supervisor pattern extends orchestration with explicit quality control. A supervisor controller routes tasks to appropriate specialists, monitors the outputs they return, and enforces quality gates before accepting or rejecting results. Where the orchestrator delegates and synthesises, the supervisor also polices: if a specialist’s output fails the quality threshold, the supervisor can re-route, request a revision, or escalate. This maps naturally to a production engineering team structure where a tech lead reviews all output before it ships. The supervisor’s system prompt must include the full list of available specialists, explicit routing criteria, quality standards for evaluating outputs, and escalation rules for unacceptable results — all specified precisely, not implied.
Best Use Cases
Content production pipelines where quality is non-negotiable (publishing, compliance documentation)
Customer-facing workflows where subagent errors must be caught before reaching users
Multi-specialist systems where the supervisor must balance competing specialist outputs
Trade-off Adding a supervisor adds a full LLM call to every task cycle. For high-throughput, low-risk workflows, this overhead is unjustified. Reserve supervisor patterns for workflows where the cost of quality failure exceeds the cost of the extra inference call.
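The route-evaluate-retry cycle can be sketched as below. `specialist` and `quality_gate` are hypothetical stubs; in practice the gate would be the supervisor model applying its quality standards, and the final branch would follow the escalation rules in its prompt.

```python
def specialist(task, attempt):
    # Stub specialist: improves on retry after supervisor feedback.
    return "rough draft" if attempt == 0 else "polished draft"

def quality_gate(output):
    # Stub rubric check: the supervisor's quality standard.
    return output == "polished draft"

def supervise(task, max_retries=2):
    for attempt in range(max_retries + 1):
        output = specialist(task, attempt)
        if quality_gate(output):          # accept
            return output
    raise RuntimeError("escalate: quality gate not met")  # escalation rule
```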
07
Parallel Fan-Out / Fan-In
Split → concurrent execution → merge
Task → Fan-Out → A + B + C (parallel) → Fan-In
Parallel Fan-Out disperses independent subtasks across multiple agents simultaneously, while Fan-In aggregates their results into a single synthesised output. Unlike sequential orchestration, fan-out cuts wall-clock time dramatically: the total latency is bounded by the slowest parallel worker rather than the sum of all workers. This is the pattern that makes large-scale research pipelines, document processing workloads, and multi-perspective analysis tractable. The critical precondition is that subtasks must be independent — no subtask can depend on another subtask’s output. If dependencies exist, some subtasks must be sequenced first. Azure’s documentation uses city park development proposals as an example: multiple specialist agents evaluate different community impact perspectives simultaneously, with their analyses merged before community review.
Best Use Cases
Multi-perspective analysis where different agents evaluate the same content independently
Content moderation requiring multiple concurrent safety checks
Research tasks requiring simultaneous retrieval from multiple independent sources
Trade-off Parallel LLM calls multiply cost proportionally to the number of concurrent agents — latency improves, but cost does not. Only appropriate when the latency reduction justifies the spend. Use when tasks are genuinely independent and time is the binding constraint.
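Fan-out/fan-in maps directly onto a thread pool when each agent call is an independent network request. `perspective_agent` is a hypothetical stand-in for one LLM call per perspective.

```python
from concurrent.futures import ThreadPoolExecutor

def perspective_agent(angle, content):
    # Stub: each call would be an independent LLM request in production.
    return f"{angle} view of {content}"

def fan_out_fan_in(content, angles):
    with ThreadPoolExecutor(max_workers=len(angles)) as pool:
        # Fan-out: wall-clock time ≈ the slowest worker, not the sum.
        partials = list(pool.map(lambda a: perspective_agent(a, content),
                                 angles))
    return "\n".join(partials)  # fan-in: merge partials for synthesis
```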
08
MapReduce
Distribute → process → aggregate
Input → Map (N agents) → Reduce → Output
MapReduce adapts the classic distributed computing pattern for AI agent workflows. The Map phase distributes a dataset or task corpus across N agents, each processing their chunk independently. The Reduce phase aggregates the N partial results into a single coherent output. This pattern excels when a task exceeds a single context window, or when processing scale exceeds what any single agent can handle in acceptable time. A 1,000-document legal analysis task that would take a single agent hours can be processed by 50 agents in parallel with a final reducer summarising findings. The MapReduce pattern is distinguished from Fan-Out/Fan-In by its emphasis on homogeneous task distribution over a dataset rather than heterogeneous task specialisation.
Best Use Cases
Large document corpus analysis (legal review, compliance auditing, research synthesis)
Data extraction across a large dataset where each record requires independent processing
Sentiment analysis or classification over datasets too large for a single context window
Trade-off Reducer quality determines output quality. If the map agents produce heterogeneous or inconsistent output formats, the reducer’s synthesis task becomes brittle. Define output schemas for map agents precisely so the reducer receives homogeneous inputs it can reliably aggregate.
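The shape of the pattern, with map agents emitting a fixed schema so the reducer receives homogeneous inputs. Both agents are stubs here; a real mapper would be an LLM call over each document chunk.

```python
def map_agent(chunk):
    # Stub mapper: emits a fixed schema so the reducer's job stays trivial.
    return {"chunk": chunk, "count": len(chunk)}

def reduce_agent(partials):
    # Stub reducer: aggregates homogeneous partial results.
    return sum(p["count"] for p in partials)

def map_reduce(docs, chunk_size=2):
    chunks = [docs[i:i + chunk_size] for i in range(0, len(docs), chunk_size)]
    return reduce_agent([map_agent(c) for c in chunks])
```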
09
Debate / Adversarial
Agents argue · judge resolves
Agent A (for) + Agent B (against) → Judge
The Debate pattern assigns agents to advocate for opposing positions on a question, then uses a judge agent to evaluate the arguments and render a final decision. This deliberately adversarial structure is designed to overcome the echo-chamber problem: a single agent reviewing its own reasoning tends to confirm rather than challenge. By forcing explicit counter-argument construction, the pattern surfaces weaknesses in each position that a collaborative or consensus-seeking approach would suppress. Microsoft’s Azure architecture documentation shows a city planning example where agents debate community impact perspectives before a proposal opens for public review — anticipating feedback and strengthening the proposal through structured conflict rather than collaborative agreement.
Best Use Cases
High-stakes decisions where one-sided reasoning is a risk (investment decisions, legal strategy)
Policy analysis requiring comprehensive evaluation of opposing perspectives
Identifying weaknesses in plans or proposals before external scrutiny
Trade-off Debate patterns consume significantly more tokens and time than single-agent reasoning. Not suitable for time-sensitive, high-throughput, or low-stakes tasks. Best reserved for decision points where adversarial challenge genuinely improves outcome quality versus a reflection loop.
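The three roles reduce to a small sketch: two advocates, one judge. All three are hypothetical stubs; a real judge would be an LLM weighing argument quality against criteria, not the string heuristic used here for illustration.

```python
def advocate(position, question):
    # Stub advocate: a real one would construct a full argument.
    return f"{position}: argument about {question}"

def judge(question, arguments):
    # Stub judge: picks by a trivial heuristic purely for illustration.
    return max(arguments, key=len)

def debate(question):
    arguments = [advocate("for", question), advocate("against", question)]
    return judge(question, arguments)
```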

“Designing agent control flow is now the highest-leverage skill in AI engineering. The orchestration layer is where most enterprise agent projects succeed or fail. Agents were individually capable but poorly coordinated — that gap is where 57% of failures originate.”

Anthropic — Enterprise Agent Deployment Patterns Analysis, 2025
Tier 3 · Iterative & Feedback Loop
Refinement Patterns
Patterns that run until a condition is met. These six patterns govern how agents improve output over iterations, recover from failure, and hand off to humans when required.
10
Hierarchical Agents
Orchestrators managing orchestrators
L1 Orchestrator → L2 Orchestrators → L3 Workers
Hierarchical Agents extend the orchestrator pattern across multiple levels of management. A top-level orchestrator breaks high-level goals into domains and delegates to mid-level orchestrators, each of which manages their own pool of specialised workers. This mirrors enterprise organisational structures — where a division head delegates to team leads who manage individual contributors. LangGraph’s “Hierarchical + Router” combination is the most common production implementation: each level in the hierarchy uses a router to dispatch work to the appropriate child, reducing misrouting at each level. This pattern becomes necessary when the coordination complexity of a single orchestrator’s task space exceeds what can be managed in one agent’s context.
Best Use Cases
Enterprise automation systems spanning multiple business domains (HR, finance, operations)
Multi-project software engineering agents managing code across different repos and teams
Research platforms coordinating independent research threads across knowledge domains
Trade-off Every additional hierarchy level adds latency and cost. Most systems do not require more than two levels. Introduce hierarchy only when a single orchestrator’s coordination task becomes too complex to handle accurately — not as a default architectural choice.
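A two-level hierarchy in outline: a top-level router dispatches by domain, and each mid-level orchestrator routes within its own worker pool. The domain/worker tables are hypothetical; the point is that routing happens at each level, narrowing the decision space.

```python
# Hypothetical worker pools, one per business domain.
WORKERS = {
    "hr":      {"recruiting": lambda t: f"candidates for {t}"},
    "finance": {"forecast":   lambda t: f"forecast for {t}"},
}

def l2_orchestrator(domain, task):
    # Mid-level router: picks a worker within its own domain only.
    worker = next(iter(WORKERS[domain].values()))
    return worker(task)

def l1_orchestrator(task):
    # Top-level router: dispatches by domain, then delegates downward.
    return {d: l2_orchestrator(d, task) for d in WORKERS}
```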
11
Sequential Pipeline
Each agent hands off to the next
Agent A → Agent B → Agent C → Output
The Sequential Pipeline passes task output from one specialised agent to the next in a defined sequence, with each agent building on what the previous produced. This is the simplest multi-agent pattern: deterministic, auditable, and linear. Databricks documents this as their highest-predictability pattern with lowest latency per agent because there are fewer LLM calls for orchestration decisions — each agent simply processes input and passes output forward. A content pipeline might sequence: researcher → drafter → editor → fact-checker → publisher. Each agent is optimised for its single role with a focused tool set, making the system more reliable and easier to debug than a single generalist agent performing all roles.
Best Use Cases
Content production pipelines with defined stage gates
Data transformation workflows where each stage enriches or transforms the previous output
Any workflow where the sequence of stages is fixed and stages have clear dependencies
Trade-off A failure in any stage blocks the entire pipeline. Error handling at each stage transition is critical — without retry logic or fallback paths, one bad agent output stalls the whole workflow. Combine with the Self-Healing pattern at each stage junction for production resilience.
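The pipeline is the simplest pattern to express in code: a fold over stage functions, with a check at each transition. The stage functions are stubs standing in for specialised agents.

```python
# Stub stages: each would be a focused agent with its own tool set.
def researcher(x): return x + " +research"
def drafter(x):    return x + " +draft"
def editor(x):     return x + " +edit"

def pipeline(task, stages):
    out = task
    for stage in stages:
        out = stage(out)
        if out is None:  # one bad stage blocks everything downstream
            raise RuntimeError(f"stage {stage.__name__} failed")
    return out
```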
12
Evaluator-Optimizer
Generate → score → refine → loop
Generator → Evaluator → Pass? (no ↺ Generator)
The Evaluator-Optimizer separates the agent that generates output (Generator) from the agent that scores it against explicit criteria (Evaluator). The Evaluator uses rubrics, reference outputs, or LLM-as-judge scoring to assess quality and provide structured feedback. The Optimizer (or Generator in its refinement mode) incorporates this feedback in the next generation cycle. The loop runs until the Evaluator scores the output as passing or a maximum iteration count is reached. This is the agentic equivalent of test-driven development — the acceptance criteria are defined before generation, and generation runs until acceptance is achieved. Anthropic’s content engine uses this pattern to catch 73% of quality issues that would otherwise require human intervention.
Best Use Cases
Literary translation where an evaluator can assess nuance that the translator may miss
Complex search tasks requiring multiple retrieval-analysis rounds with a quality gate
Any task where output quality can be scored objectively against defined criteria
Trade-off 85% of quality improvement occurs in the first 2 iterations (Anthropic research). Cap at 3–5 iterations maximum and escalate to human review if the threshold is not met. Without a cap, generators in infinite loops produce lateral changes rather than genuine improvements.
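The generate-score-refine loop, with the iteration cap and the escalation path built in. `generator` and `evaluator` are stubs; a real evaluator would be an LLM-as-judge scoring against a rubric.

```python
def generator(task, feedback=None):
    # Stub generator: improves when it receives evaluator feedback.
    if feedback:
        return {"text": "v2", "quality": 0.9}
    return {"text": "v1", "quality": 0.6}

def evaluator(output, threshold=0.8):
    # Stub LLM-as-judge: returns (passed, structured feedback).
    if output["quality"] >= threshold:
        return True, None
    return False, "raise specificity"

def evaluate_optimize(task, max_iters=3):  # cap, then escalate
    feedback = None
    for _ in range(max_iters):
        output = generator(task, feedback)
        passed, feedback = evaluator(output)
        if passed:
            return output["text"]
    return "ESCALATE_TO_HUMAN"
```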
13
Critic-Actor
Structured feedback · actor refines
Actor → Output → Critic → Feedback → Actor
The Critic-Actor pattern assigns two distinct roles: the Actor generates the primary output (code, content, plan, decision); the Critic provides structured, targeted feedback on specific weaknesses. The Actor then refines its output based on the Critic’s guidance. Unlike the Evaluator-Optimizer, the Critic’s role is not simply to pass/fail but to provide actionable, structured feedback that guides the Actor’s revision. This mirrors a human author-editor relationship: the editor does not just say “this is bad” but identifies specifically why and how it should be improved. The Critic-Actor loop runs until the Actor’s output clears the Critic’s bar or a maximum iteration threshold is reached.
Best Use Cases
Code review cycles where the critic identifies specific bugs, style violations, or security issues
Strategic plan refinement where the critic challenges assumptions and identifies gaps
Any iterative refinement where the quality of feedback determines the quality of refinement
Trade-off Critic quality is the binding constraint. A critic that produces vague or overly general feedback enables only superficial actor improvements. Invest in the critic’s evaluation rubric — it is the intellectual core of this pattern, not the actor’s generation capability.
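The distinguishing feature versus Evaluator-Optimizer is the shape of the feedback: structured findings the actor can act on, not a pass/fail score. Both roles are stubbed here; the finding schema is illustrative.

```python
def actor(task, findings=None):
    # Stub actor: produces a buggy draft, then fixes it per the findings.
    return "def add(a, b): return a + b" if findings else "def add(a, b): return a - b"

def critic(code):
    # Stub critic: returns structured, actionable findings, not pass/fail.
    if "a - b" in code:
        return [{"line": 1, "issue": "wrong operator", "fix": "use + not -"}]
    return []

def critic_actor(task, max_rounds=3):
    findings = None
    for _ in range(max_rounds):
        code = actor(task, findings)
        findings = critic(code)
        if not findings:  # critic's bar cleared
            return code
    return code
```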
14
Self-Healing / Retry Loop
Diagnose failure · correct strategy · retry
Attempt → Failure? → Diagnose → Corrected Retry
The Self-Healing pattern equips an agent with the ability to diagnose its own failures and retry with a corrected strategy rather than simply re-executing the same failed action. When a tool call fails, an API returns an error, or the agent reaches a dead end, the self-healing loop kicks in: the agent analyses the error, identifies what went wrong and why, and formulates a different approach for the retry. This is qualitatively different from naive exponential backoff — it is intelligent failure recovery. The SRE automation case in Azure’s architecture guide is instructive: when a service outage occurs, the system creates and implements a remediation plan dynamically, without knowing the specific steps upfront, diagnosing and adapting until the live-site issue is resolved.
Best Use Cases
Any production agent that calls external tools or APIs that can fail in multiple ways
Infrastructure automation where recovery paths vary based on the specific failure type
Long-running workflows that cannot be restarted from scratch on every failure
Trade-off Without a maximum retry count and a meaningful escalation path, self-healing loops can run indefinitely, consuming tokens and budget. Define: maximum attempts, a circuit-breaker condition, and a clear escalation path (to HITL or to a fallback chain) when the maximum is reached.
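Diagnose-then-retry differs from blind backoff in one line: the error is analysed and the strategy changes before the next attempt. `diagnose` is a stub; a real agent would reason over the error message and tool context.

```python
def diagnose(error):
    # Stub diagnosis: map a failure to a corrected strategy. A real agent
    # would reason over the error text and context here.
    if "timeout" in str(error):
        return {"timeout": 30}
    return {}

def self_heal(action, max_attempts=3):
    strategy = {}
    for _ in range(max_attempts):
        try:
            return action(**strategy)
        except Exception as e:
            strategy = diagnose(e)  # adapt, don't just re-run
    raise RuntimeError("circuit breaker: escalate to HITL")
```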
15
HITL — Human-in-the-Loop
Human checkpoints · approve · correct · redirect
Agent Action → HITL Gate → Approve / Correct → Continue
HITL is not a fallback — it is a deliberate architectural choice. A human steps in at predefined checkpoints to approve, correct, or redirect agent behaviour before execution continues. These gates are typically placed at: high-stakes irreversible actions (delete, publish, transact), quality thresholds below which automated evaluation cannot be trusted, and escalation points when automated loops exceed iteration limits. Microsoft’s Azure framework explicitly distinguishes between mandatory HITL gates (which make the orchestration synchronous at that step and must checkpoint state for resumption) and optional gates (where human input can improve quality but is not required). The Databricks guidance frames this as “OS-level permissions — a sudo prompt: high-stakes agent actions automatically route to humans for approval.”
Best Use Cases
Any irreversible action with material business consequences (financial transactions, bulk deletions)
Regulated workflows where human accountability is legally required
Evaluator-Optimizer loops that have not passed quality thresholds after maximum iterations
Trade-off Mandatory HITL gates make the workflow synchronous at that step — latency depends on human response time. Persist agent state at every HITL checkpoint so the workflow can resume without replaying prior work when the human responds. Never require human re-entry of context the agent already has.
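A gate in code is mostly about ordering: persist state first, then block on the human. The `store` dict stands in for durable checkpoint storage, and `ask_human` for whatever approval channel the system uses; both are hypothetical.

```python
def hitl_gate(action, state, ask_human, store):
    store["checkpoint"] = dict(state)  # persist BEFORE blocking on the human
    return ask_human(action, state) == "approve"

def run_with_gate(actions, ask_human):
    state, store, done = {"step": 0}, {}, []
    for name, irreversible in actions:
        # Mandatory gate only at high-stakes, irreversible steps.
        if irreversible and not hitl_gate(name, state, ask_human, store):
            return done, "halted at " + name
        done.append(name)
        state["step"] += 1
    return done, "complete"
```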
Quick Reference

Pattern Decision Matrix

Match your design constraint to the pattern. Real systems typically combine 2–3 from different tiers.

If you need… | Use Pattern | Tier | Primary Benefit | Watch Out For
Agent that thinks before every tool call | ReAct | T1 | Traceability + adaptive reasoning | Per-call latency overhead
Long task with predictable decomposition | Plan-and-Execute | T1 | 3.6× speedup, cost-split planning/exec | Plan brittleness on step failure
Improve output quality through iteration | Reflection | T1 | +11% coding benchmark improvement | Echo-chamber self-critique
Ground outputs in real-world data or actions | Tool Use | T1 | Factual grounding + real-world action | Tool proliferation → hallucination
Complex task with unpredictable subtasks | Orchestrator-Subagent | T2 | Flexible delegation, specialised workers | Implicit routing (31% worse task completion)
Quality control with specialist routing | Supervisor | T2 | Quality gate before user delivery | Extra inference per cycle
Independent tasks, latency is the constraint | Fan-Out / Fan-In | T2 | Latency = slowest agent, not sum | Cost multiplies with agent count
Large corpus, scale beyond single context | MapReduce | T2 | Horizontal scale over large datasets | Reducer needs homogeneous inputs
High-stakes decisions needing stress-testing | Debate / Adversarial | T2 | Surfaces blind spots, breaks echo-chamber | High token and time cost
Multi-domain system needing nested delegation | Hierarchical Agents | T3 | Scalable complexity management | Each level adds latency
Well-defined workflow with clear stage sequence | Sequential Pipeline | T3 | Highest predictability and auditability | One stage failure blocks whole pipeline
Output quality must meet a defined threshold | Evaluator-Optimizer | T3 | 73% of quality issues caught automatically | Cap at 3 iterations; escalate beyond
Output needs targeted, structured feedback | Critic-Actor | T3 | Specific feedback guides refinement | Critic rubric quality is the bottleneck
Production agents that must recover from failure | Self-Healing / Retry | T3 | Intelligent error recovery, not dumb retry | Circuit breaker + escalation required
Irreversible action or regulatory accountability | HITL | T3 | Human accountability at critical gates | Checkpoint state; never lose context
Engineering Principle

Start Simple. Add Patterns When Failures Demand Them.

The most reliable guidance from every production deployment of AI agents in 2025 and 2026 is the same: start with the simplest pattern that addresses the core problem, then layer additional patterns only when a specific failure mode demands it. Tool Use plus ReAct handles a remarkable proportion of real-world agent tasks. The Evaluator-Optimizer adds quality assurance when output consistency matters. HITL adds human accountability when irreversibility or regulation demands it. Each pattern adds coordination complexity and coordination failure risk alongside whatever problem it solves.

The engineers who over-architect agents — reaching for Hierarchical Orchestrators and Debate/Adversarial loops before they have validated that a single-agent ReAct loop fails — are spending engineering budget and operational complexity on problems they have not confirmed exist. The engineers who under-architect — deploying a plain chatbot where a Supervisor with quality gates was required — are handing users inconsistent outputs without recourse.

The 15 patterns here are a vocabulary, not a checklist. You do not need all 15. You need the 2–3 that match your actual coordination and quality problems. The Anthropic principle that guides all of this is worth internalising as a default: maintain simplicity, prioritise transparency by showing planning steps, and build only what your actual failure modes demand.

Mastering a handful of composable design patterns matters far more than mastering any single framework. Frameworks change. Patterns persist. The pattern is the architecture — the framework is just the scaffolding you hang it on.

Sources: Anthropic — Building Effective Agents (Schluntz & Zhang) · Anthropic — Enterprise Agent Deployment Patterns 2025 · Microsoft Azure Architecture Center — AI Agent Orchestration Patterns · Databricks — Agent System Design Patterns · SitePoint — Agentic Design Patterns: The 2026 Guide · The Thinking Company — AI Agent Orchestration Patterns 2026 · n1n.ai — 5 AI Agent Design Patterns to Master by 2026 · Microsoft Azure Blog — Agent Factory: Common Use Cases and Design Patterns · Spring.io — Building Effective Agents with Spring AI · GuruSup — Best Multi-Agent Frameworks 2026 · Langfuse — Framework Comparison 2026 · Gartner — Agentic AI as Top Strategic Technology Trend 2025