15 AI Agent Patterns: The Complete Architecture Guide


From a single agent reasoning before each tool call to hierarchical orchestrators managing fleets of specialists — these are the 15 architectural patterns that define how production AI agents are built in 2026. Three tiers. Every pattern you need. No pattern you don’t.

April 2026 · AI Architecture · 25 min read
Tier 1 · Single-Agent
Foundation Patterns
4 patterns · ReAct · Plan-Execute · Reflection · Tool Use
Tier 2 · Multi-Agent
Orchestration Patterns
5 patterns · Orchestrator · Supervisor · Fan-Out · MapReduce · Debate
Tier 3 · Iterative
Feedback Loop Patterns
6 patterns · Hierarchical · Pipeline · Evaluator · Critic · Self-Healing · HITL
The pattern is the architecture.
Frameworks change.
Patterns persist.
57%
of enterprise AI agent failures originate in orchestration design — not individual agent capability — Anthropic analysis, 200+ deployments 2025
43%
of enterprise agent deployments use LangGraph, making it the leading implementation framework for multi-agent patterns as of early 2026
85%
of quality improvement occurs in the first 2 iterations of an Evaluator-Optimizer loop. Beyond 3, gains are marginal and cost doubles — Anthropic 2025
40%
of enterprise applications will incorporate AI agents by 2026, up from <5% in 2025 — Gartner. Pattern choice determines who succeeds.
Why Patterns Matter

Frameworks Change. Patterns Persist.

The AI agent framework landscape has been in near-continuous churn since 2024. LangGraph, CrewAI, OpenAI Agents SDK, Google ADK, PydanticAI — each claims production readiness, each makes different architectural choices, and each will be superseded by something else within two years. The engineers who are building systems that last are not the ones who picked the right framework. They are the ones who mastered the underlying patterns that make agentic systems work.

A pattern is the reusable solution to a recurring design problem. ReAct is not a feature of any particular library — it is an architectural principle that any agent can implement. The Evaluator-Optimizer is not a LangGraph construct — it is a feedback loop that any two LLMs can instantiate. When your framework of choice changes its API or gets deprecated, the pattern survives. When you move from GPT-4o to Claude Sonnet to a local model, the pattern still applies.

The 15 patterns here are organized into three tiers reflecting increasing coordination complexity. Tier 1 patterns operate on a single agent. Tier 2 patterns coordinate multiple agents in parallel or hierarchical arrangements. Tier 3 patterns run iterative loops where output quality or system state determines whether execution continues. Real production systems typically compose two or three patterns within a single workflow — the art is knowing which combination addresses the specific failure mode you are trying to solve.

Tier 1 · Single-Agent
Foundation Patterns
One agent, one model, one context. These four patterns are the building blocks that all more complex architectures extend. Master them before adding agents.
01
ReAct
Reason + Act · interleaved loop
Observation → Reasoning → Action → Observation
ReAct (Reasoning + Acting) interleaves a reasoning trace with each action, so the agent thinks before it acts and reflects on the result before it thinks again. Rather than executing a sequence of tool calls blindly, the agent generates explicit “thought” steps that explain its intention, then executes the action, then observes the result. The cycle repeats until the task is complete or a stopping condition is met. ReAct is the foundational pattern behind most tool-calling agents in production today — it provides traceability (you can read what the agent was thinking), error recovery (a bad tool result surfaces in the reasoning step), and adaptive decision-making (each observation changes the reasoning chain).
Best Use Cases
Research agents that must decide which search queries to run next based on what they have found
Customer support agents navigating multi-system lookups where the path depends on each result
Debugging assistants that read code, run tests, interpret results, and iterate
Trade-off Adds latency per tool call because every action is preceded by a reasoning step. In high-frequency, low-complexity tasks, this overhead is unwarranted. Use ReAct when reasoning quality matters more than throughput.
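The loop above can be sketched in a few lines of Python. This is a minimal illustration, not any framework's API: `call_llm` and the `TOOLS` registry are hypothetical stubs standing in for a real model call and real tools.

```python
# Minimal ReAct sketch. `call_llm` is a stub: a real agent would call a
# model API and parse its thought/action from the response.
def call_llm(history):
    if not any(kind == "observation" for kind, _ in history):
        return {"thought": "I should look up the population.",
                "action": ("search", "Paris population")}
    return {"thought": "I have what I need.", "final": "About 2.1 million"}

TOOLS = {"search": lambda q: "Paris population: ~2.1 million (2024)"}

def react(task, max_steps=5):
    history = [("task", task)]
    for _ in range(max_steps):
        step = call_llm(history)                  # reason
        history.append(("thought", step["thought"]))
        if "final" in step:                       # stopping condition
            return step["final"], history
        name, arg = step["action"]                # act
        history.append(("observation", TOOLS[name](arg)))  # observe
    return None, history
```

The reasoning trace in `history` is what gives ReAct its traceability: every action is preceded by a readable thought, and every observation feeds the next one.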
02
Plan-and-Execute
Upfront planning · sequential execution
Planner → Step 1 → Step 2 → Step N
Plan-and-Execute separates high-level strategic planning from tactical execution. A planner model — typically a high-reasoning model like o3 or Claude Opus — receives the full task and generates a directed acyclic graph (DAG) of subtasks before execution begins. Smaller, faster executor models then run each step sequentially or in parallel. This architectural split enables cost optimisation: expensive reasoning is front-loaded into one high-quality plan; the grunt work of execution runs on cheaper, faster models. Benchmarks show Plan-and-Execute architectures achieving up to 92% task completion with a 3.6× speedup over sequential ReAct for long-horizon tasks. The key failure mode is plan brittleness: if step 3 fails, a naive implementation has no mechanism to re-plan without starting over.
Best Use Cases
Long-horizon tasks with predictable decomposition (software development, report generation)
Multi-step data transformation pipelines with well-defined intermediate outputs
Workflows where cost optimisation requires separating planning from execution models
Trade-off Plans become stale when early steps produce unexpected results. Combine with a Re-Planner component that re-evaluates the plan after each execution step fails or returns significantly unexpected output.
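The planner/executor split can be sketched as two stubbed functions, with a hook where a Re-Planner would intervene. `plan` and `execute` are hypothetical stand-ins: in production the planner would be an expensive reasoning model and the executor a cheaper one.

```python
def plan(task):
    # Stub planner: a strong model would emit a step list (or DAG) here.
    return ["gather data", "analyse data", "write summary"]

def execute(step):
    # Stub executor: a cheaper, faster model would run each step.
    return f"done: {step}"

def plan_and_execute(task):
    results = []
    for step in plan(task):
        out = execute(step)
        if out is None:  # hook: call a Re-Planner here instead of failing
            raise RuntimeError(f"step failed: {step}")
        results.append(out)
    return results
```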
03
Reflection / Self-Critique
Generate → critique → refine
Generate → Critique → Refine → Generate
The Reflection pattern instructs the agent to review its own output against defined criteria and iterate until satisfied. After generating an initial response, the agent enters a self-critique phase — evaluating correctness, completeness, quality, and conformance to requirements — then refines based on that critique. Research demonstrates that reflection can improve performance on coding benchmarks like HumanEval from 80% to 91%. The self-critique can use the same model (cheaper, but potentially echo-chamber effects) or a separate model (more expensive, more independent). Microsoft’s Azure AI documentation uses this pattern for content moderation, where multiple prompts evaluate different aspects with different vote thresholds to balance false positives and negatives. Fujitsu used reflection in compliance and finance workflows to reduce human review load significantly.
Best Use Cases
Code generation where systematic quality criteria (style, correctness, security) can be evaluated programmatically
Document drafting where the agent can check its own output against a rubric
Translation tasks where nuance and tone can be assessed in a structured critique
Trade-off Self-critique using the same model risks blind spots — the model may fail to identify its own errors. Cap iterations at 2–3 (85% of improvement is in the first 2). Beyond that, add escalation to an independent evaluator or human review.
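A capped reflection loop looks like this in outline. `generate` and `critique` are hypothetical stubs; the cap mirrors the finding that most of the gain arrives in the first two rounds.

```python
def generate(task, feedback=None):
    # Stub generator: revises when it receives critique feedback.
    return "draft v2" if feedback else "draft v1"

def critique(output):
    # Stub critic: returns None when the output passes the rubric.
    return None if output == "draft v2" else "tighten the intro"

def reflect(task, max_rounds=2):  # cap per the 85%-in-2-iterations finding
    feedback = None
    for _ in range(max_rounds + 1):
        output = generate(task, feedback)
        feedback = critique(output)
        if feedback is None:
            return output
    return output  # in production: escalate to an independent evaluator
```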
04
Tool Use / Function Calling
Agent selects and invokes external tools
Task → Tool Selection → API / DB / Code → Result
Tool Use is the foundational capability that separates a language model from an AI agent. The agent is given a schema of available tools — functions, APIs, database queries, code executors, web browsers — and decides which tool to invoke, when, and with what parameters. Anthropic’s principle here is critical: expose the minimal tool surface required for the task. Every tool that the agent can call is a surface for hallucinated invocations, adversarial injection, and unintended side effects. Well-designed tool schemas with explicit documentation, parameter constraints, and expected output formats dramatically reduce misuse. MCP (Model Context Protocol) has standardised tool access as a protocol in 2026, allowing a single tool registration to be consumed by any framework.
Best Use Cases
Any agent that needs to ground its outputs in real-world data (databases, search, APIs)
Code execution agents that run and test the code they generate
Integration agents that must create tickets, send emails, update records
Trade-off Tool proliferation is a failure mode. An agent with 50 tools in its schema will mis-select far more often than one with 5. Specialise agents around minimal tool sets; use orchestration patterns to route to the right specialist rather than giving one agent everything.
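A minimal tool registry with parameter validation illustrates the "minimal tool surface" principle: unknown tools are rejected outright, and each call is checked against the schema before execution. The registry shape and `invoke` helper are illustrative, not any library's API.

```python
# Small, explicit tool registry: every entry documents its purpose and
# constrains its parameters, so hallucinated or malformed calls fail fast.
TOOLS = {
    "get_weather": {
        "fn": lambda city: f"Sunny in {city}",
        "params": {"city": str},
        "doc": "Return current weather for a city.",
    },
}

def invoke(name, **kwargs):
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")   # reject hallucinated tools
    spec = TOOLS[name]
    for param, typ in spec["params"].items():
        if not isinstance(kwargs.get(param), typ):
            raise TypeError(f"{name}: bad or missing param {param!r}")
    return spec["fn"](**kwargs)
```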
Tier 2 · Multi-Agent Orchestration
Coordination Patterns
When one agent is not enough. These five patterns define how multiple agents coordinate, communicate, and divide work — each addressing a different class of coordination challenge.
05
Orchestrator-Subagent
Central coordinator · specialist workers
Orchestrator → Subagent A + Subagent B → Result
The Orchestrator-Subagent pattern is the foundational multi-agent architecture. A central orchestrator LLM receives the task, dynamically breaks it into subtasks, delegates each subtask to a specialised worker agent, and synthesises their results. Anthropic’s definition emphasises the orchestrator’s adaptive quality: unlike Plan-and-Execute, the subtasks are not pre-defined — they are determined by the orchestrator based on the specific input. This makes the pattern the right choice when subtasks cannot be predicted from the task description alone. The Fujitsu example is canonical: an orchestrator delegating market research, data analysis, and document creation to three specialists to assemble full sales proposals — reducing production time by 67%. Worker agents maintain focused, minimal tool sets for their domain, reducing hallucination and misuse risks.
Best Use Cases
Complex tasks with unpredictable subtask structure (software that touches multiple files)
Business workflows requiring heterogeneous expertise (research + analysis + writing)
Any workflow where specialised, focused agents outperform generalist agents
Trade-off Orchestrator quality determines system quality. Anthropic’s multi-agent benchmark found orchestrators with explicit routing criteria outperform implicit ones by 31% on task completion. Define routing rules precisely in the orchestrator’s system prompt — never rely on “use the right agent.”
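A stripped-down sketch of delegate-and-synthesise, with the specialists stubbed as plain functions. The decomposition here is hard-coded for illustration; a real orchestrator LLM would derive the subtasks dynamically, guided by explicit routing criteria in its system prompt.

```python
# Hypothetical specialists: each would be a focused agent with its own
# minimal tool set in a real system.
SPECIALISTS = {
    "research": lambda t: f"findings on {t}",
    "analysis": lambda t: f"analysis of {t}",
    "writing":  lambda t: f"draft about {t}",
}

def orchestrate(task):
    # Stub decomposition: a real orchestrator derives these dynamically.
    subtasks = [("research", task), ("analysis", task), ("writing", task)]
    results = [SPECIALISTS[role](sub) for role, sub in subtasks]
    return " | ".join(results)  # synthesis step
```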
06
Supervisor
Routes · monitors · enforces quality gates
Supervisor → Route → Specialist → Evaluate → Accept / Retry
The Supervisor pattern extends orchestration with explicit quality control. A supervisor controller routes tasks to appropriate specialists, monitors the outputs they return, and enforces quality gates before accepting or rejecting results. Where the orchestrator delegates and synthesises, the supervisor also polices: if a specialist’s output fails the quality threshold, the supervisor can re-route, request a revision, or escalate. This maps naturally to a production engineering team structure where a tech lead reviews all output before it ships. The supervisor’s system prompt must include the full list of available specialists, explicit routing criteria, quality standards for evaluating outputs, and escalation rules for unacceptable results — all specified precisely, not implied.
Best Use Cases
Content production pipelines where quality is non-negotiable (publishing, compliance documentation)
Customer-facing workflows where subagent errors must be caught before reaching users
Multi-specialist systems where the supervisor must balance competing specialist outputs
Trade-off Adding a supervisor adds a full LLM call to every task cycle. For high-throughput, low-risk workflows, this overhead is unjustified. Reserve supervisor patterns for workflows where the cost of quality failure exceeds the cost of the extra inference call.
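The route-evaluate-retry cycle can be sketched as below. `specialist` and `quality_gate` are hypothetical stubs; in practice the gate would be the supervisor model applying its quality standards, and the final branch would follow the escalation rules in its prompt.

```python
def specialist(task, attempt):
    # Stub specialist: improves on retry after supervisor feedback.
    return "rough draft" if attempt == 0 else "polished draft"

def quality_gate(output):
    # Stub rubric check: the supervisor's quality standard.
    return output == "polished draft"

def supervise(task, max_retries=2):
    for attempt in range(max_retries + 1):
        output = specialist(task, attempt)
        if quality_gate(output):          # accept
            return output
    raise RuntimeError("escalate: quality gate not met")  # escalation rule
```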
07
Parallel Fan-Out / Fan-In
Split → concurrent execution → merge
Task → Fan-Out → A + B + C (parallel) → Fan-In
Parallel Fan-Out disperses independent subtasks across multiple agents simultaneously, while Fan-In aggregates their results into a single synthesised output. Unlike sequential orchestration, fan-out cuts wall-clock time dramatically: the total latency is bounded by the slowest parallel worker rather than the sum of all workers. This is the pattern that makes large-scale research pipelines, document processing workloads, and multi-perspective analysis tractable. The critical precondition is that subtasks must be independent — no subtask can depend on another subtask’s output. If dependencies exist, some subtasks must be sequenced first. Azure’s documentation uses city park development proposals as an example: multiple specialist agents evaluate different community impact perspectives simultaneously, with their analyses merged before community review.
Best Use Cases
Multi-perspective analysis where different agents evaluate the same content independently
Content moderation requiring multiple concurrent safety checks
Research tasks requiring simultaneous retrieval from multiple independent sources
Trade-off Parallel LLM calls multiply cost proportionally to the number of concurrent agents — latency improves, but cost does not. Only appropriate when the latency reduction justifies the spend. Use when tasks are genuinely independent and time is the binding constraint.
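Fan-out/fan-in maps directly onto a thread pool when each agent call is an independent network request. `perspective_agent` is a hypothetical stand-in for one LLM call per perspective.

```python
from concurrent.futures import ThreadPoolExecutor

def perspective_agent(angle, content):
    # Stub: each call would be an independent LLM request in production.
    return f"{angle} view of {content}"

def fan_out_fan_in(content, angles):
    with ThreadPoolExecutor(max_workers=len(angles)) as pool:
        # Fan-out: wall-clock time ≈ the slowest worker, not the sum.
        partials = list(pool.map(lambda a: perspective_agent(a, content),
                                 angles))
    return "\n".join(partials)  # fan-in: merge partials for synthesis
```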
08
MapReduce
Distribute → process → aggregate
Input → Map (N agents) → Reduce → Output
MapReduce adapts the classic distributed computing pattern for AI agent workflows. The Map phase distributes a dataset or task corpus across N agents, each processing their chunk independently. The Reduce phase aggregates the N partial results into a single coherent output. This pattern excels when a task exceeds a single context window, or when processing scale exceeds what any single agent can handle in acceptable time. A 1,000-document legal analysis task that would take a single agent hours can be processed by 50 agents in parallel with a final reducer summarising findings. The MapReduce pattern is distinguished from Fan-Out/Fan-In by its emphasis on homogeneous task distribution over a dataset rather than heterogeneous task specialisation.
Best Use Cases
Large document corpus analysis (legal review, compliance auditing, research synthesis)
Data extraction across a large dataset where each record requires independent processing
Sentiment analysis or classification over datasets too large for a single context window
Trade-off Reducer quality determines output quality. If the map agents produce heterogeneous or inconsistent output formats, the reducer’s synthesis task becomes brittle. Define output schemas for map agents precisely so the reducer receives homogeneous inputs it can reliably aggregate.
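The shape of the pattern, with map agents emitting a fixed schema so the reducer receives homogeneous inputs. Both agents are stubs here; a real mapper would be an LLM call over each document chunk.

```python
def map_agent(chunk):
    # Stub mapper: emits a fixed schema so the reducer's job stays trivial.
    return {"chunk": chunk, "count": len(chunk)}

def reduce_agent(partials):
    # Stub reducer: aggregates homogeneous partial results.
    return sum(p["count"] for p in partials)

def map_reduce(docs, chunk_size=2):
    chunks = [docs[i:i + chunk_size] for i in range(0, len(docs), chunk_size)]
    return reduce_agent([map_agent(c) for c in chunks])
```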
09
Debate / Adversarial
Agents argue · judge resolves
Agent A (for) + Agent B (against) → Judge
The Debate pattern assigns agents to advocate for opposing positions on a question, then uses a judge agent to evaluate the arguments and render a final decision. This deliberately adversarial structure is designed to overcome the echo-chamber problem: a single agent reviewing its own reasoning tends to confirm rather than challenge. By forcing explicit counter-argument construction, the pattern surfaces weaknesses in each position that a collaborative or consensus-seeking approach would suppress. Microsoft’s Azure architecture documentation shows a city planning example where agents debate community impact perspectives before a proposal opens for public review — anticipating feedback and strengthening the proposal through structured conflict rather than collaborative agreement.
Best Use Cases
High-stakes decisions where one-sided reasoning is a risk (investment decisions, legal strategy)
Policy analysis requiring comprehensive evaluation of opposing perspectives
Identifying weaknesses in plans or proposals before external scrutiny
Trade-off Debate patterns consume significantly more tokens and time than single-agent reasoning. Not suitable for time-sensitive, high-throughput, or low-stakes tasks. Best reserved for decision points where adversarial challenge genuinely improves outcome quality versus a reflection loop.
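The three roles reduce to a small sketch: two advocates, one judge. All three are hypothetical stubs; a real judge would be an LLM weighing argument quality against criteria, not the string heuristic used here for illustration.

```python
def advocate(position, question):
    # Stub advocate: a real one would construct a full argument.
    return f"{position}: argument about {question}"

def judge(question, arguments):
    # Stub judge: picks by a trivial heuristic purely for illustration.
    return max(arguments, key=len)

def debate(question):
    arguments = [advocate("for", question), advocate("against", question)]
    return judge(question, arguments)
```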

“Designing agent control flow is now the highest-leverage skill in AI engineering. The orchestration layer is where most enterprise agent projects succeed or fail. Agents were individually capable but poorly coordinated — that gap is where 57% of failures originate.”

Anthropic — Enterprise Agent Deployment Patterns Analysis, 2025
Tier 3 · Iterative & Feedback Loop
Refinement Patterns
Patterns that run until a condition is met. These six patterns govern how agents improve output over iterations, recover from failure, and hand off to humans when required.
10
Hierarchical Agents
Orchestrators managing orchestrators
L1 Orchestrator → L2 Orchestrators → L3 Workers
Hierarchical Agents extend the orchestrator pattern across multiple levels of management. A top-level orchestrator breaks high-level goals into domains and delegates to mid-level orchestrators, each of which manages their own pool of specialised workers. This mirrors enterprise organisational structures — where a division head delegates to team leads who manage individual contributors. LangGraph’s “Hierarchical + Router” combination is the most common production implementation: each level in the hierarchy uses a router to dispatch work to the appropriate child, reducing misrouting at each level. This pattern becomes necessary when the coordination complexity of a single orchestrator’s task space exceeds what can be managed in one agent’s context.
Best Use Cases
Enterprise automation systems spanning multiple business domains (HR, finance, operations)
Multi-project software engineering agents managing code across different repos and teams
Research platforms coordinating independent research threads across knowledge domains
Trade-off Every additional hierarchy level adds latency and cost. Most systems do not require more than two levels. Introduce hierarchy only when a single orchestrator’s coordination task becomes too complex to handle accurately — not as a default architectural choice.
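A two-level hierarchy in outline: a top-level router dispatches by domain, and each mid-level orchestrator routes within its own worker pool. The domain/worker tables are hypothetical; the point is that routing happens at each level, narrowing the decision space.

```python
# Hypothetical worker pools, one per business domain.
WORKERS = {
    "hr":      {"recruiting": lambda t: f"candidates for {t}"},
    "finance": {"forecast":   lambda t: f"forecast for {t}"},
}

def l2_orchestrator(domain, task):
    # Mid-level router: picks a worker within its own domain only.
    worker = next(iter(WORKERS[domain].values()))
    return worker(task)

def l1_orchestrator(task):
    # Top-level router: dispatches by domain, then delegates downward.
    return {d: l2_orchestrator(d, task) for d in WORKERS}
```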
11
Sequential Pipeline
Each agent hands off to the next
Agent A → Agent B → Agent C → Output
The Sequential Pipeline passes task output from one specialised agent to the next in a defined sequence, with each agent building on what the previous produced. This is the simplest multi-agent pattern: deterministic, auditable, and linear. Databricks documents this as their highest-predictability pattern with lowest latency per agent because there are fewer LLM calls for orchestration decisions — each agent simply processes input and passes output forward. A content pipeline might sequence: researcher → drafter → editor → fact-checker → publisher. Each agent is optimised for its single role with a focused tool set, making the system more reliable and easier to debug than a single generalist agent performing all roles.
Best Use Cases
Content production pipelines with defined stage gates
Data transformation workflows where each stage enriches or transforms the previous output
Any workflow where the sequence of stages is fixed and stages have clear dependencies
Trade-off A failure in any stage blocks the entire pipeline. Error handling at each stage transition is critical — without retry logic or fallback paths, one bad agent output stalls the whole workflow. Combine with the Self-Healing pattern at each stage junction for production resilience.
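The pipeline is the simplest pattern to express in code: a fold over stage functions, with a check at each transition. The stage functions are stubs standing in for specialised agents.

```python
# Stub stages: each would be a focused agent with its own tool set.
def researcher(x): return x + " +research"
def drafter(x):    return x + " +draft"
def editor(x):     return x + " +edit"

def pipeline(task, stages):
    out = task
    for stage in stages:
        out = stage(out)
        if out is None:  # one bad stage blocks everything downstream
            raise RuntimeError(f"stage {stage.__name__} failed")
    return out
```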
12
Evaluator-Optimizer
Generate → score → refine → loop
Generator → Evaluator → Pass? (no ↺ Generator)
The Evaluator-Optimizer separates the agent that generates output (Generator) from the agent that scores it against explicit criteria (Evaluator). The Evaluator uses rubrics, reference outputs, or LLM-as-judge scoring to assess quality and provide structured feedback. The Optimizer (or Generator in its refinement mode) incorporates this feedback in the next generation cycle. The loop runs until the Evaluator scores the output as passing or a maximum iteration count is reached. This is the agentic equivalent of test-driven development — the acceptance criteria are defined before generation, and generation runs until acceptance is achieved. Anthropic’s content engine uses this pattern to catch 73% of quality issues that would otherwise require human intervention.
Best Use Cases
Literary translation where an evaluator can assess nuance that the translator may miss
Complex search tasks requiring multiple retrieval-analysis rounds with a quality gate
Any task where output quality can be scored objectively against defined criteria
Trade-off 85% of quality improvement occurs in the first 2 iterations (Anthropic research). Cap at 3–5 iterations maximum and escalate to human review if the threshold is not met. Without a cap, generators in infinite loops produce lateral changes rather than genuine improvements.
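The generate-score-refine loop, with the iteration cap and the escalation path built in. `generator` and `evaluator` are stubs; a real evaluator would be an LLM-as-judge scoring against a rubric.

```python
def generator(task, feedback=None):
    # Stub generator: improves when it receives evaluator feedback.
    if feedback:
        return {"text": "v2", "quality": 0.9}
    return {"text": "v1", "quality": 0.6}

def evaluator(output, threshold=0.8):
    # Stub LLM-as-judge: returns (passed, structured feedback).
    if output["quality"] >= threshold:
        return True, None
    return False, "raise specificity"

def evaluate_optimize(task, max_iters=3):  # cap, then escalate
    feedback = None
    for _ in range(max_iters):
        output = generator(task, feedback)
        passed, feedback = evaluator(output)
        if passed:
            return output["text"]
    return "ESCALATE_TO_HUMAN"
```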
13
Critic-Actor
Structured feedback · actor refines
Actor → Output → Critic → Feedback → Actor
The Critic-Actor pattern assigns two distinct roles: the Actor generates the primary output (code, content, plan, decision); the Critic provides structured, targeted feedback on specific weaknesses. The Actor then refines its output based on the Critic’s guidance. Unlike the Evaluator-Optimizer, the Critic’s role is not simply to pass/fail but to provide actionable, structured feedback that guides the Actor’s revision. This mirrors a human author-editor relationship: the editor does not just say “this is bad” but identifies specifically why and how it should be improved. The Critic-Actor loop runs until the Actor’s output clears the Critic’s bar or a maximum iteration threshold is reached.
Best Use Cases
Code review cycles where the critic identifies specific bugs, style violations, or security issues
Strategic plan refinement where the critic challenges assumptions and identifies gaps
Any iterative refinement where the quality of feedback determines the quality of refinement
Trade-off Critic quality is the binding constraint. A critic that produces vague or overly general feedback enables only superficial actor improvements. Invest in the critic’s evaluation rubric — it is the intellectual core of this pattern, not the actor’s generation capability.
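The distinguishing feature versus Evaluator-Optimizer is the shape of the feedback: structured findings the actor can act on, not a pass/fail score. Both roles are stubbed here; the finding schema is illustrative.

```python
def actor(task, findings=None):
    # Stub actor: produces a buggy draft, then fixes it per the findings.
    return "def add(a, b): return a + b" if findings else "def add(a, b): return a - b"

def critic(code):
    # Stub critic: returns structured, actionable findings, not pass/fail.
    if "a - b" in code:
        return [{"line": 1, "issue": "wrong operator", "fix": "use + not -"}]
    return []

def critic_actor(task, max_rounds=3):
    findings = None
    for _ in range(max_rounds):
        code = actor(task, findings)
        findings = critic(code)
        if not findings:  # critic's bar cleared
            return code
    return code
```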
14
Self-Healing / Retry Loop
Diagnose failure · correct strategy · retry
Attempt → Failure? → Diagnose → Corrected Retry
The Self-Healing pattern equips an agent with the ability to diagnose its own failures and retry with a corrected strategy rather than simply re-executing the same failed action. When a tool call fails, an API returns an error, or the agent reaches a dead end, the self-healing loop kicks in: the agent analyses the error, identifies what went wrong and why, and formulates a different approach for the retry. This is qualitatively different from naive exponential backoff — it is intelligent failure recovery. The SRE automation case in Azure’s architecture guide is instructive: when a service outage occurs, the system creates and implements a remediation plan dynamically, without knowing the specific steps upfront, diagnosing and adapting until the live-site issue is resolved.
Best Use Cases
Any production agent that calls external tools or APIs that can fail in multiple ways
Infrastructure automation where recovery paths vary based on the specific failure type
Long-running workflows that cannot be restarted from scratch on every failure
Trade-off Without a maximum retry count and a meaningful escalation path, self-healing loops can run indefinitely, consuming tokens and budget. Define: maximum attempts, a circuit-breaker condition, and a clear escalation path (to HITL or to a fallback chain) when the maximum is reached.
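Diagnose-then-retry differs from blind backoff in one line: the error is analysed and the strategy changes before the next attempt. `diagnose` is a stub; a real agent would reason over the error message and tool context.

```python
def diagnose(error):
    # Stub diagnosis: map a failure to a corrected strategy. A real agent
    # would reason over the error text and context here.
    if "timeout" in str(error):
        return {"timeout": 30}
    return {}

def self_heal(action, max_attempts=3):
    strategy = {}
    for _ in range(max_attempts):
        try:
            return action(**strategy)
        except Exception as e:
            strategy = diagnose(e)  # adapt, don't just re-run
    raise RuntimeError("circuit breaker: escalate to HITL")
```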
15
HITL — Human-in-the-Loop
Human checkpoints · approve · correct · redirect
Agent Action → HITL Gate → Approve / Correct → Continue
HITL is not a fallback — it is a deliberate architectural choice. A human steps in at predefined checkpoints to approve, correct, or redirect agent behaviour before execution continues. These gates are typically placed at: high-stakes irreversible actions (delete, publish, transact), quality thresholds below which automated evaluation cannot be trusted, and escalation points when automated loops exceed iteration limits. Microsoft’s Azure framework explicitly distinguishes between mandatory HITL gates (which make the orchestration synchronous at that step and must checkpoint state for resumption) and optional gates (where human input can improve quality but is not required). The Databricks guidance frames this as “OS-level permissions — a sudo prompt: high-stakes agent actions automatically route to humans for approval.”
Best Use Cases
Any irreversible action with material business consequences (financial transactions, bulk deletions)
Regulated workflows where human accountability is legally required
Evaluator-Optimizer loops that have not passed quality thresholds after maximum iterations
Trade-off Mandatory HITL gates make the workflow synchronous at that step — latency depends on human response time. Persist agent state at every HITL checkpoint so the workflow can resume without replaying prior work when the human responds. Never require human re-entry of context the agent already has.
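A gate in code is mostly about ordering: persist state first, then block on the human. The `store` dict stands in for durable checkpoint storage, and `ask_human` for whatever approval channel the system uses; both are hypothetical.

```python
def hitl_gate(action, state, ask_human, store):
    store["checkpoint"] = dict(state)  # persist BEFORE blocking on the human
    return ask_human(action, state) == "approve"

def run_with_gate(actions, ask_human):
    state, store, done = {"step": 0}, {}, []
    for name, irreversible in actions:
        # Mandatory gate only at high-stakes, irreversible steps.
        if irreversible and not hitl_gate(name, state, ask_human, store):
            return done, "halted at " + name
        done.append(name)
        state["step"] += 1
    return done, "complete"
```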
Quick Reference

Pattern Decision Matrix

Match your design constraint to the pattern. Real systems typically combine 2–3 from different tiers.

If you need… | Use Pattern | Tier | Primary Benefit | Watch Out For
Agent that thinks before every tool call | ReAct | T1 | Traceability + adaptive reasoning | Per-call latency overhead
Long task with predictable decomposition | Plan-and-Execute | T1 | 3.6× speedup, cost-split planning/exec | Plan brittleness on step failure
Improve output quality through iteration | Reflection | T1 | +11% coding benchmark improvement | Echo-chamber self-critique
Ground outputs in real-world data or actions | Tool Use | T1 | Factual grounding + real-world action | Tool proliferation → hallucination
Complex task with unpredictable subtasks | Orchestrator-Subagent | T2 | Flexible delegation, specialised workers | Implicit routing (31% worse task completion)
Quality control with specialist routing | Supervisor | T2 | Quality gate before user delivery | Extra inference per cycle
Independent tasks, latency is the constraint | Fan-Out / Fan-In | T2 | Latency = slowest agent, not sum | Cost multiplies with agent count
Large corpus, scale beyond single context | MapReduce | T2 | Horizontal scale over large datasets | Reducer needs homogeneous inputs
High-stakes decisions needing stress-testing | Debate / Adversarial | T2 | Surfaces blind spots, breaks echo-chamber | High token and time cost
Multi-domain system needing nested delegation | Hierarchical Agents | T3 | Scalable complexity management | Each level adds latency
Well-defined workflow with clear stage sequence | Sequential Pipeline | T3 | Highest predictability and auditability | One stage failure blocks whole pipeline
Output quality must meet a defined threshold | Evaluator-Optimizer | T3 | 73% of quality issues caught automatically | Cap at 3 iterations; escalate beyond
Output needs targeted, structured feedback | Critic-Actor | T3 | Specific feedback guides refinement | Critic rubric quality is the bottleneck
Production agents that must recover from failure | Self-Healing / Retry | T3 | Intelligent error recovery, not dumb retry | Circuit breaker + escalation required
Irreversible action or regulatory accountability | HITL | T3 | Human accountability at critical gates | Checkpoint state; never lose context
Engineering Principle

Start Simple. Add Patterns When Failures Demand Them.

The most reliable guidance from every production deployment of AI agents in 2025 and 2026 is the same: start with the simplest pattern that addresses the core problem, then layer additional patterns only when a specific failure mode demands it. Tool Use plus ReAct handles a remarkable proportion of real-world agent tasks. The Evaluator-Optimizer adds quality assurance when output consistency matters. HITL adds human accountability when irreversibility or regulation demands it. Each pattern adds coordination complexity and coordination failure risk alongside whatever problem it solves.

The engineers who over-architect agents — reaching for Hierarchical Orchestrators and Debate/Adversarial loops before they have validated that a single-agent ReAct loop fails — are spending engineering budget and operational complexity on problems they have not confirmed exist. The engineers who under-architect — deploying a plain chatbot where a Supervisor with quality gates was required — are handing users inconsistent outputs without recourse.

The 15 patterns here are a vocabulary, not a checklist. You do not need all 15. You need the 2–3 that match your actual coordination and quality problems. The Anthropic principle that guides all of this is worth internalising as a default: maintain simplicity, prioritise transparency by showing planning steps, and build only what your actual failure modes demand.

Mastering a handful of composable design patterns matters far more than mastering any single framework. Frameworks change. Patterns persist. The pattern is the architecture — the framework is just the scaffolding you hang it on.

Sources: Anthropic — Building Effective Agents (Schluntz & Zhang) · Anthropic — Enterprise Agent Deployment Patterns 2025 · Microsoft Azure Architecture Center — AI Agent Orchestration Patterns · Databricks — Agent System Design Patterns · SitePoint — Agentic Design Patterns: The 2026 Guide · The Thinking Company — AI Agent Orchestration Patterns 2026 · n1n.ai — 5 AI Agent Design Patterns to Master by 2026 · Microsoft Azure Blog — Agent Factory: Common Use Cases and Design Patterns · Spring.io — Building Effective Agents with Spring AI · GuruSup — Best Multi-Agent Frameworks 2026 · Langfuse — Framework Comparison 2026 · Gartner — Agentic AI as Top Strategic Technology Trend 2025