MCP Gateway Architecture: Components, Request Flow, and Reference Patterns (2026)

MCP gateway architecture explained: control plane vs data plane, the 10 core components, the 6-stage request flow, transports, session state, deployment topology, and design choices for production AI deployments in 2026.

Dhayalan Subramanian

Associate Director - Product Growth at DigitalAPI

TL;DR

1. Two-plane architecture: Control plane holds registry, policy, credentials, audit. Data plane runs every live MCP request.

2. Ten core components: Auth handler, identity resolver, policy engine, discovery filter, session router, backend router, API-to-MCP adapter, credential broker, private connector handler, audit emitter.

3. Six-stage request flow: Authenticate, resolve identity, filter discovery, evaluate policy, route with credentials, stream response and emit audit.

4. Four transports supported: stdio, HTTP, SSE, and Streamable HTTP. The gateway bridges between them at the boundary.

5. Stateful sessions: Worker affinity, externalized metadata in PostgreSQL or Valkey, graceful drain, immediate revocation propagation.

6. Vendor differentiation: Rust vs Go, Cedar vs OPA, Helm-first vs operator packaging separate commercial gateways from open-source implementations.

Get a 20-min walkthrough of DigitalAPI's MCP Gateway. Book a Demo!

What does an MCP gateway architecture look like?

An MCP gateway architecture has two planes: a control plane that holds configuration, registry, policy, and audit metadata, and a data plane that handles every live MCP request. AI clients connect once to the data plane endpoint. The gateway authenticates, applies policy, routes to a registered MCP server or an OpenAPI-derived MCP tool, and streams the response back. The control plane never sits in the hot path.

This separation matters because the two planes have different reliability budgets. The data plane must handle every tool call with predictable latency and high availability. The control plane can tolerate brief outages, because policies cache at the data plane for a bounded period during control-plane disruption.
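The bounded caching behaviour can be sketched as below. This is a minimal illustration under stated assumptions, not any vendor's implementation: the `PolicyCache` class, the `max_stale_seconds` parameter, and the choice to fail closed once the window expires are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicySnapshot:
    version: str
    fetched_at: float  # seconds on a monotonic clock


class PolicyCache:
    """Hypothetical data-plane cache holding the last good policy snapshot."""

    def __init__(self, max_stale_seconds: float) -> None:
        self.max_stale_seconds = max_stale_seconds
        self._snapshot: Optional[PolicySnapshot] = None

    def update(self, version: str, now: Optional[float] = None) -> None:
        # Called whenever the control plane successfully delivers a policy.
        ts = time.monotonic() if now is None else now
        self._snapshot = PolicySnapshot(version, ts)

    def current(self, now: Optional[float] = None) -> Optional[PolicySnapshot]:
        # Return the snapshot only while it is inside the staleness window.
        # None means the caller should fail closed (an assumption of this sketch).
        ts = time.monotonic() if now is None else now
        snap = self._snapshot
        if snap is None or ts - snap.fetched_at > self.max_stale_seconds:
            return None
        return snap
```

With a 300-second window, a data plane that last heard from the control plane 250 seconds ago keeps enforcing the cached version, while one that has been cut off for longer than 300 seconds stops serving decisions from it.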

The reference architecture has three layers:

Top layer (control plane): Admin console, MCP server registry, API source registry, catalog, agent registry, policy management, credential binding configuration, audit search and export
Middle layer (data plane): Auth handler, identity resolver, policy engine, discovery filter, session router, MCP backend router, API-to-MCP adapter, credential broker, private connector handler, audit emitter, OpenTelemetry exporter
Bottom layer (backends): Private MCP servers (legal, devops, knowledge), internal REST APIs exposed through the API-to-MCP adapter, and any third-party MCP servers your policy allows

What are the components of an MCP gateway?

An MCP gateway has ten core components that together turn the MCP protocol into operational infrastructure. All ten sit in the data plane, on the request path. The control plane configures them through the registries, policy management, and credential bindings, but it does not process live requests.

1. Authentication handler

Validates client identity at the edge of the data plane. Supports OAuth 2.1 with PKCE, JWT, mTLS, and short-lived API keys. The handler checks issuer, audience, expiry, signature, and revocation status before any other component sees the request. Invalid tokens fail at this gate and never reach policy or routing.
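A rough sketch of the claim checks the handler performs, using the standard JWT claim names (`iss`, `aud`, `exp`, `jti`). Signature verification is assumed to have happened already, and the function name, the reason strings, and the in-memory revocation set are hypothetical.

```python
import time
from typing import Mapping, Optional, Set


def check_claims(
    claims: Mapping[str, object],
    expected_issuer: str,
    expected_audience: str,
    revoked_token_ids: Set[str],
    now: Optional[float] = None,
) -> Optional[str]:
    """Return None if the claims pass, else a short machine-readable reason.

    Assumes the token signature was verified before this call.
    """
    current = time.time() if now is None else now
    if claims.get("iss") != expected_issuer:
        return "invalid_issuer"
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
    if expected_audience not in audiences:
        return "invalid_audience"
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= current:
        return "token_expired"
    if claims.get("jti") in revoked_token_ids:
        return "token_revoked"
    return None
```

Returning a reason code rather than raising keeps the failure machine-readable, which is what the gateway needs when it rejects the connection at this gate.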

2. Agent identity resolver

Builds the actor chain from the validated token. The chain includes tenant, environment, client surface (hosted agent, IDE, SDK), human user or delegator (when present), agent identity, and optional agent instance ID. Every downstream component evaluates against this resolved actor chain, not against the raw token.
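One way to picture the actor chain is as an immutable record built once per request. The field names follow the list above; the claim names read from the token (`tenant`, `env`, `surface`, and so on) are assumptions for illustration, since real identity providers issue different claims.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ActorChain:
    tenant: str
    environment: str
    client_surface: str                      # e.g. "hosted_agent", "ide", "sdk"
    agent_id: str
    delegator: Optional[str] = None          # human user, when present
    agent_instance_id: Optional[str] = None  # optional per-instance ID


def resolve_actor(claims: Mapping[str, str]) -> ActorChain:
    # Claim names here are hypothetical; map whatever the IdP actually issues.
    return ActorChain(
        tenant=claims["tenant"],
        environment=claims["env"],
        client_surface=claims["surface"],
        agent_id=claims["agent_id"],
        delegator=claims.get("delegator"),
        agent_instance_id=claims.get("agent_instance_id"),
    )
```

Freezing the dataclass means downstream components cannot mutate the resolved identity after the resolver has produced it.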

3. Policy engine

Evaluates whether the requested action is allowed for the resolved actor. Production gateways use Cedar (principal, action, resource, context) or Open Policy Agent (OPA). Cedar is the cleaner fit for MCP because its model maps directly to users, agents, sessions, tools, environments, credential modes, and delegation context. Policies are versioned, simulatable, and revocable in-place.
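The evaluation model can be approximated in a few lines. The sketch below is not Cedar and does not use the Cedar library; it only mimics two of Cedar's semantics, default deny and forbid overriding permit, over a principal, action, resource, context request. `Rule`, `evaluate`, and the rule IDs are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional, Tuple

Request = Mapping[str, object]  # keys: principal, action, resource, context


@dataclass
class Rule:
    rule_id: str
    effect: str  # "permit" or "forbid"
    matches: Callable[[Request], bool]


def evaluate(rules: List[Rule], request: Request) -> Tuple[str, Optional[str]]:
    """Return (decision, rule_id). Default is deny; any matching forbid wins."""
    permitted_by: Optional[str] = None
    for rule in rules:
        if not rule.matches(request):
            continue
        if rule.effect == "forbid":
            return ("deny", rule.rule_id)
        if permitted_by is None:
            permitted_by = rule.rule_id
    if permitted_by is not None:
        return ("allow", permitted_by)
    return ("deny", None)
```

Returning the rule ID alongside the decision is what lets the audit trail cite the rule that produced each allow or deny.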

4. Discovery filter

Runs at tools/list time. The filter consults the policy engine for each registered tool and returns only the subset the resolved actor is allowed to call. Unauthorized tools are hidden, not just denied at execution. This prevents agents from reasoning about tools they cannot use and removes reconnaissance surface.
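In code, the filter is a single pass over the registered tools. The `is_allowed` callback stands in for the policy engine, and its signature is an assumption of this sketch.

```python
from typing import Callable, Dict, List

Tool = Dict[str, object]


def filter_tools(
    tools: List[Tool],
    actor: Dict[str, str],
    is_allowed: Callable[[Dict[str, str], str], bool],
) -> List[Tool]:
    # Keep only the tools the policy check allows for this actor.
    # Denied tools are dropped entirely, so the client never learns they exist.
    return [tool for tool in tools if is_allowed(actor, str(tool["name"]))]
```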

5. Stateful session router

Maintains client-session-to-worker affinity. When a client opens a long-lived session, the router records a session ID, externalizes metadata to PostgreSQL or Valkey, and pins subsequent requests in that session to the owning worker or shard. Affinity supports SSE streaming, multi-turn tool calls, and basic reconnect.

6. MCP backend router

Selects the correct registered MCP server for an approved tools/call. Routing keys include tool name, agent identity, tenant, and environment. The router enforces environment isolation (dev, staging, prod) and refuses requests to unregistered backends. SSRF-style attacks fail here.
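A simplified registry lookup shows why unregistered destinations fail. For brevity this sketch keys routes on tenant, environment, and tool name only and leaves agent identity out; `BackendRouter` and `UnregisteredBackend` are hypothetical names.

```python
from typing import Dict, Tuple

RouteKey = Tuple[str, str, str]  # (tenant, environment, tool name)


class UnregisteredBackend(Exception):
    """Raised when no registered backend matches the routing key."""


class BackendRouter:
    def __init__(self) -> None:
        self._routes: Dict[RouteKey, str] = {}

    def register(self, tenant: str, environment: str, tool: str, url: str) -> None:
        self._routes[(tenant, environment, tool)] = url

    def resolve(self, tenant: str, environment: str, tool: str) -> str:
        # Only exact registry matches are routable. The caller never supplies
        # a URL, which is what blocks SSRF-style requests in this sketch.
        try:
            return self._routes[(tenant, environment, tool)]
        except KeyError:
            raise UnregisteredBackend(
                f"{tool} is not registered for {tenant}/{environment}"
            ) from None
```

Because the environment is part of the key, a tool registered for prod is simply not resolvable from a dev session.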

7. API-to-MCP adapter

Converts approved OpenAPI 3.x operations into MCP tools at runtime. The adapter parses the spec, generates an MCP tool definition, validates inputs against the JSON Schema, maps the call to the right HTTP method and path, enforces upstream host allowlists, and applies per-operation timeouts and size limits. This component is what lets enterprises expose internal REST APIs to agents without writing custom MCP servers.
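The core of the mapping, from an OpenAPI operation object to an MCP tool definition (`name`, `description`, `inputSchema`), might look like the following. It is a deliberately small sketch: it handles `parameters` only, ignores request bodies, and assumes every operation carries an `operationId`.

```python
from typing import Any, Dict, List


def operation_to_tool(method: str, path: str, operation: Dict[str, Any]) -> Dict[str, Any]:
    """Map one OpenAPI 3.x operation object to an MCP tool definition.

    Sketch only: request bodies, allowlists, and timeouts are out of scope here.
    """
    properties: Dict[str, Any] = {}
    required: List[str] = []
    for param in operation.get("parameters", []):
        properties[param["name"]] = param.get("schema", {})
        if param.get("required", False):
            required.append(param["name"])
    return {
        "name": operation["operationId"],
        # Fall back to "METHOD /path" when the spec has no summary.
        "description": operation.get("summary", f"{method.upper()} {path}"),
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }
```

The generated `inputSchema` is the same JSON Schema the gateway later uses to validate arguments on `tools/call`.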

8. Credential broker

Resolves credentials for upstream backend calls. Supports service-account credentials, user-delegated OAuth tokens (on-behalf-of), agent-scoped credentials, and workload-identity-mapped credentials. The broker integrates with HashiCorp Vault and cloud secret managers. Secrets never appear in logs, traces, metrics, or audit events.

9. Private connector handler

Routes traffic to private MCP servers or internal APIs without requiring public endpoints. Outbound-only connector mode is the more common production pattern, because customers do not need to open inbound ports. Connectors authenticate to the gateway with mTLS and enforce per-backend network allowlists.

10. Audit emitter + OpenTelemetry exporter

Emits a structured audit event for every tool call (allow, deny, error), every policy decision (rule reference, policy version), every session lifecycle event (create, reconnect, drain, terminate, revoke), and every admin change. The same span graph feeds OpenTelemetry traces and metrics, so operators see latency, error rates, and request volumes per agent, per tool, per tenant.

How does a request flow through an MCP gateway?

A single MCP tool call passes through six stages in the gateway: client initialization, identity resolution, discovery filtering, policy evaluation, backend invocation with credential injection, and SSE response streaming with audit emission. Each stage has a discrete output that the next stage consumes.

Stage 1: Client initialization and authentication

The MCP client opens a connection to the gateway's MCP endpoint. The auth handler validates the bearer token. If the token is invalid, the gateway returns a machine-readable error and closes the connection.

Stage 2: Agent identity resolution

The identity resolver builds the actor chain: tenant, environment, client surface, optional human delegator, agent identity, optional agent instance ID. The resolved chain travels with every downstream call.

Stage 3: Discovery and tools/list with policy filtering

The client calls tools/list. The gateway queries every registered MCP server, deduplicates tool names, and returns one composite list. Before returning, the discovery filter consults the policy engine for each candidate tool and removes any the actor is not allowed to call. The client sees only authorized tools.

Stage 4: Tool invocation and tools/call with policy evaluation

The agent calls tools/call with a tool name and arguments. The gateway validates the arguments against the JSON Schema in the registry, then asks the policy engine to evaluate the call against the actor chain. Denied calls return a machine-readable reason with the policy version that produced the decision.
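A denied call could be returned as a JSON-RPC error object along these lines. The error code is an assumption: JSON-RPC 2.0 reserves -32000 to -32099 for implementation-defined server errors, and -32001 is simply one value picked from that range. The `data` field names are hypothetical as well.

```python
from typing import Any, Dict, Union

POLICY_DENIED = -32001  # assumption: any code in the implementation-defined range works


def deny_response(request_id: Union[int, str], reason: str, policy_version: str) -> Dict[str, Any]:
    # The id echoes the request so the client can correlate the denial.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {
            "code": POLICY_DENIED,
            "message": "Tool call denied by policy",
            "data": {"reason": reason, "policy_version": policy_version},
        },
    }
```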

Stage 5: Backend routing and credential injection

For approved calls, the backend router selects the destination (a registered MCP server or an API-backed tool through the adapter). The credential broker resolves the right credential mode (service account, user-delegated OAuth, agent-scoped, workload-identity-mapped) and injects it into the upstream call. The private connector handler delivers the request to the backend.

Stage 6: Response streaming and audit emission

Many MCP responses stream over SSE, especially for long-running tool calls. The gateway maintains session affinity throughout the stream, applies backpressure on slow clients, and handles keepalive and reconnection. When the call completes, the audit emitter writes a structured event with the full actor chain, policy decision, policy version, credential mode, connector ID, upstream status, latency, and any error class. The event goes to the customer SIEM.
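As an illustration, the audit event described above could be assembled like this. The field names are hypothetical and no particular SIEM schema is implied; the point is that every item in the list maps to one structured field.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_audit_event(
    actor_chain: Dict[str, Any],
    tool: str,
    decision: str,
    policy_version: str,
    credential_mode: str,
    connector_id: str,
    upstream_status: Optional[int],
    latency_ms: float,
    error_class: Optional[str] = None,
) -> str:
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor_chain": actor_chain,
        "tool": tool,
        "decision": decision,                # "allow", "deny", or "error"
        "policy_version": policy_version,
        "credential_mode": credential_mode,  # the mode only, never the secret
        "connector_id": connector_id,
        "upstream_status": upstream_status,
        "latency_ms": latency_ms,
        "error_class": error_class,
    }
    return json.dumps(event, sort_keys=True)
```

Note that the event records the credential mode, not the credential itself, in keeping with the rule that secrets never appear in audit output.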

A successful call passes all six stages in 5 to 30 milliseconds of gateway overhead. The bulk of end-to-end latency is the LLM inference and the backend tool execution, not the gateway.

How does the gateway handle transport and protocol mechanics?

MCP messages are JSON-RPC 2.0 over one of four transports: stdio, HTTP, SSE, and Streamable HTTP. A production gateway bridges between transports at the gateway boundary, so an IDE agent using stdio can still reach a remote HTTP-SSE server.

JSON-RPC 2.0 message structure

Every MCP request is a JSON object with four fields: jsonrpc (always "2.0"), id (a correlation identifier), method (the operation name, such as tools/list or tools/call), and params (the operation arguments). Responses carry jsonrpc and the id from the original request, plus either a result field or an error field, never both. Notifications omit the id and require no response.
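The three message shapes can be told apart mechanically, which is what a gateway does before it decides whether a reply is expected. The `classify` helper below is a hypothetical name for that check.

```python
from typing import Any, Dict


def classify(message: Dict[str, Any]) -> str:
    """Classify a decoded JSON-RPC 2.0 message as request, notification, or response."""
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "method" in message:
        # A method with an id expects a reply; without an id it is one-way.
        return "request" if "id" in message else "notification"
    # A response has an id and exactly one of result or error.
    if "id" in message and (("result" in message) != ("error" in message)):
        return "response"
    raise ValueError("malformed JSON-RPC message")
```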

Four transport options

stdio is the simplest. The client launches the server as a subprocess and exchanges JSON-RPC messages over standard input and output. Useful for local development and CLI workflows. Hard to govern in production because each session is a private process.
HTTP sends one JSON-RPC request per HTTP POST and expects one HTTP response. Stateless. Easy to load-balance. The wrong choice for multi-turn or streaming tools.
SSE (Server-Sent Events) keeps a long-lived connection open and lets the server push multiple JSON-RPC messages back. Required for any tool that streams progress, partial results, or backend events. The gateway has to manage keepalive, reconnection, and backpressure on the client's behalf.
Streamable HTTP combines HTTP and SSE into one transport. Client sends a JSON-RPC request as an HTTP POST. Server can respond with either a single JSON object or a stream of SSE events, chosen at runtime. This is the transport the MCP specification has standardized on: it replaced the older HTTP+SSE transport in the 2025-03-26 revision of the spec.

When to use which transport

Transport	Use When	Avoid When
stdio	Local IDE plugins, CLI agents, development	Multi-tenant production, remote servers
HTTP	Stateless single-shot tools, simple load-balanced fleets	Streaming responses, multi-turn sessions
SSE	Long-running tools, streaming output, backend event notification	Stateless single-shot calls
Streamable HTTP	New deployments in 2026 onward, mixed workloads	Legacy clients that only support plain HTTP

Transport bridging at the gateway

A common production pattern: an IDE agent connects to the gateway over stdio (through a thin local connector), the gateway terminates that connection, then opens an HTTP-SSE connection to a remote MCP server. The gateway becomes the protocol translator. Without that bridge, the IDE agent cannot reach remote tools without bespoke wiring.

How does session state work in an MCP gateway?

MCP is a session-oriented protocol. Long-running tool calls, multi-turn conversations, and streaming responses all rely on the gateway preserving session state across requests. Production gateways pin a session to one worker, externalize the metadata to a shared store, and propagate revocations immediately.

Why sessions matter

A REST API gateway can route each request to any worker, because each request is independent. An MCP gateway cannot. A tools/call invocation might run for several seconds. While it runs, the gateway streams partial results back. If a load balancer routed mid-stream traffic to a different worker, the stream would break.

Sticky session affinity

The gateway assigns a session ID at connection time and pins all traffic in that session to one worker or shard. Affinity can use a consistent hash on the session ID, an explicit affinity header, or a sticky cookie at the load balancer. The choice depends on the deployment topology, but the requirement is the same: traffic for a session lands on the worker that owns it.
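Of the three options, the consistent hash is the easiest to show in code. The ring below is a minimal sketch; `HashRing`, the virtual-node count, and the use of SHA-256 are illustrative choices, not a description of any particular gateway.

```python
import bisect
import hashlib
from typing import List


def _hash(value: str) -> int:
    # First 8 bytes of SHA-256 as an integer position on the ring.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, workers: List[str], vnodes: int = 64) -> None:
        # Each worker appears at several points so load spreads more evenly.
        self._ring = sorted(
            (_hash(f"{worker}#{i}"), worker)
            for worker in workers
            for i in range(vnodes)
        )
        self._keys = [position for position, _ in self._ring]

    def owner(self, session_id: str) -> str:
        # Walk clockwise to the first point at or after the session's hash.
        idx = bisect.bisect_right(self._keys, _hash(session_id)) % len(self._ring)
        return self._ring[idx][1]
```

The property that matters for sessions is that removing one worker only moves the sessions that worker owned; every other session keeps its owner.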

Externalized session metadata

Worker affinity alone is not enough. If the worker crashes, the session disappears. Production gateways write session metadata to a shared store (PostgreSQL for durability, Valkey or Redis for speed) so a different worker can pick up the session if the original worker dies. The metadata includes the client session ID, backend session IDs, policy snapshot at session creation, credential binding reference, and active stream state.
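A sketch of what such a record and a worker takeover could look like. A plain dict stands in for PostgreSQL or Valkey, the field names are hypothetical, and `policy_version` is used here as a simplified stand-in for the full policy snapshot.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SessionRecord:
    client_session_id: str
    owner_worker: str
    backend_session_ids: List[str]
    policy_version: str           # simplified stand-in for the policy snapshot
    credential_binding_ref: str   # a reference only, never the secret itself
    stream_state: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "SessionRecord":
        return cls(**json.loads(raw))


def take_over(store: Dict[str, str], session_id: str, new_worker: str) -> SessionRecord:
    # A surviving worker loads the record, claims ownership, and writes it back.
    record = SessionRecord.from_json(store[session_id])
    record.owner_worker = new_worker
    store[session_id] = record.to_json()
    return record
```

A real shared store would need an atomic compare-and-set on the owner field so that two workers cannot claim the same session at once; the dict version skips that.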

Graceful drain

When a worker upgrades or restarts, it stops accepting new sessions and lets existing sessions finish within a policy-defined limit (typically 60 to 300 seconds). Clients with long-running tools complete their work. New traffic routes to other workers. This avoids the truncated streams that plague naive deployments.
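The drain logic reduces to a small state machine: refuse new sessions, wait for active ones, and stop waiting once the limit expires. `Worker` and its method names are hypothetical, and timestamps are passed in explicitly to keep the sketch deterministic.

```python
from typing import Optional, Set


class Worker:
    def __init__(self, drain_limit_seconds: float) -> None:
        self.drain_limit_seconds = drain_limit_seconds
        self.active_sessions: Set[str] = set()
        self._drain_started_at: Optional[float] = None

    def accept(self, session_id: str) -> bool:
        # A draining worker refuses new sessions so they route elsewhere.
        if self._drain_started_at is not None:
            return False
        self.active_sessions.add(session_id)
        return True

    def finish(self, session_id: str) -> None:
        self.active_sessions.discard(session_id)

    def start_drain(self, now: float) -> None:
        self._drain_started_at = now

    def can_shut_down(self, now: float) -> bool:
        if self._drain_started_at is None:
            return False
        if not self.active_sessions:
            return True  # every session finished cleanly
        # Otherwise wait until the policy-defined limit has elapsed.
        return now - self._drain_started_at >= self.drain_limit_seconds
```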

Basic reconnect

If a client loses its TCP connection mid-session, it can reconnect with the original session ID. The session router looks up the externalized metadata, finds the owning worker, and routes the reconnect there. Whether the gateway can resume an in-flight stream depends on the backend's resumability guarantees, but session context (auth, identity, policy snapshot) is always restored.

Immediate revocation propagation

Sessions complicate revocation. If an administrator disables an agent, the gateway must terminate every active session that agent owns, including in-flight tool calls. Production gateways propagate revocations through the session store and the data plane within seconds. Anything slower creates a window where revoked agents continue to act.
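A sketch of the bookkeeping involved: the gateway needs an index from agent to live sessions so that one revocation can terminate all of them and block new ones. The in-memory `SessionRegistry` below is hypothetical and stands in for the shared session store.

```python
from collections import defaultdict
from typing import Dict, List, Set


class SessionRegistry:
    def __init__(self) -> None:
        self._by_agent: Dict[str, Set[str]] = defaultdict(set)
        self._revoked_agents: Set[str] = set()

    def open(self, agent_id: str, session_id: str) -> bool:
        # Revoked agents cannot open new sessions.
        if agent_id in self._revoked_agents:
            return False
        self._by_agent[agent_id].add(session_id)
        return True

    def revoke_agent(self, agent_id: str) -> List[str]:
        # Mark the agent revoked and return every session that must be
        # terminated, including any with in-flight tool calls.
        self._revoked_agents.add(agent_id)
        return sorted(self._by_agent.pop(agent_id, set()))

    def is_active(self, agent_id: str, session_id: str) -> bool:
        return session_id in self._by_agent.get(agent_id, set())
```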

What deployment topologies does the architecture support?

Two production deployment modes dominate in 2026: hybrid, where a managed control plane drives a customer-side data plane, and fully self-hosted, where the entire product runs inside customer infrastructure. Many enterprise buyers require fully self-hosted because they will not route private tool traffic through a vendor SaaS.

Hybrid deployment

The managed control plane provides admin console, registry metadata, policy management, agent registry metadata, and configuration distribution. The customer-side data plane handles MCP request routing, policy enforcement at runtime, credential resolution, private connector handling, session routing, and audit event generation.

Hybrid trust boundary:

Private payloads do not need to leave the customer environment: the data plane handles every tool call locally
Customers can disable hosted payload visibility: the control plane sees only redacted metadata, if anything
Runtime audit goes directly to the customer SIEM: no audit hop through the vendor
Cached policy keeps enforcing during control-plane outages: the data plane operates on its last good policy snapshot for a configurable window

Fully self-hosted deployment

The entire product, control plane and data plane, runs inside customer-controlled Kubernetes. No required outbound vendor dependency for normal operation. Customer-managed PostgreSQL and Valkey for state. Customer-managed secret manager for credentials. Customer-managed identity provider for authentication. Customer-owned audit export.

Self-hosted suits these requirements:

Compliance posture forbids vendor SaaS routing: common in regulated industries
Data residency requires keeping all metadata in-country: addressed by the customer choosing the Kubernetes cluster
Security teams require attestations on every running component: signed images and SBOMs make this auditable

Single-region HA versus multi-region

Most V1 deployments run a single-region high-availability cluster with multiple data plane replicas behind a load balancer. Workers share session metadata through the external store, so any worker can pick up reconnects. Multi-region active-active is harder because cross-region session state is expensive to keep coherent. Most gateways defer multi-region active-active to V2.

Topology comparison

Topology	Control Plane	Data Plane	Trust Boundary	Operations Burden
Vendor SaaS	Vendor	Vendor	Vendor sees everything	Lowest
Hybrid	Vendor	Customer	Private payloads stay customer-side	Medium
Fully self-hosted	Customer	Customer	Nothing leaves the customer environment	Highest
Hybrid plus self-hosted fallback	Both available	Customer	Customer-side regardless	Medium

What design choices distinguish good MCP gateway implementations?

Five design choices separate strong MCP gateway implementations from weak ones: data plane language, policy engine, data store, packaging, and observability stack. Most enterprise buyers care about the first three; platform engineers care about all five.

Language: Rust versus Go for the data plane

The data plane handles stateful streaming, revocation, backpressure, and session routing under load. Rust offers memory safety, explicit lifetimes, and compiler-enforced concurrency correctness, which catches a class of bugs that bite stateful gateways. Go is the pragmatic alternative with a richer ecosystem for Kubernetes controllers and platform tooling.

The split that has worked in production: Rust for the data plane and the policy hot path, Go for operators and platform glue.

Policy engine: Cedar versus OPA

Cedar's principal, action, resource, context model maps almost one-to-one onto MCP's users, agents, sessions, tools, environments, credential modes, and delegation context. Policies are easy to read, deterministic, and analyzable. OPA is more flexible but harder to reason about, and its Rego language has a steeper learning curve.

For an MCP gateway, Cedar wins on clarity and auditability. OPA wins on flexibility for organizations that already standardize on it elsewhere.

Data store: PostgreSQL plus Valkey versus ClickHouse plus NATS

PostgreSQL holds durable control-plane truth (registry, policy versions, agent registry, audit). Valkey or a Redis-compatible cache holds hot session state and route cache. This pair is simple to operate, easy to back up, and present in most enterprise environments.

ClickHouse and NATS shine for analytics-heavy deployments with billions of audit events and high-throughput event distribution. They are powerful and operationally heavier. Most production V1 deployments do not need them. Default to PostgreSQL plus Valkey and add ClickHouse or NATS later if scale demands it.

Packaging: Helm-first versus operator-first

Helm-first packaging is the fastest way to ship a self-hostable product. A customer runs one Helm command, gets the gateway, and upgrades through Helm chart versions. The trade-off: Helm cannot model complex declarative reconciliation across many CRDs.

A Kubernetes operator gives declarative drift correction, CRD-based configuration, and a richer platform-engineering story. The trade-off: operators take longer to build and stabilize.

Sensible path: Helm-first in V1, operator as a V1.5 or V2 addition once the install surface stabilizes.

Observability: OpenTelemetry as the default

OpenTelemetry covers traces, metrics, and logs in one vendor-neutral instrumentation library. Customers point the exporter at their existing observability platform (Datadog, Grafana, Honeycomb, Dynatrace) without code changes in the gateway.

Gateways that hardcode a specific observability vendor create operational lock-in. Gateways that emit OpenTelemetry by default avoid it.

Decision matrix

Concern	Default V1 Choice	Why
Data plane language	Rust	Memory safety, concurrency correctness, low-latency hot path
Control plane language	Rust	Shared types with the data plane, explicit state machines
Platform tooling and operators	Go	Best ecosystem fit for Kubernetes controllers
Policy	Cedar	Maps to MCP actor model, deterministic, analyzable
Identity	OIDC plus SAML	Enterprise SSO coverage
Workload identity	SPIFFE / SPIRE-ready	Connector and mTLS trust
Data	PostgreSQL plus Valkey	Operationally simple, ubiquitous
Analytics store	Customer SIEM in V1	Avoid forcing a heavy stack on day one
Eventing	PostgreSQL outbox in V1	Scale to NATS later if needed
Packaging	Helm plus Cosign plus SBOM	Enterprise install + signed artifacts
Observability	OpenTelemetry	Vendor-neutral export

For a deeper look at how these choices feed governance and audit posture, see MCP governance and MCP gateway security (publishing soon).

How does MCP gateway architecture compare to API gateway and AI gateway architecture?

API gateway architecture is stateless and REST-routing-centric. AI gateway architecture sits at the LLM layer and manages tokens and providers. MCP gateway architecture is stateful, JSON-RPC-parsing, and tool-level-policy-aware. The three sit at different layers of a production AI stack.

1. API gateway architecture (the reference for two decades)

The classic API gateway is a stateless reverse proxy in front of REST microservices. Auth runs at the edge. Rate limits run per IP or per API key. Routing is path-based or host-based. Sessions, when needed, are typically punted to the backend. The architecture optimizes for throughput and edge security, not session affinity or protocol parsing.

2. AI gateway architecture

An AI gateway sits between agents and LLM providers (OpenAI, Anthropic, Google, Bedrock). Routing optimizes for provider fallback, cost control, and prompt logging. Session state, when present, is short-lived. Token budgets and content moderation are first-class. The architecture is closer to an API gateway than to an MCP gateway, with LLM-aware extensions.

3. MCP gateway architecture

The MCP gateway parses JSON-RPC at the gateway, maintains session affinity across multiple requests, streams SSE responses, aggregates tools/list across servers, enforces tool-level RBAC, and emits per-tool audit. It is a session-oriented infrastructure component, not a stateless proxy.

Side-by-side architectural comparison

Architectural Property	API Gateway	AI Gateway	MCP Gateway
Statefulness	Stateless	Mostly stateless	Stateful per session
Protocol parsing	HTTP / TLS	HTTP plus prompt aware	JSON-RPC + SSE + Streamable HTTP
Session affinity	Optional	Rare	Required
Auth	OAuth, JWT, mTLS, API keys	OAuth, JWT, provider tokens	OAuth 2.1, JWT, mTLS, agent identity
Policy granularity	Per route	Per provider	Per tool, per parameter
Audit detail	Request log	Prompt log	Full actor chain plus policy decision
Discovery	None	Provider catalog	tools/list aggregation + policy filter
Streaming	Optional	Important	Required
Typical language	Go, C, OpenResty	Python, Node	Rust, Go
Data store	Cache plus log	Cache plus log	PostgreSQL + Valkey + session store

The three gateway types stack. A mature AI infrastructure runs an API gateway at the edge for REST traffic, an AI gateway in front of model providers for LLM traffic, and an MCP gateway in front of tool servers for agent traffic. Cloud providers are starting to merge these (AWS API Gateway added MCP proxy in December 2025, Azure API Management exposes REST APIs as MCP servers), but the underlying architectures remain distinct.

For a focused side-by-side on the comparison angle, see MCP vs API gateway (publishing soon).

Introducing DigitalAPI's Enterprise MCP Gateway architecture

DigitalAPI's MCP Gateway is built on a Rust-first data plane, Cedar policy, PostgreSQL plus Valkey for state, Helm-first packaging, and OpenTelemetry observability, deployable in hybrid or fully self-hosted mode. The architecture follows the reference patterns above, with deliberate choices that prioritize memory safety, deterministic policy, and customer-controlled deployment.

Control plane components

Admin console: Onboarding, policy editing and simulation, audit search, active session inspection, emergency disable across servers, tools, agents, credentials, connectors, and client surfaces
MCP server registry: Approved server metadata with owner, environment, endpoint, connector, version, tool definitions, risk labels, credential mode, and approval status
API source registry: Approved OpenAPI sources with parsed operations, allowed mappings, auth bindings, risk labels, and drift detection
Agent registry: Non-human agent identities with owner, purpose, allowed surfaces, delegation mode, and revocation status
Policy management: Cedar policy editor, versioning, simulation, and publishing
Credential binding configuration: Mapping of agents and tools to credential modes (service account, user-delegated OAuth, agent-scoped, workload-identity)
Catalog-lite: Searchable developer view over approved capabilities with configuration snippets and request-access flows
Audit search and export: SIEM-grade export, redacted-by-default payload logging, full actor chain on every event

Data plane components

Authentication handler: OAuth 2.1 with PKCE, OIDC, SAML for admin SSO, mTLS for workload identity
Agent identity resolver: Builds tenant, environment, client surface, optional human delegator, agent identity, optional agent instance ID
Cedar policy engine: Evaluates principal, action, resource, context with environment-aware rules, explicit deny, and policy versioning
Discovery filter: tools/list policy filtering so unauthorized tools never appear to the agent
Stateful session router: Worker affinity, externalized session metadata in PostgreSQL plus Valkey, graceful drain, reconnect, immediate revocation propagation
MCP backend router: Routes to registered MCP servers with environment-aware isolation and SSRF protection
API-to-MCP adapter: Runtime adapter that turns approved OpenAPI 3.x operations into governed MCP tools with schema validation, host allowlists, timeouts, and size limits
Credential broker: Vault-compatible credential resolution with no secret leakage to logs, traces, audit, or metrics
Private connector handler: Direct private endpoint routing or outbound-only connector mode with mTLS and per-backend allowlists
Audit emitter and OpenTelemetry exporter: Structured events to customer SIEM, OTel traces and metrics to customer observability platform

Architecture decisions

Rust-first data and control plane core: Memory safety and concurrency correctness for stateful hot paths, with shared policy and session types across both planes
Cedar for policy: Principal, action, resource, context maps cleanly to the MCP actor model with deterministic, simulatable rules
PostgreSQL plus Valkey as the V1 stack: No required ClickHouse, NATS, or Kafka dependency in the default install
Helm-first packaging with Cosign signing and SBOMs: Customer can verify deployable artifacts before installation
OpenTelemetry by default: Vendor-neutral export to whatever observability platform the customer runs

See DigitalAPI's reference architecture running inside your infrastructure (hybrid or fully self-hosted) → Book a demo

FAQs

1. What does an MCP gateway architecture look like?

An MCP gateway architecture has two planes. The control plane holds configuration, registry, policy, credential bindings, and audit. The data plane runs the auth handler, identity resolver, policy engine, discovery filter, session router, backend router, API-to-MCP adapter, credential broker, connector handler, and audit emitter. AI clients connect once to the data plane.

2. What components does an MCP gateway have?

Ten core components: authentication handler, agent identity resolver, policy engine, discovery filter, stateful session router, MCP backend router, API-to-MCP adapter, credential broker, private connector handler, and audit emitter with OpenTelemetry exporter.

3. How does a request flow through an MCP gateway?

Six stages: client initialization and authentication, agent identity resolution, discovery filtering on tools/list, policy evaluation on tools/call, backend routing with credential injection, and SSE response streaming with audit emission. A well-engineered gateway adds 5 to 30 ms of overhead at p99.

4. What is the difference between the control plane and the data plane in an MCP gateway?

The control plane holds configuration, registry, policy, and audit metadata. It runs when administrators configure the system. The data plane handles every live MCP request. It runs on the hot path with predictable latency.

5. How does session affinity work in an MCP gateway?

The gateway assigns a session ID at connection time and pins all session traffic to one worker using consistent hashing, an affinity header, or a sticky load-balancer cookie. Session metadata externalizes to PostgreSQL or Valkey so a different worker can take over if the owning worker fails.

6. What transports do MCP gateways support?

Four: stdio for local subprocess workflows, plain HTTP for stateless single-shot calls, SSE for streaming responses and multi-turn sessions, and Streamable HTTP, which combines HTTP and SSE into one transport and is the direction the spec is moving in 2026. Production gateways bridge between transports at the gateway boundary.

7. Why is JSON-RPC used in MCP instead of REST?

JSON-RPC's request and response correlation through the id field works cleanly with bidirectional streaming over SSE. REST's resource-oriented model does not. JSON-RPC also supports notifications (one-way messages with no response), which MCP uses for progress updates and asynchronous events.

8. Can an MCP gateway run fully self-hosted?

Yes. Production gateways like Microsoft MCP Gateway (open source), Docker MCP Gateway, Tyk, Kong, Gravitee, and DigitalAPI run inside customer Kubernetes with no required outbound dependency. PostgreSQL plus Valkey is the typical V1 default stack.

9. How does an MCP gateway handle policy revocation mid-session?

The session router propagates revocations through the externalized session store within seconds. Affected sessions terminate immediately, including in-flight tool calls. Anything slower creates a window where revoked agents continue to act.

10. What language are MCP gateways usually written in?

Production MCP gateways favor Rust for the data plane and policy hot path because memory safety and explicit concurrency catch a class of bugs that bite stateful streaming services. Go is the pragmatic alternative for Kubernetes operators and platform tooling. Choice depends on the team's existing operational expertise.

About the author

Dhayalan Subramanian

Dhayalan Subramanian is Associate Director, Product Growth at DigitalAPI, where he leads go-to-market and product growth for the company’s multi-gateway API management platform. His work focuses on helping large enterprises and mid-market cloud companies consolidate APIs across AWS, Azure, Apigee, Kong, MuleSoft, and other gateways into a single control plane for governance, discovery, monetization, and agent consumption.

Dhayalan brings 14+ years of experience across product strategy, enterprise architecture, and engineering leadership. Earlier in his career, he held senior roles at Encora (as Associate Architect and Technical Manager), Mindtree (Technology Lead), Tech Mahindra (Technical Lead), and Primus Analytics, where he designed integration frameworks and delivered enterprise-grade digital platforms for global customers.

At DigitalAPI, he works directly with platform, integration, and developer experience leaders at Fortune 500 organizations to operationalize unified API catalogs, developer portals, and MCP-ready APIs. He writes regularly on API developer experience, API governance, and AI agent architectures.
