jeff.leung
Case study 03 · Updated 2026-05-27

MCP Router

A reliable tool-routing and plugin/runtime platform for the Model Context Protocol — typed surfaces, lifecycle hygiene, plugin isolation, and observable failures.

Backend / Platform Infrastructure · AI-assisted engineering · Security
Problem

Coding agents talk to tools through MCP servers. As the number of MCP servers grows, so does the operational surface around them: authentication, lifecycle management, version drift, partial failures, and noisy logs. Without a router, every host has to solve each of these problems on its own.

Constraints
  • Must remain protocol-faithful so any conforming MCP client works without bespoke adapters.
  • Must isolate plugin/runtime failures so one broken server does not poison the rest.
  • Must surface failures as structured events, not as silent timeouts.
  • Must support local-first operation; cloud is optional, not required.
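
A minimal sketch of the "structured events, not silent timeouts" constraint, assuming an asyncio-based router written in Python. `RoutingEvent`, `call_with_deadline`, and the event fields are hypothetical names chosen for illustration; the case study does not describe its event schema. Each plugin call runs under a deadline, and a hang or an exception is recorded as an event instead of stalling the request.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class RoutingEvent:
    # Hypothetical event shape: which tool, what happened, and how long it took.
    tool: str
    outcome: str  # "ok", "timeout", or "error"
    detail: str = ""
    elapsed_ms: float = 0.0


async def call_with_deadline(
    tool: str,
    handler: Callable[[], Awaitable[Any]],
    timeout_s: float,
    events: list,
) -> Optional[Any]:
    """Run one plugin call under a deadline; record a structured event for every outcome."""
    start = time.monotonic()

    def elapsed() -> float:
        return (time.monotonic() - start) * 1000

    try:
        result = await asyncio.wait_for(handler(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # A hang becomes an explicit "timeout" event instead of a silent stall.
        events.append(RoutingEvent(tool, "timeout", f"no reply within {timeout_s}s", elapsed()))
        return None
    except Exception as exc:
        # A plugin bug becomes an "error" event instead of propagating into the router.
        events.append(RoutingEvent(tool, "error", repr(exc), elapsed()))
        return None
    events.append(RoutingEvent(tool, "ok", "", elapsed()))
    return result


# Usage: one healthy, one hanging, and one crashing plugin (all hypothetical).
async def healthy() -> dict:
    return {"content": "ok"}


async def hangs() -> None:
    await asyncio.sleep(10)


async def crashes() -> None:
    raise RuntimeError("plugin bug")


async def main() -> tuple:
    events: list = []
    # gather runs the calls concurrently, so the hanging plugin does not delay the others.
    results = await asyncio.gather(
        call_with_deadline("search", healthy, 0.5, events),
        call_with_deadline("slow_tool", hangs, 0.1, events),
        call_with_deadline("broken_tool", crashes, 0.5, events),
    )
    return results, events


results, events = asyncio.run(main())
```

This only illustrates the deadline-and-event half of isolation inside a single process. asyncio cannot interrupt a plugin that blocks the event loop with synchronous code, which is one reason a real runtime boundary is still needed. Returning `None` on failure also conflates failures with tools that legitimately return nothing; a fuller version would return the event alongside the result.
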

Architecture

The MCP router exists because tool surfaces multiply faster than humans can keep track of them. The router turns the messy, growing set of MCP servers into a single, typed, observable surface a coding-agent host can rely on.

Three properties that matter most

  1. Typed surfaces. Every tool call is validated against a schema before it is forwarded. Schema-incompatible calls are rejected with a structured error.
  2. Plugin isolation. Plugins run in their own runtime boundary. A plugin crash never crashes the router. A plugin hang never blocks other plugins.
  3. Structured telemetry. Every routing decision is an event. Operators read events; they do not have to infer what happened from logs.
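
Typed surfaces and structured rejections are easiest to see in code. MCP tool definitions declare their arguments as JSON Schema (the tool's `inputSchema`), so a router can check arguments before forwarding a call. The sketch below is illustrative only: it hand-checks a small subset of JSON Schema (`type`, `properties`, `required`, `additionalProperties`) where a production router would use a complete validator, and `validate_args`, `route_call`, `SEARCH_SCHEMA`, and the error codes are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, List

# JSON Schema type names mapped to Python types (a deliberately small subset).
_JSON_TYPES: Dict[str, Any] = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


@dataclass
class SchemaError:
    code: str  # hypothetical codes: missing_required, unexpected_property, type_mismatch
    path: str
    message: str


def validate_args(schema: Dict[str, Any], args: Dict[str, Any]) -> List[SchemaError]:
    """Check tool-call arguments against a JSON Schema subset; return every violation found."""
    errors: List[SchemaError] = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(SchemaError("missing_required", name, f"'{name}' is required"))
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            if schema.get("additionalProperties", True) is False:
                errors.append(SchemaError("unexpected_property", name, f"'{name}' is not declared"))
            continue
        declared = spec.get("type")
        expected = _JSON_TYPES.get(declared)
        if expected is None:
            continue  # types outside the subset are not checked in this sketch
        # bool is a subclass of int in Python, so it must not pass as integer or number.
        wrong_type = not isinstance(value, expected) or (
            isinstance(value, bool) and declared != "boolean"
        )
        if wrong_type:
            errors.append(SchemaError("type_mismatch", name,
                                      f"expected {declared}, got {type(value).__name__}"))
    return errors


def route_call(schema: Dict[str, Any], args: Dict[str, Any],
               forward: Callable[[Dict[str, Any]], Any]) -> Dict[str, Any]:
    """Forward only schema-valid calls; reject everything else with a structured error."""
    errors = validate_args(schema, args)
    if errors:
        # The upstream MCP server never sees a call that failed validation.
        return {"ok": False, "errors": [asdict(e) for e in errors]}
    return {"ok": True, "result": forward(args)}


SEARCH_SCHEMA = {  # hypothetical tool schema
    "type": "object",
    "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
    "required": ["query"],
    "additionalProperties": False,
}

rejected = route_call(SEARCH_SCHEMA, {"limit": "10", "verbose": True}, lambda a: "unreachable")
accepted = route_call(SEARCH_SCHEMA, {"query": "mcp", "limit": 5}, lambda a: f"searched {a['query']}")
```

One design choice in this sketch: `validate_args` collects every violation rather than stopping at the first, so an agent can correct its call in a single round trip.
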

What the router refuses to do

The router does not silently re-route calls when a tool is unavailable. It does not auto-discover tools from the network. It does not paper over schema breaks. These refusals are deliberate. Each one preserves an invariant a downstream agent can rely on.
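
The refusals translate naturally into a lookup that is allowed to say no. In this hypothetical sketch, the router knows only tools that were registered explicitly (no network discovery), each host sees only the tools on its own allow-list, and an unavailable tool produces a named refusal rather than a fallback to some other server. `Router`, `Refusal`, and the tool, server, and host names are illustrative assumptions, not the real system's API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Refusal(Enum):
    NOT_ALLOWED = "not_allowed"            # the host never opted in to this tool
    UNKNOWN_TOOL = "unknown_tool"          # not registered; the router does not go looking for it
    TOOL_UNAVAILABLE = "tool_unavailable"  # registered, but its server is down; no silent re-route


@dataclass
class Router:
    registered: Dict[str, str] = field(default_factory=dict)  # tool name -> server id
    available: Set[str] = field(default_factory=set)          # server ids currently healthy
    allow: Dict[str, Set[str]] = field(default_factory=dict)  # host id -> opted-in tool names

    def resolve(self, host: str, tool: str) -> Tuple[Optional[str], Optional[Refusal]]:
        """Return (server, None) for a routable call, or (None, reason) for a refusal."""
        # Allow-list first, so a host cannot probe for tools outside its own surface.
        if tool not in self.allow.get(host, set()):
            return None, Refusal.NOT_ALLOWED
        server = self.registered.get(tool)
        if server is None:
            return None, Refusal.UNKNOWN_TOOL
        if server not in self.available:
            # Deliberately no fallback to another server offering a similar tool.
            return None, Refusal.TOOL_UNAVAILABLE
        return server, None


router = Router(
    registered={"search": "srv-a", "fs.read": "srv-b"},
    available={"srv-a"},
    allow={"agent-1": {"search", "fs.read", "web.fetch"}},
)
ok = router.resolve("agent-1", "search")          # ("srv-a", None)
down = router.resolve("agent-1", "fs.read")       # (None, Refusal.TOOL_UNAVAILABLE)
unknown = router.resolve("agent-1", "web.fetch")  # (None, Refusal.UNKNOWN_TOOL)
denied = router.resolve("agent-2", "search")      # (None, Refusal.NOT_ALLOWED)
```

Because each refusal has its own name, a downstream agent can distinguish "you may not use this" from "this is temporarily down" without parsing error strings.
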

Reliability mechanisms
  • Typed tool surfaces — every routed tool has a schema, and the router refuses calls that do not match.
  • Plugin isolation — broken plugins fail closed instead of bringing down the router.
  • Lifecycle hygiene — startup, shutdown, and restart are explicit operations with observable transitions.
  • Structured telemetry — every routing decision emits a structured event that an operator can audit later.
  • Capability allow-list — tools are opt-in per host, not opt-out, so the default surface is small.
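
Lifecycle hygiene can be sketched as an explicit state machine in which every transition is checked against a table and recorded as an event. The states, the transition table, and the names `PluginLifecycle` and `IllegalTransition` are assumptions for illustration; the case study only states that startup, shutdown, and restart are explicit operations with observable transitions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class State(Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    RUNNING = "running"
    STOPPING = "stopping"
    FAILED = "failed"


# Hypothetical transition table: anything not listed here is rejected.
ALLOWED: Dict[State, Set[State]] = {
    State.STOPPED: {State.STARTING},
    State.STARTING: {State.RUNNING, State.FAILED},
    State.RUNNING: {State.STOPPING, State.FAILED},
    State.STOPPING: {State.STOPPED, State.FAILED},
    State.FAILED: {State.STARTING, State.STOPPED},
}


class IllegalTransition(Exception):
    pass


@dataclass
class PluginLifecycle:
    name: str
    state: State = State.STOPPED
    events: List[dict] = field(default_factory=list)

    def transition(self, new: State, reason: str = "") -> None:
        """Move to `new` if the table allows it, recording the change as an event."""
        if new not in ALLOWED[self.state]:
            raise IllegalTransition(f"{self.name}: {self.state.value} -> {new.value}")
        self.events.append(
            {"plugin": self.name, "from": self.state.value, "to": new.value, "reason": reason}
        )
        self.state = new

    def restart(self, reason: str) -> None:
        """Restart as explicit stop-then-start steps, never as a silent in-place reset."""
        if self.state is State.RUNNING:
            self.transition(State.STOPPING, reason)
            self.transition(State.STOPPED, reason)
        # Valid only from STOPPED or FAILED; raises IllegalTransition otherwise.
        self.transition(State.STARTING, reason)


plugin = PluginLifecycle("search")  # hypothetical plugin name
plugin.transition(State.STARTING, "host boot")
plugin.transition(State.RUNNING, "ready")
plugin.restart("config changed")
plugin.transition(State.RUNNING, "ready")

try:
    plugin.transition(State.STOPPED)  # skipping STOPPING is not allowed
    skipped_stopping = True
except IllegalTransition:
    skipped_stopping = False
```

With this shape, a deliberate restart shows up as distinct stop and start events, so an operator can tell it apart from a crash (`running -> failed`) when auditing the event stream.
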

Tradeoffs
  • Typed surfaces force schema discipline up-front; ad-hoc tools are slower to ship.
  • Plugin isolation costs a small amount of latency per call; the benefit is independence from a single tool's bugs.
  • Local-first operation means operators trade managed convenience for control over the trust boundary.

What this proves

Demonstrates platform-level engineering judgment: how to take a fast-moving protocol and wrap it with the routing, isolation, and observability properties that production tool use depends on.

Privacy notes

This case study describes patterns common to MCP infrastructure. No proprietary registry, plugin runtime, or backend is named. Schemas referenced are public MCP types.