MCP Router
A reliable tool-routing and plugin/runtime platform for the Model Context Protocol — typed surfaces, lifecycle hygiene, plugin isolation, and observable failures.
Coding agents talk to tools through MCP servers. As the number of MCP servers grows, the operational surface grows with it: auth, lifecycle, version drift, partial failures, and noisy logs. Without a router, every host has to solve these problems on its own.
- Must remain protocol-faithful so any conforming MCP client works without bespoke adapters.
- Must isolate plugin/runtime failures so one broken server does not poison the rest.
- Must surface failures as structured events, not as silent timeouts.
- Must support local-first operation; cloud is optional, not required.
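The isolation and structured-failure constraints above can be illustrated with a minimal sketch. It assumes plugins are async callables and uses asyncio tasks as a stand-in for a real runtime boundary (a separate process or sandbox would be stronger, since a blocking call inside a coroutine would still stall the event loop). The names `call_isolated`, `healthy`, `crashing`, and `hanging`, and the event field names, are hypothetical and not taken from any MCP SDK.

```python
import asyncio


async def call_isolated(plugin: str, handler, args: dict, timeout_s: float) -> dict:
    """Run one plugin call and turn crashes and hangs into structured events."""
    try:
        result = await asyncio.wait_for(handler(args), timeout=timeout_s)
        return {"plugin": plugin, "status": "ok", "result": result}
    except asyncio.TimeoutError:
        # A hang becomes an explicit event instead of a silent timeout.
        return {"plugin": plugin, "status": "error", "kind": "timeout",
                "timeout_s": timeout_s}
    except Exception as exc:
        # A crash is contained here and reported; it does not propagate.
        return {"plugin": plugin, "status": "error", "kind": "crash",
                "detail": repr(exc)}


# Hypothetical plugins used only for this illustration.
async def healthy(args: dict) -> int:
    return args["x"] * 2


async def crashing(args: dict) -> int:
    raise RuntimeError("boom")


async def hanging(args: dict) -> int:
    await asyncio.sleep(3600)
    return 0


async def demo() -> list:
    # All three calls run concurrently; the hanging one cannot block the others.
    return await asyncio.gather(
        call_isolated("healthy", healthy, {"x": 21}, 0.05),
        call_isolated("crashing", crashing, {}, 0.05),
        call_isolated("hanging", hanging, {}, 0.05),
    )


events = asyncio.run(demo())
for event in events:
    print(event)
```

Each call resolves to a dictionary with a `status` field, so a host can distinguish a healthy result, a crash, and a timeout without parsing logs.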
The MCP router exists because tool surfaces multiply faster than humans can keep track of them. The router turns the messy, growing set of MCP servers into a single, typed, observable surface that a coding-agent host can rely on.
Three properties that matter most
- Typed surfaces. Every tool call is validated against a schema before it is forwarded. Schema-incompatible calls are rejected with a structured error.
- Plugin isolation. Plugins run in their own runtime boundary. A plugin crash never crashes the router. A plugin hang never blocks other plugins.
- Structured telemetry. Every routing decision is an event. Operators read events; they do not infer from logs.
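As a rough illustration of how typed surfaces and structured telemetry fit together, the sketch below validates a call before forwarding it and records every decision as an event. The schema format (argument name mapped to a Python type) is a simplifying assumption; a real router would more plausibly validate against each tool's JSON Schema. `RoutingEvent`, `validate`, and `route` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Assumed schema format for this sketch: argument name -> required Python type.
ToolSchema = dict[str, type]


@dataclass
class RoutingEvent:
    tool: str
    decision: str                      # "forwarded" or "rejected"
    reason: Optional[str] = None       # machine-readable cause when rejected
    details: list[str] = field(default_factory=list)


def validate(schema: ToolSchema, args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call matches."""
    problems = []
    for name, expected in schema.items():
        if name not in args:
            problems.append(f"missing argument: {name}")
        elif not isinstance(args[name], expected):
            actual = type(args[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    for name in args:
        if name not in schema:
            problems.append(f"unexpected argument: {name}")
    return problems


def route(tool: str, args: dict[str, Any],
          schemas: dict[str, ToolSchema], events: list[RoutingEvent]) -> bool:
    """Record a routing decision; return True only if the call may be forwarded."""
    problems = validate(schemas[tool], args)
    if problems:
        events.append(RoutingEvent(tool, "rejected", "schema_mismatch", problems))
        return False
    events.append(RoutingEvent(tool, "forwarded"))
    return True


events: list[RoutingEvent] = []
schemas = {"read_file": {"path": str}}
route("read_file", {"path": "notes.txt"}, schemas, events)  # forwarded
route("read_file", {"path": 3}, schemas, events)            # rejected
```

Because both the accepted and the rejected call append an event, an operator can audit the `events` list directly rather than reconstructing decisions from log lines.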
What the router refuses to do
The router does not silently re-route calls when a tool is unavailable. It does not auto-discover tools from the network. It does not paper over schema breaks. These refusals are deliberate. Each one preserves an invariant a downstream agent can rely on.
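A minimal sketch of two of these refusals, under the assumption that tools are registered explicitly by an operator and that availability is tracked per tool. `ExplicitRegistry`, `RouteResult`, and the error codes are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RouteResult:
    ok: bool
    value: object = None
    error: Optional[dict] = None


class ExplicitRegistry:
    """Tools exist only if an operator registered them; nothing is discovered."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[[dict], object]] = {}
        self._available: set[str] = set()

    def register(self, name: str, handler: Callable[[dict], object]) -> None:
        self._tools[name] = handler
        self._available.add(name)

    def mark_unavailable(self, name: str) -> None:
        self._available.discard(name)

    def call(self, name: str, args: dict) -> RouteResult:
        if name not in self._tools:
            return RouteResult(False, error={"code": "unknown_tool", "tool": name})
        if name not in self._available:
            # Deliberately no fallback to a similar tool: the caller decides.
            return RouteResult(False, error={"code": "tool_unavailable", "tool": name})
        return RouteResult(True, value=self._tools[name](args))


registry = ExplicitRegistry()
registry.register("search", lambda args: args["q"].upper())
registry.register("search_v2", lambda args: "other backend")
registry.mark_unavailable("search")
result = registry.call("search", {"q": "mcp"})
```

Even though `search_v2` is registered and healthy, the call to `search` comes back as a `tool_unavailable` error. The downstream agent can rely on the invariant that the tool it named is the tool that ran, or that nothing ran at all.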
- Typed tool surfaces — every routed tool has a schema, and the router refuses calls that do not match.
- Plugin isolation — broken plugins fail closed instead of bringing down the router.
- Lifecycle hygiene — startup, shutdown, and restart are explicit operations with observable transitions.
- Structured telemetry — every routing decision emits a structured event that an operator can audit later.
- Capability allow-list — tools are opt-in per host, not opt-out, so the default surface is small.
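Lifecycle hygiene and the capability allow-list from the list above can be sketched as a small state machine plus an opt-in lookup. The state names and the transition table are assumptions made for illustration (restart is modeled as stop followed by start), and `PluginLifecycle` and `is_allowed` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    RUNNING = "running"
    STOPPING = "stopping"


# Assumed transition table: any move not listed here is rejected.
ALLOWED = {
    State.STOPPED: {State.STARTING},
    State.STARTING: {State.RUNNING, State.STOPPED},
    State.RUNNING: {State.STOPPING},
    State.STOPPING: {State.STOPPED},
}


class PluginLifecycle:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.STOPPED
        self.transitions: list[tuple[str, str]] = []  # observable history

    def transition(self, target: State) -> None:
        if target not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: illegal transition "
                f"{self.state.value} -> {target.value}"
            )
        self.transitions.append((self.state.value, target.value))
        self.state = target


def is_allowed(allow_list: dict[str, set[str]], host: str, tool: str) -> bool:
    """Opt-in check: a host with no entry sees no tools at all."""
    return tool in allow_list.get(host, set())


plugin = PluginLifecycle("fs")
plugin.transition(State.STARTING)
plugin.transition(State.RUNNING)

allow_list = {"ide": {"read_file"}}
```

Here `plugin.transitions` holds the full history of state changes, and `is_allowed(allow_list, "cli", "read_file")` is false because the `cli` host was never granted anything, which keeps the default surface small.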
- Typed surfaces force schema discipline up-front; ad-hoc tools are slower to ship.
- Plugin isolation costs a small amount of latency per call; the benefit is independence from a single tool's bugs.
- Local-first operation means operators trade managed convenience for control over the trust boundary.
The work demonstrates platform-level engineering judgment: how to take a fast-moving protocol and wrap it with the routing, isolation, and observability properties that production tool use depends on.
This case study describes patterns common to MCP infrastructure. No proprietary registry, plugin runtime, or backend is named. Schemas referenced are public MCP types.