Status: Draft v1 — recommendation, awaiting owner sign-off Last updated: 2026-05-13
Resolves the architectural fork in the project strategy (private) and the open question in §6.1: replacement agent vs. transparent gateway vs. hybrid. This spec is the recommendation and the rationale; the the project strategy (private) edits land only after sign-off.
Build the hybrid: gateway first, agent upgrade second.
Concretely:
apps/gateway/) that speaks OpenAI-shape (and later Anthropic-shape) inbound, routes through the existing engine and adapters, and tracks cost per API key. The buyer flips one env var; their devs keep using Claude Code / Cursor / Codex / Continue without behavior changes. See gateway.md.metis-core library (canonical IR, routing engine, adapters, pricing, trace store, memory). The gateway and the agent are different front doors to the same engine; they do not fork the codebase.The remainder of this spec is the survey and effort math behind that recommendation.
The current build trajectory: a replacement coding agent (CLI today, TUI shipped, desktop later) where Metis owns the entire loop — routing, context, tools, memory, skills.
synthesis.md only works here.Metis becomes a transparent HTTP proxy: OpenAI-shape (and Anthropic-shape) inbound, provider-native outbound via the existing adapter set. Devs keep using whatever agent they already use. The buyer changes one env var.
OPENAI_BASE_URL; devs don’t notice; savings show up on the dashboard within hours. This is the GTM motion LiteLLM / Portkey / Helicone all run.synthesis.md — lossless canonical IR that round-trips Anthropic’s cache_control / thinking / tool_use / citations — is real in the gateway shape (see §3.4 below) but it doesn’t differentiate on the dashboard.Ship the gateway as the foot-in-the-door. Use it to prove savings on the buyer’s actual workload, charge for the cost-dashboard. Position the replacement agent as the upgrade — the way buyers reach the context + skills + memory levers once the gateway has already paid for itself.
Surveyed 2026-05-13. State for LiteLLM open-issue claims sourced from docs/market-research/03-routing-layers.md (verified 2026-05-09); install behavior and config shape sourced from each project’s quickstart docs.
uv tool install 'litellm[proxy]' + litellm --model <name>; proxy listens on 0.0.0.0:4000. Application sets base_url=http://0.0.0.0:4000 and a dummy api_key. Inbound supports both OpenAI-shape (POST /chat/completions, /completions, /embeddings) and Anthropic-shape (POST /v1/messages).fallbacks, num_retries, request_timeout, routing_strategy (simple-shuffle / least-busy / usage-based / latency-based), per-deployment rpm / tpm for weighted load balance. Virtual keys for per-team budgets.docs/market-research/03-routing-layers.md: #27512 (Anthropic Messages retry drops thinking blocks), #27469 (tool_call function.arguments lost in OpenAI→Anthropic conversion), #15601 (thinking blocks missing on tool-call requests), #26916 / #24985 (thinking blocks collapsed to text in multi-turn), #26625 / #20418 / #20485 (Bedrock + Vertex cache_control placement broken), #26937 (citations on Bedrock Converse not supported). The bug-of-the-week pattern is exactly the surface Metis treats as load-bearing.npx @portkey-ai/gateway, or use the SaaS. Application changes base URL and adds headers. SDK install optional.strategy.mode (fallback / loadbalance / single) + targets array supporting passthrough, provider, override_params. Conditional rules, A/B, load balancing. Sophisticated and well-documented.https://ai-gateway.helicone.ai, set HELICONE_API_KEY. Drop-in. Or self-host.All three intercept HTTP only. None wraps an agent loop, owns context assembly, loads skills, or curates memory. None ships bounded memory or learned routing. All three have either documented or strongly-suspected fidelity gaps on Anthropic-native blocks — and the one that documents the issues most honestly (LiteLLM) has 8+ open issues in May 2026 on exactly those surfaces.
This is the wedge for a Metis gateway, even if the product is “yet another OpenAI-shape proxy”: lossless canonical IR is invisible on the marketing page but load-bearing for buyers running Anthropic models through tools (which is everyone using Claude Code). A gateway that doesn’t drop thinking blocks on retry, doesn’t collapse tool_use.input across providers, and places cache_control correctly on Bedrock would be the only one in the lane that does.
What it has to do:
POST /v1/chat/completions (OpenAI-shape inbound; the universal contract every client speaks).CanonicalRequest (per provider-adapter-contract.md §3).RoutingEngine (per routing-engine.md).metis_core.adapters package.CanonicalResponse back to OpenAI-shape and return (or stream as OpenAI-shape SSE deltas).tenant_id / gateway_key_id to the trace events).What’s reusable from metis-core (so we’re not building from scratch):
| Component | Status | Reuse for gateway |
|---|---|---|
| Canonical IR | shipped | core |
| Adapters (Anthropic, OpenAI, OpenRouter) + streaming | shipped | core |
| Routing engine (7-slot chain, availability, validation) | shipped | core; gateway uses primarily rule / workspace-default / global-default slots |
| Pricing + cost stamping | shipped | core; add per-key attribution |
| Trace store + analytics API | shipped | core; extend with gateway-key dimension |
| Tool-id map | shipped | needed at gateway scope (per-request, not per-session) |
What’s missing (the actual gateway build):
Message / ToolDefinition / system-prompt extraction (system role hoist, tool-result re-merge from role: tool → user with ToolResult blocks). Mirror the egress translator in adapters/openai.py but in reverse.data: {"choices": [{"delta": ...}]}\ndata: [DONE]\n shape. The hard parts are tool-call deltas (OpenAI’s tool_calls[].function.arguments are streamed as JSON-string fragments) and reasoning/thinking summaries.SessionManager — the gateway is per-request; the client owns the tool loop and re-submits with tool_result blocks. We need a thin equivalent that runs routing + adapter call + cost stamping without spinning up a workspace session.apps/server/src/metis_server/tokens.py as a starting point; add a gateway_keys table keyed on the inbound Authorization: Bearer header.POST /v1/messages inbound. Anthropic-shape; closer to the canonical IR (tool_use blocks already round-trip cleanly), so this should be a smaller add than the OpenAI side.Estimate. ~80% of the code already exists. The new surface is bounded: an inbound translator (mirror of an existing outbound translator), an SSE serializer, a stateless harness, and a per-key auth bolt-on.
POST /v1/messages (Anthropic-shape inbound) and tool-call round-trip end-to-end through Claude Code: +1–2 weeks.What this estimate explicitly excludes: configured-rule policy completion (already on the Phase 2 roadmap; the gateway can ship with rule slot still stubbed), prompt-cache breakpoint optimization in the gateway path (Phase 3), team/RBAC (post-pilot), multi-tenant hardening (post-pilot).
What it has to do: be presentable as a coding agent a buyer’s devs could actually adopt instead of Claude Code or Cursor.
Surveyed gaps (from apps/cli/src/metis_cli/tui/app.py — currently 557 lines, single file — and from the “What’s NOT built” list in AGENTS.md):
| Gap | Effort |
|---|---|
| TUI: multi-session pane, sidebar, cost panel, model picker UI, settings UI, tool-confirmation UI | 2–3 weeks |
Real tool-confirmation handler (replace AutoAllowHandler — currently auto-approves writes and shell) |
1 week |
Onboarding flow: first-run wizard, model discovery, .env setup, sample skills |
1–2 weeks |
| Public docs / install scripts / “5-minute setup” page | 1 week |
| Savings benchmark + demo workload (called out as the biggest gap in the project strategy (private)) | 2–3 weeks |
| Context assembler design + spec + first implementation (the biggest cost lever; currently architecture-diagram-only) | separate, large — 4–6 weeks |
Estimate. Polish-only (ignoring context assembler, which is a build, not polish): 5–8 engineer-weeks. With context assembler — which is what makes the replacement-agent ceiling story real: 10–14 weeks.
Critical caveat: polish does not change the §3 adoption ceiling. A perfectly polished replacement agent still has the “make your devs switch tools” friction that’s the #1 reason B2B dev tool buys don’t land. Polish makes the agent shippable; it does not make it adopted.
The gateway and the agent share the same metis-core substrate. Counting the build surface that doesn’t double:
| Surface | Agent | Gateway | Shared |
|---|---|---|---|
| Canonical IR | — | — | ✓ |
| Adapters | — | — | ✓ |
| Routing engine | — | — | ✓ |
| Pricing / cost | — | — | ✓ |
| Trace store / analytics | — | — | ✓ |
| Memory store | ✓ | — | — |
| Skill loading | ✓ | — | — |
| Context assembler | ✓ | — | — |
| Tool dispatcher | ✓ | — | — |
| Session manager | ✓ | — | — |
| TUI / CLI | ✓ | — | — |
| Inbound HTTP translators (OpenAI / Anthropic shape) | — | ✓ | — |
| SSE serializer | — | ✓ | — |
| Stateless gateway harness | — | ✓ | — |
| Per-key auth / cost attribution | — | ✓ | — |
The agent-specific column is what’s already mostly built. The gateway-specific column is bounded and small (~5–8 weeks). Doing both costs roughly agent-polish + gateway-MVP, not 2×.
| Shape | Time-to-first-savings-on-buyer-workload | Sale velocity | Per-account ceiling |
|---|---|---|---|
| Agent-only | weeks (after dev adoption) | slow | high (full three-lever story) |
| Gateway-only | hours (env var flip) | fast | medium (model selection + cache only) |
| Hybrid | hours (gateway) → weeks (agent upsell) | fast floor, high ceiling | high |
The gateway gives Metis the artifact the project doesn’t yet have and needs most: proof of savings on the buyer’s actual workload (the project strategy (private): “This is currently the biggest gap between ‘the architecture should work’ and ‘we can show it works.’“).
synthesis.md (TUI lane is crowded; the differentiating mechanics — memory + skills + fingerprint routing — take quarters to build legibly).analytics-api.md §4.7 already specifies becomes a contract Metis can hold up to a buyer with their own numbers. No synthetic workload needed.gateway.md)./v1/messages) together? The latter eats 1–2 extra weeks but is the surface every Claude Code user needs.metis serve (one binary, two surfaces) or separate apps/gateway/? Recommendation: separate package, can share metis-core, but operationally distinct because the security/threat model is very different (gateway is a public-ish surface; the current metis serve is loopback-only).This spec is the recommendation only. It does not retire the project strategy (private) or land any the project strategy (private) edits. Those follow on owner sign-off.
When signed off, the the project strategy (private) edits queued are:
deployment-shape.md.”deployment-shape.md.”