metis

Deployment Shape

Status: Draft v1 (recommendation, awaiting owner sign-off)
Last updated: 2026-05-13

Resolves the architectural fork in the project strategy (private) and the open question in §6.1: replacement agent vs. transparent gateway vs. hybrid. This spec is the recommendation and the rationale; edits to the project strategy (private) land only after sign-off.


1. Decision

Build the hybrid: gateway first, agent upgrade second.

Concretely:

  1. Phase 2 wedge — ship a transparent HTTP gateway (apps/gateway/) that speaks OpenAI-shape (and later Anthropic-shape) inbound, routes through the existing engine and adapters, and tracks cost per API key. The buyer flips one env var; their devs keep using Claude Code / Cursor / Codex / Continue without behavior changes. See gateway.md.
  2. Phase 3+ — continue investing in the replacement agent (CLI / TUI / future desktop) as the upgrade path. Skills, bounded memory, the context assembler, learned routing, and agent-internal delegation are the high-ceiling features that the gateway form factor cannot deliver. They become “Metis Pro” — what a buyer adopts after the gateway has already proved savings on their workload.
  3. Shared substrate — both deployments compose the same metis-core library (canonical IR, routing engine, adapters, pricing, trace store, memory). The gateway and the agent are different front doors to the same engine; they do not fork the codebase.
  4. Not on the table — gateway-only (caps the ceiling and walks away from work already built) and agent-only (caps the floor and ignores the dominant adoption-friction risk in the project strategy (private)).

The remainder of this spec is the survey and effort math behind that recommendation.


2. The three deployment shapes

2.1 Agent-only

The current build trajectory: a replacement coding agent (CLI today, TUI shipped, desktop later) where Metis owns the entire loop — routing, context, tools, memory, skills.

2.2 Gateway-only

Metis becomes a transparent HTTP proxy: OpenAI-shape (and Anthropic-shape) inbound, provider-native outbound via the existing adapter set. Devs keep using whatever agent they already use. The buyer changes one env var.
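
To make the "one env var" claim concrete, here is a minimal sketch of how an OpenAI-shape client could pick its endpoint. It assumes the client honors `OPENAI_BASE_URL` (the official OpenAI Python SDK does; other agents may read a different variable), and the gateway address in the usage note is hypothetical.

```python
import os

# Default endpoint an OpenAI-shape client talks to when nothing is overridden.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_chat_url(env=None):
    """Return the chat-completions URL the client would call.

    `env` defaults to the process environment; passing a dict makes the
    behavior easy to check without touching real environment variables.
    """
    env = os.environ if env is None else env
    base = env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return f"{base}/chat/completions"
```

With `OPENAI_BASE_URL=http://metis-gateway.internal:8080/v1` (a made-up host), the same client code would send its traffic to the gateway instead of the provider, with no other change on the developer's machine.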

2.3 Hybrid (gateway first → agent upgrade)

Ship the gateway as the foot in the door. Use it to prove savings on the buyer’s actual workload and charge for the cost dashboard. Position the replacement agent as the upgrade: the way buyers reach the context, skills, and memory levers once the gateway has already paid for itself.


3. The reference shapes — what LiteLLM, Portkey, and Helicone actually are

Surveyed 2026-05-13. LiteLLM open-issue claims are sourced from docs/market-research/03-routing-layers.md (verified 2026-05-09); install behavior and config shape are sourced from each project’s quickstart docs.

3.1 LiteLLM proxy (BerriAI, 46k★, MIT-ish)

3.2 Portkey (Portkey AI, 12k★ OSS gateway + SaaS)

3.3 Helicone (Helicone, 6k★ OSS + SaaS, YC W23)

3.4 The pattern across all three

All three intercept HTTP only. None wraps an agent loop, owns context assembly, loads skills, or curates memory. None ships bounded memory or learned routing. All three have either documented or strongly-suspected fidelity gaps on Anthropic-native blocks — and the one that documents the issues most honestly (LiteLLM) has 8+ open issues in May 2026 on exactly those surfaces.

This is the wedge for a Metis gateway, even if the product is “yet another OpenAI-shape proxy”: lossless canonical IR is invisible on the marketing page but load-bearing for buyers running Anthropic models through tools (which is everyone using Claude Code). A gateway that doesn’t drop thinking blocks on retry, doesn’t collapse tool_use.input across providers, and places cache_control correctly on Bedrock would be the only one in the lane that does.
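
A minimal sketch of what "lossless" means here, assuming Anthropic-style content blocks (`text`, `thinking`, `tool_use`, optional `cache_control`). The `Block` type and both function names are hypothetical and are not the metis-core IR; the point is only that every provider field survives the round trip instead of being flattened to text.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Block:
    """Hypothetical canonical block: the kind plus every other field, verbatim."""
    kind: str
    fields: dict[str, Any]


def from_anthropic(content: list[dict[str, Any]]) -> list[Block]:
    # Keep all keys except "type"; nothing is dropped or coerced to a string.
    return [
        Block(kind=b["type"], fields={k: v for k, v in b.items() if k != "type"})
        for b in content
    ]


def to_anthropic(blocks: list[Block]) -> list[dict[str, Any]]:
    # Inverse mapping: rebuild the provider-shaped dicts from the canonical blocks.
    return [{"type": b.kind, **b.fields} for b in blocks]


# Example message content, with made-up values.
original = [
    {"type": "thinking", "thinking": "plan the edit", "signature": "sig-abc"},
    {"type": "text", "text": "Reading the file.",
     "cache_control": {"type": "ephemeral"}},
    {"type": "tool_use", "id": "toolu_01", "name": "read_file",
     "input": {"path": "src/app.py", "range": [1, 40]}},
]
round_tripped = to_anthropic(from_anthropic(original))
```

A proxy that normalizes everything into a text-only chat message would lose the `thinking` signature and the structured `tool_use` input at this step; here `round_tripped` compares equal to `original`.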


4. Effort estimates

4.1 Minimum gateway prototype

What it has to do:

What’s reusable from metis-core (so we’re not building from scratch):

| Component | Status | Reuse for gateway |
|---|---|---|
| Canonical IR | shipped | core |
| Adapters (Anthropic, OpenAI, OpenRouter) + streaming | shipped | core |
| Routing engine (7-slot chain, availability, validation) | shipped | core; gateway uses primarily rule / workspace-default / global-default slots |
| Pricing + cost stamping | shipped | core; add per-key attribution |
| Trace store + analytics API | shipped | core; extend with gateway-key dimension |
| Tool-id map | shipped | needed at gateway scope (per-request, not per-session) |
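
The slot chain the gateway leans on can be pictured as an ordered fall-through. The sketch below is an illustration only: it covers just the three slots named above (not the full 7-slot chain), and every function name, request key, and model name is hypothetical.

```python
from typing import Callable, Optional

Request = dict
Slot = Callable[[Request], Optional[str]]


def rule_slot(req: Request) -> Optional[str]:
    # Stub: stands in for a configured-rule policy that has no match.
    return None


def workspace_default_slot(req: Request) -> Optional[str]:
    # Use the workspace's default model if the request carries one.
    return req.get("workspace_default_model")


def global_default_slot(req: Request) -> Optional[str]:
    # Last resort: always answers.
    return "global-default-model"


CHAIN: list[tuple[str, Slot]] = [
    ("rule", rule_slot),
    ("workspace-default", workspace_default_slot),
    ("global-default", global_default_slot),
]


def route(req: Request, chain: list[tuple[str, Slot]] = CHAIN) -> tuple[str, str]:
    """Return (slot name, model) from the first slot that yields a model."""
    for name, slot in chain:
        model = slot(req)
        if model is not None:
            return name, model
    raise LookupError("no slot produced a model")
```

Returning the slot name alongside the model keeps the decision traceable, which matters if routing decisions are later written to the trace store.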

What’s missing (the actual gateway build):

Estimate. ~80% of the code already exists. The new surface is bounded: an inbound translator (mirror of an existing outbound translator), an SSE serializer, a stateless harness, and a per-key auth bolt-on.
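
For a sense of how small the SSE serializer is, here is a minimal sketch that frames text deltas as OpenAI-shape streaming chunks (`data: <json>` frames followed by `data: [DONE]`). The function name and default id are hypothetical, and a real serializer would also carry tool-call deltas and usage.

```python
import json
from typing import Iterable, Iterator


def sse_frames(
    text_deltas: Iterable[str],
    model: str,
    completion_id: str = "chatcmpl-sketch",
    created: int = 0,
) -> Iterator[str]:
    """Yield server-sent-event frames for a stream of text deltas."""

    def frame(delta: dict, finish_reason) -> str:
        chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [
                {"index": 0, "delta": delta, "finish_reason": finish_reason}
            ],
        }
        # Each SSE event is a "data:" line terminated by a blank line.
        return f"data: {json.dumps(chunk)}\n\n"

    for text in text_deltas:
        yield frame({"content": text}, None)
    # Final chunk carries an empty delta and the finish reason.
    yield frame({}, "stop")
    # Sentinel that OpenAI-shape clients use to detect end of stream.
    yield "data: [DONE]\n\n"
```

For example, `list(sse_frames(["Hel", "lo"], model="some-model"))` produces four frames: two content chunks, one finishing chunk, and the `[DONE]` sentinel.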

What this estimate explicitly excludes: configured-rule policy completion (already on the Phase 2 roadmap; the gateway can ship with rule slot still stubbed), prompt-cache breakpoint optimization in the gateway path (Phase 3), team/RBAC (post-pilot), multi-tenant hardening (post-pilot).

4.2 Replacement-agent polish to ship to a buyer

What it has to do: be presentable as a coding agent a buyer’s devs could actually adopt instead of Claude Code or Cursor.

Surveyed gaps (from apps/cli/src/metis_cli/tui/app.py — currently 557 lines, single file — and from the “What’s NOT built” list in AGENTS.md):

| Gap | Effort |
|---|---|
| TUI: multi-session pane, sidebar, cost panel, model picker UI, settings UI, tool-confirmation UI | 2–3 weeks |
| Real tool-confirmation handler (replace AutoAllowHandler, which currently auto-approves writes and shell) | 1 week |
| Onboarding flow: first-run wizard, model discovery, .env setup, sample skills | 1–2 weeks |
| Public docs / install scripts / “5-minute setup” page | 1 week |
| Savings benchmark + demo workload (called out as the biggest gap in the project strategy (private)) | 2–3 weeks |
| Context assembler design + spec + first implementation (the biggest cost lever; currently architecture-diagram-only) | separate, large: 4–6 weeks |
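
As a rough picture of the tool-confirmation gap, the sketch below gates risky tools behind a yes/no callback instead of approving everything. The class name, the `allow` method, and the tool names are all hypothetical; the real handler interface in the CLI may look different.

```python
from typing import Callable

# Hypothetical names for tools that write files or run shell commands.
RISKY_TOOLS = {"write_file", "shell"}


class ConfirmingHandler:
    """Auto-allow read-only tools; ask before anything in RISKY_TOOLS."""

    def __init__(self, ask: Callable[[str], bool]):
        # `ask` shows a prompt to the user and returns True to approve.
        self._ask = ask

    def allow(self, tool_name: str, summary: str) -> bool:
        if tool_name not in RISKY_TOOLS:
            return True
        return self._ask(f"Allow {tool_name}: {summary}?")
```

In a terminal, `ask` could be as simple as `lambda prompt: input(prompt + " [y/N] ").strip().lower() == "y"`; the TUI would route the same callback to its confirmation dialog.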

Estimate. Polish-only (ignoring context assembler, which is a build, not polish): 5–8 engineer-weeks. With context assembler — which is what makes the replacement-agent ceiling story real: 10–14 weeks.

Critical caveat: polish does not change the §3 adoption ceiling. A perfectly polished replacement agent still has the “make your devs switch tools” friction that’s the #1 reason B2B dev tool buys don’t land. Polish makes the agent shippable; it does not make it adopted.


5. Why hybrid wins on the math

5.1 Surface-area math

The gateway and the agent share the same metis-core substrate. Counting the build surface that doesn’t double:

Surface Agent Gateway Shared
Canonical IR
Adapters
Routing engine
Pricing / cost
Trace store / analytics
Memory store
Skill loading
Context assembler
Tool dispatcher
Session manager
TUI / CLI
Inbound HTTP translators (OpenAI / Anthropic shape)
SSE serializer
Stateless gateway harness
Per-key auth / cost attribution

The agent-specific column is what’s already mostly built. The gateway-specific column is bounded and small (~5–8 weeks). Doing both costs roughly agent-polish + gateway-MVP, not 2×.

5.2 GTM math

| Shape | Time-to-first-savings-on-buyer-workload | Sale velocity | Per-account ceiling |
|---|---|---|---|
| Agent-only | weeks (after dev adoption) | slow | high (full three-lever story) |
| Gateway-only | hours (env var flip) | fast | medium (model selection + cache only) |
| Hybrid | hours (gateway) → weeks (agent upsell) | fast floor, high ceiling | high |

The gateway gives Metis the artifact the project doesn’t yet have and needs most: proof of savings on the buyer’s actual workload. In the words of the project strategy (private): “This is currently the biggest gap between ‘the architecture should work’ and ‘we can show it works.’”

5.3 Risk math


6. What this means for adjacent open questions


7. Out of scope for this spec


8. Open questions the owner should resolve before this lands

  1. Gateway-key model. Single-tenant (one key per deployed Metis instance) vs. multi-tenant from day one. The spec assumes single-tenant; multi-tenant is a non-trivial bolt-on (RBAC, tenant isolation in the trace store) and probably wants its own design pass.
  2. Inbound surface scope. OpenAI-shape only for MVP, or OpenAI-shape + Anthropic-shape (/v1/messages) together? The latter eats 1–2 extra weeks but is the surface every Claude Code user needs.
  3. Naming. “Metis Gateway” vs. “Metis Proxy” vs. something else. Affects positioning; defer until product copy starts being written.
  4. Where the gateway runs. Same process as metis serve (one binary, two surfaces) or a separate apps/gateway/? Recommendation: a separate package that shares metis-core but is operationally distinct, because the security/threat model is very different (the gateway is a public-ish surface; the current metis serve is loopback-only).
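
Under the single-tenant assumption in question 1, per-key cost attribution can stay very small. The sketch below is a hypothetical in-memory ledger, not the metis-core pricing module: the class, the model name, and the prices are made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical prices in USD per million tokens: (input, output).
PRICES = {"model-a": (Decimal("3.00"), Decimal("15.00"))}


class CostLedger:
    """Accumulate request cost per gateway API key."""

    def __init__(self):
        self._totals = defaultdict(Decimal)  # a missing key starts at Decimal(0)

    def record(self, api_key: str, model: str,
               input_tokens: int, output_tokens: int) -> Decimal:
        in_price, out_price = PRICES[model]
        cost = (in_price * input_tokens + out_price * output_tokens) / Decimal(1_000_000)
        self._totals[api_key] += cost
        return cost

    def total(self, api_key: str) -> Decimal:
        # .get avoids creating an entry for keys that never made a request.
        return self._totals.get(api_key, Decimal(0))
```

Going multi-tenant would mean adding a tenant dimension to every entry and isolating reads by tenant, which is part of why the question above treats it as a separate design pass.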

9. Sign-off

This spec is the recommendation only. It does not retire the project strategy (private) or land any edits to it. Those follow on owner sign-off.

When signed off, the queued edits to the project strategy (private) are: