Status: v1 — shipped. Captures the OpenAI- and Anthropic-shape inbound surface in apps/gateway/, live-smoked end-to-end on 2026-05-14 at ~$0.0002 / 4 calls.
Last updated: 2026-05-14
The HTTP gateway is the transparent-proxy surface from
deployment-shape.md §1. It accepts OpenAI- or Anthropic-shape requests from external clients (Claude Code, Cursor, Codex, Continue, custom apps), routes via the existing engine, calls the existing adapters, and returns provider-shape responses — losslessly preserving Anthropic-native blocks where possible.This spec depends on:
canonical-message-format.md—Message,ContentBlockvariants,ToolDefinition.provider-adapter-contract.md—CanonicalRequest,CanonicalResponse, streaming events, error classes.routing-engine.md— the 7-slot chain.server-api.md— the gateway is a sibling HTTP app, not an extension ofmetis serve.event-bus-and-trace-catalog.md— gateway calls emit the samellm.call_*/route.decided/turn.completedevents as agent calls, with two additive payload fields (gateway_key_id,inbound_shape).analytics-api.md—/analytics/cost?group_by=gateway_keyand/analytics/by_keyconsume the additive fields above.
Give the buyer Metis’s model-selection + cost-attribution + lossless-canonical-IR value without making their devs switch tools. Devs keep using whatever client they already use (typically Claude Code or Cursor); operations flips one env var (OPENAI_BASE_URL / ANTHROPIC_BASE_URL) and adds a Metis-issued API key. Every LLM call now flows through Metis, gets routed, gets cost-attributed, and writes a trace.
The gateway is a per-request stateless harness: routing decision + adapter call + response translation + cost stamping. The agent loop stays in the client. Multi-turn tool use happens through the client re-submitting follow-up requests with tool_result blocks; the gateway is stateless across requests within the same agent loop.
POST /v1/chat/completions, sync + SSE streaming). The universal contract.POST /v1/messages, sync + SSE streaming) — closer to canonical IR; smaller translation gap; required to keep Claude Code clients native.RoutingEngine. The inbound model field is interpreted as a per-message override (§5.3).llm.call_completed events are stamped with gateway_key_id so the analytics-api.md rollups can split cost by key.gateway_key_id, inbound_shape) carry the gateway-specific dimensions (§6).thinking, cache_control, tool_use, tool_result, citations.These are the bright lines that separate the gateway from the replacement agent. The gateway does not assume them, and must not be extended to add them — those features belong in the agent.
MEMORY.md / USER.md are agent-loop artifacts. The gateway is workspace-aware (a key is scoped to one workspace per §3.3) but doesn’t read or write the workspace’s .metis/ files.tool_use blocks back to the client; the client executes the tool and re-submits a follow-up request with the tool_result.These non-features are load-bearing for the deployment-shape.md hybrid argument: they are the upgrade reasons that make the agent worth buying after the gateway.
| Endpoint | Method | Inbound shape | Description |
|---|---|---|---|
/v1/chat/completions |
POST | OpenAI | Standard chat completion; supports stream: true (SSE). |
/v1/messages |
POST | Anthropic | Anthropic Messages API; supports stream: true (SSE). |
/healthz |
GET | — | Liveness; no auth. |
Deferred to v1.1+: /v1/models (OpenAI-shape model list), /v1/embeddings, /v1/completions (legacy), tool-confirmation REST surface, batch APIs, file uploads.
The gateway defaults to loopback bind (127.0.0.1). Post-Wave-13 this
is a default, not a constraint: the operator opts into a public
bind explicitly via --host 0.0.0.0 once the hardening Wave 13 + Wave
11 layered on top of v1 is wired (gateway-hardening.md §2.1):
| Hardening layer | Status | Owner |
|---|---|---|
| Per-key + per-IP rate limiting | Shipped (Wave 11, opt-in via RateLimitConfig.enabled=True) |
In-process |
| Audit-log export | Shipped (Wave 12, metis audit export) |
In-process |
| Connection-rate cap | Shipped (Wave 13, default 1000 concurrent) | In-process |
| TLS termination | Shipped (Wave 13, --tls-cert/--tls-key for in-process; sidecar still recommended) |
In-process or sidecar |
| Key rotation / revocation | Shipped (Wave 10, metis gateway rotate-key / revoke-key) |
In-process |
| WAF / volumetric DDoS | Buyer-owned (edge CDN / cloud LB) | Upstream |
Pre-Wave-13, run_gateway() silently rewrote any non-loopback bind to
127.0.0.1 because per-key rate limits and audit logging hadn’t shipped
yet. Both have since landed; the rewrite is removed. The operator
opts in deliberately and gets a one-time WARN log line at boot
summarizing whether in-process TLS / rate-limit middleware are on so
the perimeter checklist stays honest.
Each inbound request carries a gateway-issued bearer token:
Authorization: Bearer gw_<ulid> (the standard OpenAI header).x-api-key: gw_<ulid> (the standard Anthropic header). The handler also accepts Authorization: Bearer ... as a fallback so generic SDKs can hit /v1/messages without special-casing the auth header.Tokens are issued out-of-band via metis gateway issue-key --name "<display>" --workspace "<path>" [--allow-models ...] [--daily-cap-usd ...] [--user <id>] [--team <id>]. The plaintext is printed once; only the SHA-256 hex digest is persisted in the keystore (~/.metis/gateway/keys.json by default, mode 0o600). The keystore records:
| Field | Source | Notes |
|---|---|---|
key_id |
gk_<ulid> minted at issuance |
Stable analytics identifier; stamped on every trace event. |
secret_hash |
SHA-256(token) | The plaintext token is never stored or recoverable. |
name |
--name |
Display label. |
workspace_path |
--workspace |
Exactly one workspace per key in v1 (§11). |
allowed_models |
--allow-models (optional) |
Tuple of canonical model ids; non-conforming routing falls through. |
daily_cap_usd |
--daily-cap-usd (optional) |
Per-key daily spend cap. Decimal, must be > 0. Hard breaker (§6.4); soft alerts at 80% / 95%. |
monthly_cap_usd |
--monthly-cap-usd (optional) |
Per-key calendar-month spend cap (UTC). Same enforcement model as daily_cap_usd. |
user_id |
--user (optional) |
Stable per-developer identity tag (see multi-user.md §4.2). ^[a-z0-9_-]+$. Existing keys with None keep working. |
team_id |
--team (optional) |
Stable team identity tag (multi-user.md §4.2). ^[a-z0-9_-]+$. |
Authentication: the handler hashes the inbound token, looks it up in the keystore, and returns 401 on miss. The handler then projects the resolved key onto a request-scoped Identity (gateway_key_id, workspace_path, user_id, team_id — multi-user.md §3.2 calls this the Principal); the harness reads only the Identity, never the raw GatewayKey for stamping. The gateway_key_id, user_id, and team_id are recorded on every outbound llm.call_completed and turn.completed event so analytics-api.md §4.8 (and /analytics/by_team) can roll up cost by key / user / team. No PII flows into the key record — users.json carries plaintext (multi-user.md §3.3) and is a separate file; the keystore only references identity tags by id.
| Client | Shape | Action |
|---|---|---|
OpenAI SDK (openai-python, openai-node) |
OpenAI | Set base_url / OPENAI_BASE_URL and OPENAI_API_KEY to the gateway. |
| Anthropic SDK | Anthropic | Set base_url / ANTHROPIC_BASE_URL and the x-api-key header to the gateway key. |
| Claude Code | Anthropic | Same as Anthropic SDK. The buyer flips ANTHROPIC_BASE_URL org-wide. |
| Cursor (closed) | OpenAI-compat or Anthropic-compat | Configurable in Cursor settings; flip the API base. |
| OpenCode / Cline / Continue / Goose | OpenAI-compat or per-provider | Configurable; flip the API base. |
| Custom internal app | Either | Same. |
The compatibility bar for v1 is “Claude Code → gateway → Anthropic API end-to-end, including tool use, thinking blocks, and prompt caching.” The Wave-5 smoke (scripts/smoke_*) exercises that workload.
This section is the load-bearing fidelity contract. It is the difference between Metis as a gateway and LiteLLM as a gateway. Every rule below corresponds to a bug we explicitly contract against.
Implemented in translators.py::parse_openai_request.
| OpenAI field | Canonical field | Notes |
|---|---|---|
messages[].role: "system" |
system_prompt (string) |
Hoisted out of the message list per canonical-format §3.2. Multiple system messages are concatenated with \n\n. |
messages[].role: "user" with content: string |
Message(role=USER, content=[TextBlock(...)]) |
Trivial. |
messages[].role: "user" with content: list (multi-part) |
Message(role=USER, content=[TextBlock or ImageBlock]) |
Translate type: "text" → TextBlock, type: "image_url" → ImageBlock (URL or base64). |
messages[].role: "assistant" |
Message(role=ASSISTANT, content=[...]) |
If tool_calls present, generate one ToolUseBlock per call (see §4.3 for id mapping). |
messages[].role: "tool" with tool_call_id and content |
merged into next Message(role=USER, content=[ToolResultBlock(...)]) per canonical-format §3.3 |
OpenAI sends tool results as role: "tool"; canonical merges them into the next user turn as ToolResultBlock. Multiple consecutive tool messages collapse into one user message with multiple ToolResultBlocks. |
tools |
list[ToolDefinition] |
function.parameters (JSON Schema) → ToolDefinition.input_schema. |
tool_choice |
ToolChoice (in CanonicalRequest) |
"auto" / "none" / {type: "function", function: {name}} map to canonical equivalents. |
stream: true |
drives SSE outbound (§4.6) | The harness’s stream() path is wired separately from call(). |
temperature, max_tokens, stop |
temperature, max_output_tokens, stop_sequences |
Direct. |
response_format: {type: "json_schema", ...} |
output_schema |
Provider support is gated by AdapterCapabilities.supports_structured_output. |
stream_options.include_usage |
OpenAIInboundRequest.include_usage |
Threads through render_openai_sse_stream so the final chunk includes usage when requested. |
Implemented in endpoints/anthropic.py::parse_anthropic_request. This is the simpler direction — canonical IR is Anthropic-shaped at heart.
| Anthropic field | Canonical field | Notes |
|---|---|---|
system (string or list of blocks) |
system_prompt (+ optionally system_prompt_volatile) |
Block list flattened to string with cache_control markers preserved per canonical-format §4.1. The split between stable and volatile segments uses Anthropic’s cache_control boundary if the client supplied one. |
messages[] |
Message (per role) |
Direct. |
messages[].content[] blocks: text, image, tool_use, tool_result, thinking, redacted_thinking, document (citations) |
TextBlock, ImageBlock, ToolUseBlock, ToolResultBlock, ThinkingBlock, RedactedThinkingBlock, DocumentBlock |
Lossless 1:1. The canonical IR was designed to be a superset of Anthropic’s. |
tools[] |
list[ToolDefinition] |
Direct (canonical IR uses Anthropic’s tool shape). |
tool_choice |
ToolChoice |
Direct. |
metadata.user_id |
dropped | Not persisted in v1. |
stream: true |
drives SSE outbound (§4.6) | SSE matches Anthropic’s event: ... format. |
Tool-call ids differ across providers (OpenAI uses call_..., Anthropic uses toolu_...). When the inbound shape and the outbound provider disagree, the handler builds a per-request ToolIdMap (metis_core.adapters.tool_id_map) and round-trips ids through it.
The hazard LiteLLM #27469 documents — tool_call.function.arguments lost in OpenAI→Anthropic conversion — is the failure case the translator explicitly contracts against:
function.arguments (which OpenAI emits as a string, possibly streamed in fragments) is treated as opaque JSON; parsed to dict and placed in ToolUseBlock.input. Parse failure → 400 invalid_request_error.tool_call_id from a prior turn, the client owns the round-trip (which is fine: the client also stored the assistant’s prior reply, so it sees the same id either side).cache_control placement (the LiteLLM #26625 hazard)Anthropic supports cache_control: {type: "ephemeral"} markers on system blocks, message content blocks, and tools. Placement matters: misplaced markers turn into cache misses, which is the cost lever inverted. The canonical IR carries cache_control on TextBlock, ToolDefinition, and (per canonical-format §4.1) system-prompt blocks.
The Anthropic-inbound translator preserves cache_control markers verbatim because both sides speak the same shape. The OpenAI-inbound translator accepts cache_control only via extra_body (it’s not in OpenAI’s standard schema) and places markers exactly where the canonical IR puts them. When the chosen adapter does not support cache_control (per AdapterCapabilities.supports_prompt_caching), the marker is dropped silently — the client gave a hint, not a contract.
ThinkingBlock and RedactedThinkingBlock are first-class in the canonical IR. The retry layer in metis_core.adapters.retry preserves them across attempts. The gateway never collapses thinking to text (LiteLLM #26916). When the inbound shape is OpenAI, thinking content is dropped from the outbound JSON unless the client opted in via extra_body.include_thinking: true; never re-typed.
Outbound SSE format depends on the inbound shape:
data: {"id": ..., "choices": [{"delta": {...}, "index": 0}], "model": ..., "object": "chat.completion.chunk"}\n\n followed by data: [DONE]\n\n. Implemented in translators.py::render_openai_sse_stream. The hard part is tool_calls[].function.arguments deltas — OpenAI streams the arguments JSON as raw text fragments; the translator emits each canonical ToolUseInputDelta as a fragment with index matching the position in the OpenAI tool_calls array.event: message_start, event: content_block_start, etc.). Implemented in endpoints/anthropic.py::render_sse_stream.The streaming path primes the first event before committing to a 200 response, so routing-time failures (RoutingFailedError, ModelNotAllowedError) surface as JSON error bodies rather than 200 SSE streams. After the first event, errors mid-stream terminate the response without further events (the client sees an early EOF).
Client disconnect (TCP RST / closed read side) triggers the harness to cancel the in-flight adapter call (per provider-adapter-contract.md §5.4). The disconnect probe races the adapter task; on disconnect, the adapter’s cancel() is invoked and the harness raises ClientDisconnected. Partial token usage up to cancellation is still cost-stamped and traced. The HTTP handler returns a 499-style sentinel in case the socket is somehow still open.
SDK clients speak the bare model names their upstream provider expects — the Anthropic SDK sends claude-3-5-haiku-20241022 and rejects any anthropic: prefix because the real Anthropic API doesn’t take one; the OpenAI SDK sends gpt-4o-mini and rejects an openai: prefix for the same reason. Metis’s internal model id is always provider:name so the routing engine, the price table, and the registry all agree on a single key. Without an explicit bridge between the two namespaces, the gateway’s per_message_override slot can’t resolve the bare client name → routing falls through to global_default → cost is billed under whichever model that points at. The GA-readiness audit (§2.4) caught this on the canonical claude-haiku-4-5 workload: every call was routed and priced as anthropic:claude-sonnet-4-6, over-reporting cost ~6×.
The fix lives in harness.py::_normalize_inbound_model and runs once per request, immediately before registry.resolve_alias. Rules (first match wins):
haiku, sonnet, etc.) and canonical ids (anthropic:claude-haiku-4-5) pass through unchanged. This preserves the alias path the agent loop and the CLI already use.metis:// opt-out — metis://auto, metis://cheap, metis://opus are the documented “let routing decide” form and MUST NOT be prefixed.provider:name — any string containing : (other than the metis:// form) passes through; if the registry doesn’t know it, the routing chain falls through as documented in §5.3.:, not metis://: prepend the inbound shape’s provider prefix. The current shape→prefix map is {"openai": "openai", "anthropic": "anthropic"}; other shapes pass through unchanged so the chain falls through cleanly.The normalization is registry-aware on the alias step but otherwise unconditional: a bare name unknown to the registry still ends up prefixed so the routing trace records the buyer’s intent (and the price table can look up the canonical key) rather than stamping a bare string no analytics surface recognizes.
What does NOT change:
model string verbatim. SDKs that compare the echo against what they sent (some clients do) continue to work.translators.py for OpenAI, endpoints/anthropic.py for Anthropic) remain pure data parsers — they don’t depend on the registry. Normalization happens at the harness boundary because that’s where both the inbound shape and the registry are in scope.PriceTable.compute_cost is unchanged — it’s a strict canonical-id lookup. The pre-existing UnknownPricingModelError still surfaces if a buyer points at a model that’s registered for routing but missing from the price table.The 7-slot chain from routing-engine.md §4 all run; not all are meaningful in stateless gateway calls.
| Slot | Meaningful in gateway? | Notes |
|---|---|---|
per_message_override |
Yes — and dominant in practice (§5.3). | Client’s inbound model is interpreted as a per-message override. |
manual_sticky |
No | There is no “session” in the gateway path — each request is independent. Reports not_applicable. |
configured_rules |
Yes | The same ~/.metis/routing.yaml policy that drives the agent applies here. Buyer rules like “no Opus for marketing key” live here. |
pattern_recommendation |
Future | Pattern store may observe gateway traffic; not yet writing recommendations into the gateway chain. |
delegate_request |
No | Delegation is an in-loop primitive. Reports not_applicable. |
workspace_default |
Yes | Resolved from gateway_key.workspace_path against the policy. |
global_default |
Yes | The deployment’s catch-all. |
Each inbound request constructs a TurnContext with:
session_id = "gw_<ulid>", turn_id = "gt_<ulid>" (synthetic per-request ids so the trace events still join).session_active_model = None (no sticky state).workspace_default_model = None (the policy resolves it via workspaces.{key.workspace_path}.default).global_default_model = <runtime config>.per_message_override = registry.resolve_alias(parsed.model).policy = routing.policy (the same policy the agent uses).The chain runs to completion; one route.decided event is emitted per request as usual.
model fieldOpenAI- and Anthropic-shape clients send model: "<string>" and expect a model identifier they recognize. The gateway treats it as a per-message override in three forms, in priority order:
model: "metis://auto", model: "metis://cheap", model: "metis://opus". Resolved by registry.resolve_alias.provider:name: model: "anthropic:claude-opus-4-7". Identity-resolves.model: "gpt-4o" or model: "claude-opus-4-5". Resolved if the registry has it as an alias; otherwise normalized to canonical form (<inbound_shape>:<name>) per §4.8 — if the registry knows that canonical id, slot 1 wins on the prefixed form; if not, the chain falls through.Real-world consequence. Mainstream OpenAI / Anthropic SDKs always include model in the request body, so route.decided.chain reports policy=per_message_override, verdict=chose on every gateway request unless the client deliberately omits model. The rule, pattern, workspace_default, and global_default slots are unreachable in that mode. This is correct (the spec interprets model as a per-message override), but worth knowing when reading gateway traces.
Open question — “transparent mode” override. A future --ignore-inbound-model flag could ignore the inbound model field and let routing fall through to the rule / workspace / global slots, giving operators the cost-optimization magic-trick mode (client says “gpt-4o”, policy says “haiku for short prompts on this key”). The recommendation is to leave the default as-is: per-message override is the documented contract per gateway.md §5.3, and clients that want to delegate model choice can already pass model: "metis://auto". A flag is straightforward to add when a buyer specifically asks for it; treating it as opt-in keeps the default behavior predictable.
The existing capability gate in routing-engine.md §4.4 applies unchanged. If the inbound request uses tools and the routed adapter has supports_tools = false, the chain falls through to the next candidate. Hard failure returns 503 routing_failed.
After routing produces a chosen_model, the harness checks the key’s allowed_models tuple (when set). If the chosen model is not allowed, the harness raises ModelNotAllowedError, which the HTTP handler translates to 403 invalid_request_error (OpenAI) or 403 permission_error (Anthropic). The check happens after routing rather than feeding into capability validation so the trace records what would have been routed, surfacing the rejection cleanly.
Gateway requests emit the same catalog events as agent calls:
route.decided — exactly one per request.llm.call_started, llm.call_completed, llm.call_failed — one per provider call.turn.completed — one per request (a gateway request is one “turn” from a tracing perspective, even though there’s no session).Additive payload fields (no new event types, no breaking changes):
llm.call_completed.gateway_key_id: str | None — typed field on LLMCallCompleted (events/payloads.py). None for agent-loop traffic; set for gateway calls.llm.call_completed.inbound_shape: Literal["openai", "anthropic"] | None — same.llm.call_completed.user_id: str | None — typed field; stable principal id resolved from the gateway key at request entry (see multi-user.md §4.4). None for agent-loop traffic and pre-multi-user keys; rolls up under the null bucket in /analytics/cost?group_by=user.llm.call_completed.team_id: str | None — typed field; same null-bucket convention; drives /analytics/by_team (multi-user.md §5.2).turn.completed.user_id / turn.completed.team_id — typed fields on TurnCompleted (matching the LLMCallCompleted shape so analytics can roll up at either grain).turn.completed.gateway_key_id and inbound_shape — still stamped on the dict envelope at emit time (harness.py::_emit_turn_completed) until the typed extension on TurnCompleted lands for them too (§11 follow-on). The fields read identically by the analytics SQL.These additive fields drive analytics-api.md §4.1 (group_by=gateway_key) and §4.8 (/analytics/by_key), and the new group_by=user / group_by=team / /analytics/by_team surfaces in multi-user.md §5. Existing consumers ignore unknown fields.
Two additional event types fire from the gateway’s per-request auth path when a key carries a daily_cap_usd or monthly_cap_usd. Both are pseudonymous (stable ids, no plaintext PII).
| Event type | When | Payload (new) |
|---|---|---|
quota.alert |
Spend on the key/user/team is in [80%, 95%) (severity="warning") or [95%, 100%) (severity="critical") of the configured cap. One event per (request, scope) — the same scope on a later request fires again, but the same scope twice in one request does not. |
scope, severity, current_usd, limit_usd, percentage, gateway_key_id?, user_id?, team_id? |
gateway.quota_exceeded |
Spend has reached the cap (≥100%). Emitted alongside the 429 response described below — no quota.alert is also emitted in this case. | scope, current_usd, limit_usd, inbound_shape, gateway_key_id?, user_id?, team_id? |
scope is one of key_daily, key_monthly, user_daily, user_monthly, team_daily, team_monthly (multi-user.md §5.1) — v1 ships key_daily and key_monthly only; user/team scopes land when users.json / teams.json do.
When the hard cap fires, the HTTP layer returns 429 with the documented body shape:
{
"error": {
"code": "quota_exceeded",
"identity": "key",
"scope": "key_daily",
"limit_usd": "1.00",
"current_usd": "1.50",
"type": "rate_limit_error",
"message": "key_daily cap of $1.00 hit ($1.50 spent)"
}
}
The shape is the same for OpenAI and Anthropic inbound clients; only the trailing type discriminator (always rate_limit_error here) is wired for shape parity. The check runs before routing or adapter invocation, so a capped identity never burns provider-side cost on a rejected request.
Soft alerts and the hard breaker share the same SQL projection (AnalyticsStore.cost(group_by=...)-shaped query) running once per request through a per-request RequestQuotaCache. Daily windows reset at UTC midnight; monthly windows at UTC first-of-month.
The gateway does not persist canonical messages by default. Per-request memory: routing context, adapter call, response, then discard. Trace events are persisted via the existing event bus → SQLite path; that’s the only durable record.
Optional opt-in (deferred to v1.1): per-key request logging to the existing messages / sessions tables, gated by a gateway_keys.log_messages flag. The default is off because (a) buyers care about cost dashboards, not transcript replay, and (b) gateway traffic is often subject to data-residency constraints that the buyer must opt into explicitly.
Error class taxonomy aligns with provider-adapter-contract.md §6.1. The gateway translates canonical error classes to inbound-shape error envelopes (see app.py::_openai_error_from_adapter and _anthropic_error_from_adapter):
| Canonical class | OpenAI-shape outbound | Anthropic-shape outbound |
|---|---|---|
auth |
401 invalid_request_error / invalid_api_key |
401 authentication_error |
rate_limit |
429 rate_limit_error / rate_limit_exceeded |
429 rate_limit_error |
context_overflow |
400 invalid_request_error / context_length_exceeded |
400 invalid_request_error |
invalid_request |
400 invalid_request_error |
400 invalid_request_error |
network |
502 api_error |
502 api_error |
server_error |
503 api_error |
503 overloaded_error |
| other / classify-internal_ | 500 api_error |
500 api_error |
RoutingFailedError (no eligible model) |
503 api_error / routing_failed |
503 overloaded_error |
ModelNotAllowedError (key’s allowlist) |
403 invalid_request_error |
403 permission_error |
ClientDisconnected (TCP RST) |
(no body — connection closed) | (no body) |
apps/gateway/ ships its own console-script entry (metis gateway, mounted via the unified metis console-script in metis-cli). One binary, single-process uvicorn. Docker image to follow.metis gateway issue-key --name "<display>" --workspace "<path>" [--allow-models ...] [--daily-cap-usd ...]. Prints the key once; only the SHA-256 hash is stored. Keystore file mode 0o600./healthz returns 200 when uvicorn is up; the response does not imply downstream providers are reachable. Liveness vs. readiness split deferred.analytics-api.md is the dashboard. group_by=gateway_key on /analytics/cost and the dedicated /analytics/by_key endpoint (§4.8) split cost by key + inbound shape using the additive payload fields described in §6.These are not features of the v1 gateway and not part of this spec’s contract:
daily_cap_usd / monthly_cap_usd (§6.4); request-rate limits and IP-bucket throttles are out of scope.v1.0 issued keys but had no online revocation or rotation — the
operator’s only options were “delete the JSON entry and restart” or
“leave a leaked key alive.” Wave 10 lands three online operations on
top of the existing keystore. All three are atomic writes
(write-temp-then-rename), all three emit pseudonymous audit events,
and all three are loopback-only CLI operations — there is no
HTTP-level admin surface (operators reach the keystore through the
same shell that ran issue-key).
GatewayKey gains three lifecycle fields on top of the v1 record:
| Field | Type | Notes |
|---|---|---|
status |
Literal["active", "revoked"] (default "active") |
Missing on pre-Wave-10 records — the loader fills "active". A revoked key is still loaded into the keystore so auth can return the documented key_revoked body with the stable key_id. |
revoked_at |
datetime \| None |
Set when status="revoked". UTC, ISO-8601 in the JSON file. Required when status="revoked" — the loader rejects a revoked record without a timestamp. |
grace_period_until |
datetime \| None |
Set by rotate-key on the predecessor. While the key is active and now < grace_period_until, auth accepts it; past that boundary is_active returns False (auth read-only — the next admin sweep persists the transition). |
A created_at ISO-8601 timestamp is now stamped on every new record so
list-keys can sort + display issuance order. Pre-Wave-10 records read
back with created_at=None, which list-keys renders as -.
metis gateway revoke-key <key_id>Marks the key revoked. Idempotent against an already-revoked key
(returns the existing revoked_at; emits no second audit event).
Subsequent gateway requests authenticating with that key return:
HTTP/1.1 401 Unauthorized
Content-Type: application/json
{
"error": {
"code": "key_revoked",
"key_id": "gk_01HXYZ...",
"revoked_at": "2026-05-15T14:22:10+00:00",
"type": "invalid_request_error" | "authentication_error",
"message": "gateway key gk_01HXYZ... has been revoked"
}
}
type carries the shape-specific discriminator (OpenAI vs Anthropic)
so each SDK’s error parser still recognizes the envelope; the body is
otherwise identical between inbound shapes.
metis gateway rotate-key <key_id> [--grace-period <duration>]Mints a successor key that inherits the predecessor’s metadata
(workspace_path, user_id, team_id, allowed_models,
daily_cap_usd, monthly_cap_usd) and stamps grace_period_until on
the predecessor. Default grace period: 24 hours.
During the grace window, both predecessor and successor authenticate;
llm.call_completed / turn.completed events stamp the
gateway_key_id actually used so operators see the migration land in
/analytics/by_key. Past the boundary, the predecessor reads as
revoked at auth time (is_active(now=...) returns False) and the next
admin sweep — any subsequent admin op against the keystore, or an
explicit sweep_expired_grace_periods() call — persists the
active → revoked transition and emits a paired gateway.key_revoked
with reason="grace_period_expired".
The new plaintext token is printed once and is recoverable only from
the client team’s secrets broker after that — the keystore only
persists the SHA-256 hash, same as issue-key.
--grace-period accepts forms like 30m, 24h, 7d, 2w. Zero or
negative durations are rejected (use revoke-key for an immediate
cutoff with no successor).
metis gateway list-keys [--format text|json]Returns every key in the keystore — including revoked ones — with
status, identity, caps, and timestamps. status is the on-disk value;
effective_status applies the same is_active rule the auth path
uses, so an active key whose grace window has lapsed reads as
revoked even before the next sweep persists the transition.
The JSON output is the keystore admin contract for buyer tooling
(SOX-style “list every credential and when it was issued / revoked”);
the text output is the terminal-friendly summary. Both are
non-mutating — list-keys never writes to the keystore.
Three new event types in the catalog (pseudonymous floor; see
event-bus-and-trace-catalog.md §6.13):
gateway.key_issued — emitted by metis gateway issue-key after the
keystore write succeeds. Carries the resolved (gateway_key_id, name,
workspace_path, user_id, team_id, allowed_models, daily_cap_usd,
monthly_cap_usd, issued_at) so dashboards can correlate cost rows
back to issuance.gateway.key_revoked — emitted on explicit revoke-key (with
reason="admin_revoke") or on grace-period sweep
(reason="grace_period_expired"). The reason enum is the third
value "rotated" (reserved for a future “fail-fast revoke on
rotate” variant; not emitted in v1).gateway.key_rotated — emitted by rotate-key. Carries both
old_gateway_key_id and new_gateway_key_id so the trace traces the
migration; also stamps the inherited workspace_path, user_id,
and team_id for the dashboard’s per-identity rollup.Audit emission is best-effort — failures don’t abort the keystore mutation. The keystore file is the source of truth; the audit event is a follow-on for operators.
POST /admin/keys/revoke endpoint. Adding one requires the
production-bind hardening (auth/rate-limiting/audit) listed in §12
below.rotate-key from cron.revoke-key overwrites the in-memory key
to status="revoked"; if you need a longer audit trail than the
trace-DB events provide, snapshot the keystore before each ops
action.Pre-Wave-14 the only way for a buyer to get a gateway key was for an
operator to run metis gateway issue-key on the host. That works for
in-VPC deployments where buyer-owned operators provision their own
accounts, but it’s friction for a SaaS-style on-ramp where the buyer
arrives at a hosted gateway, wants to create an account themselves,
get a key, and point their client at it — without filing a ticket.
Wave 14 adds a thin self-serve surface on top of the existing keystore.
Accounts are a Metis-issued record (no SSO in v1; SSO remains the v2
upgrade from multi-user.md §8.1); the first key on
an account is issued the moment the account verifies its email. The
account session token is then enough to manage subsequent keys via
the same shape metis gateway issue-key / revoke-key already use,
without ever touching the operator’s shell.
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/signup |
POST | none | Create a pending account; mint + email a verification magic link. |
/signup/verify |
POST | magic-link token (one-shot) | Mark account verified, issue first gateway key, return key plaintext + session token. |
/account/keys |
GET | session bearer | List the keys this account owns (joins on the keystore’s key_id). |
/account/keys |
POST | session bearer | Issue another key for this account, scoped to the account’s workspace. |
/account/keys/{key_id} |
DELETE | session bearer | Revoke a key the account owns. Wraps the existing keystore_admin.revoke_key. |
All five mount only when the gateway is launched with
SignupConfig(enabled=True) (CLI: --enable-signup; helm: signup.enabled).
The endpoints return 404 when signup is off so in-VPC deployments don’t
have to firewall a public surface.
accounts.json lives at ~/.metis/gateway/accounts.json (mode 0o600,
mirroring the keystore). Per-record shape:
class Account(frozen=True):
account_id: str # "acc_<ulid>"
email: str # plaintext; only place it lives
email_sha256: str # derived; used for dedup / future SSO bridge
workspace_path: str # synthetic, "/metis/accounts/<account_id>/<name>"
user_id: str | None # echoed onto every issued key
team_id: str | None # same
created_at: datetime
verified_at: datetime | None
key_ids: tuple[str, ...] # foreign keys into keys.json
The trace store never carries plaintext email (matches
multi-user.md §3.3). The signup payload carries email
for the magic-link sender; the key it issues references the account via
user_id only.
The magic-link “email” is logged to stdout in Wave 14:
[magic-link signup] email=alice@example.com url=<dashboard>/signup/verify?token=mlk_... expires_in_seconds=1800
Wave 15 swaps in a real transport (SES / SendGrid pluggable). Logging
to stdout is a development affordance — the operator MUST swap in real
email before exposing /signup on the open internet. The logged URL
isn’t a security hole on a private network (only the operator can read
stdout), but it would be one on a public host without the swap.
Tokens are 32-byte URL-safe randoms, SHA-256-hashed before persisting.
Defaults: magic-link TTL 30 minutes, session TTL 24 hours. Magic links
are single-use; re-posting /signup against a still-pending account
re-mints (so a user who lost the first email isn’t stuck).
Verifying the magic link returns a sess_<random> token. The token is
SHA-256-hashed before persisting (same shape as gateway keys). The
account uses this token in Authorization: Bearer sess_… to call
/account/keys. Sessions expire after session_ttl and are deleted
on the next access; there is no refresh path in v1 (request another
magic link to extend).
/signup. The existing per-IP rate limiter
(gateway-hardening.md §3) applies to signup
endpoints too; operators exposing the surface publicly should enable
it.accounts.json is a local-FS file; SaaS
deployments running multi-pod must mount it on durable shared
storage (same RWX caveat as the trace DB; helm signup.accountsPath
is the mount point).metis gateway rotate-key remains
CLI-only; /account/keys POST is mint-new, DELETE is revoke. A
rotation surface lands when there’s evidence buyers want one.dashboard_url is a placeholder
field on the verification response — the SPA itself ships later.Billing is opt-in and only mounts when GatewayConfig.billing is set
(CLI: --enable-billing; helm: billing.enabled). It also requires
signup/account sessions because every billing mutation is scoped to a
verified account. Deployments with billing disabled return 404 for the
billing namespace and preserve the pre-billing gateway surface.
| Endpoint | Method | Auth | Description |
|---|---|---|---|
/account/billing |
GET | session bearer | Return the current subscription summary, effective billing tier, payment state, and free-tier cap. |
/account/billing/portal |
GET | session bearer | Create a one-shot Stripe Customer Portal link for payment method, invoice, and cancellation self-service. |
/account/billing/plan |
POST | session bearer | Switch plan with {"plan": "free" | "pro" | "enterprise"} plus optional seats and payment_method_id. |
/account/billing/subscribe |
POST | session bearer | Back-compat Pro-subscription creation endpoint from Wave 15. |
/account/billing/payment-method |
POST | session bearer | Attach or replace a Stripe payment method by id. |
/account/billing/cancel |
POST | session bearer | Cancel the subscription, at period end by default. |
/account/billing/pause |
POST | session bearer | Pause Stripe collection. |
/account/billing/resume |
POST | session bearer | Resume Stripe collection. |
/webhooks/stripe |
POST | Stripe signature | Handle subscription update/delete and invoice paid/payment-failed events idempotently. |
free_monthly_cap_usd = 5.00
by default).POST /account/billing/plan
creates a subscription if one does not exist, or changes seats on the
existing subscription if it does.Plan changes emit the existing billing.subscription_created,
billing.subscription_updated, or billing.subscription_canceled
audit events. No new event type is required for Wave 16.
Stripe invoice.payment_failed puts the account in a 7-day grace
window (failed_payment_grace_days, default 7). During grace, the
effective customer tier remains paid so buyers can fix payment without
interrupting dev workflows. After grace expires, the billing service
marks the subscription as frozen, sets the effective tier to Free, and
lets the normal tier-cap path enforce the Free monthly cap. A later
invoice.payment_succeeded restores the paid tier and clears the
frozen marker.
The v1 email notification is intentionally stubbed: operators should send the notification from the billing operator runbook until a real transactional-email adapter is installed.
--ignore-inbound-model flag (§5.3 open question). Lets routing fall through to rule / workspace / global slots even when the client’s model is set, opting into transparent cost-optimization mode./v1/models listing. OpenAI clients expect this surface; deferred until a Cursor or Continue user asks.gateway_key_id / inbound_shape on TurnCompleted. Currently dict-envelope-stamped; promoting them to typed fields keeps the catalog discipline (event-bus-and-trace-catalog.md) honest.canonical-message-format.md — Message, ContentBlock, persistence schema.provider-adapter-contract.md — adapter interface, retry, error classes.routing-engine.md — the 7-slot chain consumed in §5.event-bus-and-trace-catalog.md — additive payload fields on LLMCallCompleted.analytics-api.md §4.1, §4.8 — group_by=gateway_key on /analytics/cost and the /analytics/by_key rollup.deployment-shape.md — the gateway/agent/hybrid framing this spec is paired with.apps/gateway/src/metis_gateway/ — implementation.