metis

Gateway deployment

How to run the Metis transparent gateway from a container image. This is the server-side counterpart to the gateway client quickstart — point your existing OpenAI / Anthropic SDK clients at this URL, let Metis route and meter, and read the cost attribution out of the trace DB.

The gateway is per-request stateless (no session, no tools, no memory — see specs/gateway.md §2). One request = one HTTP call routed through the canonical IR to a provider adapter and back. The container packages that loop into a slim runtime image you can deploy on a laptop or a single VM.

Loopback-only bind, by design. The gateway forces host=127.0.0.1 inside the container (v1 safety guarantee per specs/gateway.md §3.2 and specs/server-api.md §3.1). Reach it from outside the container with network_mode: host (laptop / single-VM) or with a TLS terminator that shares the gateway’s network namespace (production). Standard docker run -p port-mapping does not work against a loopback bind. See Production checklist.


5-minute quickstart

Requires Docker (Linux) or Docker Desktop ≥4.29 (macOS / Windows). On macOS / Windows you must also enable host networking in Docker Desktop → Settings → Resources → Network → "Enable host networking". It is a beta toggle introduced in 4.29 and is off by default. Without it, the gateway's loopback is reachable only from inside the container (via docker exec), not from the macOS / Windows shell. On Linux, network_mode: host works out of the box.

# 1. Configure your provider key(s). At least one of ANTHROPIC_API_KEY,
#    OPENAI_API_KEY, or OPENROUTER_API_KEY MUST be set — the gateway
#    refuses to start without one (it cannot reach any provider).
cp .env.example .env
$EDITOR .env   # set ANTHROPIC_API_KEY (and/or OPENAI_API_KEY / OPENROUTER_API_KEY)

# 2. Build the image and issue your first gateway key. The keystore must
#    exist before `metis gateway` (server) starts — issue-key creates it.
docker compose build gateway
mkdir -p .metis-gateway/keys .metis-gateway/data
docker compose run --rm gateway issue-key \
    --name "my-client" \
    --workspace /workspace
# → prints `token: gw_…` once. Save it now — only the hash is persisted.

# 3. Start the gateway.
docker compose up -d

# 4. Verify (Linux, or macOS/Windows with host networking enabled).
curl http://127.0.0.1:8422/healthz
# → {"status":"ok","uptime_seconds":…}
#
# If you're on macOS/Windows and host networking is not enabled:
docker compose exec gateway curl --silent http://127.0.0.1:8422/healthz

Send a real LLM call (requires ANTHROPIC_API_KEY in .env):

curl http://127.0.0.1:8422/v1/messages \
    -H "x-api-key: gw_…paste_the_token_from_step_2…" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d '{
      "model": "anthropic:claude-haiku-4-5",
      "max_tokens": 128,
      "messages": [{"role": "user", "content": "say hi"}]
    }'

…and verify the spend was attributed to the key. The gateway image does not serve /analytics/* (see Observability hooks), so read the attribution straight from the trace DB:

docker compose exec gateway sqlite3 /var/lib/metis/metis.db \
  "SELECT json_extract(payload, '\$.gateway_key_id') AS gateway_key_id,
          json_extract(payload, '\$.inbound_shape') AS inbound_shape,
          ROUND(SUM(json_extract(payload, '\$.usage.cost_usd')), 6) AS cost
   FROM events
   WHERE type = 'llm.call_completed'
   GROUP BY 1, 2;"
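If the sqlite3 command-line shell turns out not to be present in the image (the runtime stage is python:3.13-slim and adds only curl), the same roll-up can go through Python's built-in sqlite3 module. A minimal sketch, assuming (as the query in Smoke test recipe does) that gateway_key_id, inbound_shape and usage.cost_usd live inside each event's JSON payload column; cost_by_key.py is a hypothetical file name:

```python
import sqlite3
import sys

# Assumption: llm.call_completed events carry gateway_key_id, inbound_shape
# and usage.cost_usd inside the JSON `payload` column.
QUERY = """
SELECT json_extract(payload, '$.gateway_key_id') AS gateway_key_id,
       json_extract(payload, '$.inbound_shape') AS inbound_shape,
       ROUND(SUM(json_extract(payload, '$.usage.cost_usd')), 6) AS cost_usd
FROM events
WHERE type = 'llm.call_completed'
GROUP BY 1, 2
ORDER BY cost_usd DESC
"""


def cost_by_key(db_path: str) -> list:
    """Return (gateway_key_id, inbound_shape, cost_usd) rows, costliest first."""
    # mode=ro opens the trace DB read-only, so the script never writes to it.
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# Runs only when a DB path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for key_id, shape, cost in cost_by_key(sys.argv[1]):
        print(f"{key_id}\t{shape}\t{cost}")
```

To avoid copying a file into the container, pipe the script to python3 on stdin: docker compose exec -T gateway python3 - /var/lib/metis/metis.db < cost_by_key.py.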

Reference

Image

Built from infra/gateway/Dockerfile. Multi-stage: builder runs uv sync --frozen --no-dev against the workspace lockfile; runtime is python:3.13-slim plus the resolved venv. Non-root user (metis). The image installs curl only as a healthcheck dependency.

| Tag | Contents |
|---|---|
| metis-gateway:latest | The default tag produced by docker compose build gateway (or docker build -t metis-gateway:latest -f infra/gateway/Dockerfile . from the repository root). |

Environment variables

| Variable | Default | What it does |
|---|---|---|
| ANTHROPIC_API_KEY | (unset) | Provider key used by the Anthropic adapter for outbound calls. |
| OPENAI_API_KEY | (unset) | Provider key used by the OpenAI adapter. |
| OPENROUTER_API_KEY | (unset) | Provider key used by the OpenRouter adapter. |
| METIS_GATEWAY_HOST | 127.0.0.1 | Bind host. Non-loopback values are rewritten to 127.0.0.1 (v1 safety guarantee). |
| METIS_GATEWAY_PORT | 8422 | Bind port. Matches the metis gateway --port default. |
| METIS_GATEWAY_KEYSTORE | /etc/metis/keys.json | Path inside the container to the gateway keystore. Mount a host volume here to persist keys. |
| METIS_GATEWAY_DB_PATH | /var/lib/metis/metis.db | Path inside the container to the SQLite trace DB. Mount a host volume here to persist traces. |
| METIS_GATEWAY_GLOBAL_DEFAULT | anthropic:claude-sonnet-4-6 | Fallback model used when no earlier routing slot matches. Clients win slot 1 only by passing a canonical provider:model id or a gateway alias; a bare name such as claude-haiku-4-5 falls through to this default (see Pitfalls a buyer will hit). |

Port

| Port | Direction | Purpose |
|---|---|---|
| 8422 | inbound | OpenAI shape (POST /v1/chat/completions), Anthropic shape (POST /v1/messages), GET /healthz. |

Volumes

| Container path | Purpose |
|---|---|
| /workspace | The workspace the issued keys are scoped to. The gateway only reads from it. |
| /etc/metis/keys.json | Keystore. SHA-256 hashes only; plaintext tokens are never persisted. Created on first issue-key. |
| /var/lib/metis/metis.db | Trace DB. Holds route.decided / llm.call_completed / turn.completed events, with gateway_key_id and inbound_shape stamped on each LLM/turn payload. |

Key management

Keys are issued by running the image with issue-key as the first argument; the entrypoint dispatches it to metis gateway issue-key:

# One-shot via compose (uses the keystore volume from docker-compose.yml).
docker compose run --rm gateway issue-key \
    --name "ci-bot" \
    --workspace /workspace \
    --allow-model anthropic:claude-haiku-4-5 \
    --daily-cap-usd 5.00

The plaintext gw_… token is printed once and cannot be recovered. The keystore stores {key_id, secret_hash, name, workspace_path, allowed_models?, daily_cap_usd?, monthly_cap_usd?, user_id?, team_id?, status, revoked_at?, grace_period_until?, created_at} keyed on the SHA-256 of the token.
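The hash-only storage can be pictured as follows. This sketch assumes secret_hash is the lowercase hex SHA-256 of the UTF-8 token and that records sit in a top-level keys array (the shape of the chart's {"keys": []} seed); the gateway's actual encoding and lookup code may differ:

```python
import hashlib
import hmac
from typing import Optional


def hash_token(token: str) -> str:
    """Hex SHA-256 of the plaintext token (assumed encoding of secret_hash)."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def find_key(keystore: dict, token: str) -> Optional[dict]:
    """Return the record whose secret_hash matches token, or None."""
    candidate = hash_token(token)
    for record in keystore.get("keys", []):
        # compare_digest keeps the comparison time independent of the match position.
        if hmac.compare_digest(record.get("secret_hash", ""), candidate):
            return record
    return None
```

A linear scan keeps the sketch short; because the keystore is keyed on the hash, a real lookup can be a single dictionary access on hash_token(token).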

Rotating, revoking, and listing keys

Wave 10 (gateway.md §11) added online lifecycle ops. issue-key, revoke-key and rotate-key all write the keystore atomically (write-temp-then-rename), so a running gateway never observes a partial keystore, and each emits an audit event (gateway.key_issued / gateway.key_revoked / gateway.key_rotated) to the trace DB when one is reachable.
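The write-temp-then-rename pattern is generic; a minimal Python sketch of it (not the gateway's code) follows. It relies on os.replace being atomic when source and destination are on the same filesystem, which is why the temporary file is created next to the keystore:

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory => same filesystem => os.replace is an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".keys.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make the bytes durable before the rename
        os.replace(tmp_path, path)  # swap the new keystore into place
    except BaseException:
        os.unlink(tmp_path)  # never leave a half-written temp file behind
        raise
```

mkstemp creates the file with mode 0600, so the rewritten file ends up readable only by its owner, which suits a file of secret hashes.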

Immediate revoke (e.g. “this key may have been exposed”):

docker compose run --rm gateway revoke-key gk_01HXYZ...
# → revoked: gk_01HXYZ...
# → revoked_at: 2026-05-15T14:22:10+00:00

Subsequent requests carrying the revoked bearer return 401 with body {"error": {"code": "key_revoked", "key_id": "gk_…", "revoked_at": "…"}} (see gateway.md §11). No restart is required: the running gateway rejects the key from the next request onward, and because the revocation is written to disk it also survives a restart.

Rotation with grace period (e.g. “Alice left the team”):

docker compose run --rm gateway rotate-key gk_01HXYZ... --grace-period 24h
# → old_key_id: gk_01HXYZ...
# → new_key_id: gk_01HXZA...
# → new_token:  gw_01HXZA...       (printed once, copy it now)
# → grace_period_until: 2026-05-16T14:22:10+00:00

The successor inherits the predecessor's workspace_path, user_id, team_id, allowed_models, daily_cap_usd, and monthly_cap_usd. Both keys authenticate during the grace window so the client team can roll the new token without downtime; trace events stamp the gateway_key_id actually used, so operators can watch the migration land in /analytics/by_key. After the grace boundary, the predecessor is treated as revoked at auth time, and the next admin sweep (metis gateway list-keys or revoke-key against any key) persists the active → revoked transition and emits a gateway.key_revoked event with reason="grace_period_expired".

Grace-period forms: 30m, 24h, 7d, 2w. Default: 24h. Use revoke-key for an immediate cutoff (no successor).
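A sketch of parsing these forms into a timedelta, assuming a single integer followed by one unit (m, h, d or w) and no compound values such as 1d12h; the real CLI parser may accept more or fewer forms:

```python
import re
from datetime import timedelta

# Unit suffixes listed above: minutes, hours, days, weeks.
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
_PATTERN = re.compile(r"(\d+)([mhdw])")


def parse_grace_period(text: str = "24h") -> timedelta:
    """Parse '30m', '24h', '7d' or '2w' into a timedelta (default 24h)."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported grace period: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

Adding the result to the rotation time gives the grace_period_until value that rotate-key prints.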

Listing:

docker compose run --rm gateway list-keys
# KEY_ID                           STATUS   USER         TEAM         ...
# gk_01HXYZ...                     active   alice        eng          ...
# gk_01HXZA...                     active   alice        eng          ...
#
# Or machine-readable:
docker compose run --rm gateway list-keys --format json | jq .

list-keys shows both status (on-disk) and effective_status (the auth-time view; an active key whose grace has lapsed reads as revoked even before the next sweep persists it).
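The difference between the two columns can be expressed as a small function. A sketch, assuming records with the status and grace_period_until fields from the keystore schema above and ISO-8601 timestamps with an explicit offset (as in the CLI output); a key is treated as revoked at or after its grace boundary:

```python
from datetime import datetime, timezone
from typing import Optional


def effective_status(record: dict, now: Optional[datetime] = None) -> str:
    """Auth-time view of a key: an active key past its grace boundary is revoked."""
    now = now or datetime.now(timezone.utc)
    status = record.get("status", "active")
    grace_until = record.get("grace_period_until")
    if status == "active" and grace_until:
        # fromisoformat handles "+00:00"; a trailing "Z" needs Python 3.11+.
        if datetime.fromisoformat(grace_until) <= now:
            return "revoked"
    return status
```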

Logs

The gateway logs to stdout/stderr via uvicorn. Tail with:

docker compose logs -f gateway

There is no log rotation in v1; for long-running deployments, configure the container runtime’s logging driver (--log-driver json-file --log-opt max-size=10m).

Observability hooks

| Surface | What it reports |
|---|---|
| GET /healthz | Liveness + uptime. Used by the container healthcheck. |
| Trace DB at /var/lib/metis/metis.db | Full event stream: route.decided, llm.call_started, llm.call_completed, turn.completed. Tagged with gateway_key_id and inbound_shape. |
| /analytics/cost | Per-model / per-time-window cost roll-up via the metis-server analytics surface. This is a separate app, not exposed by the gateway image; spin up metis serve against the same DB to read it. Accepts ?gateway_key=<id> to filter to one tenant. |
| /analytics/cost?group_by=gateway_key | Per-key roll-up dimension on /analytics/cost. |
| /analytics/by_key | Per-key cost / call / inbound-shape roll-up (analytics-api.md §4.8); the dedicated buyer surface. Accepts ?gateway_key=<id> as an exact-match filter. |
| metis serve dashboard, Gateway keys tab | Visual surface over /analytics/by_key: sortable per-key table, top-spender callout, click-through drill-down into the Cost view. |

Production checklist

The gateway image as shipped is appropriate for a single-tenant developer laptop or a single internal VM. Production deployment is the operator's responsibility; the spec deliberately stops at "loopback-bound, drop a TLS terminator in front" so that authentication / rate-limiting / audit remain TLS-terminator concerns rather than gateway-app concerns.

TLS termination

Put Caddy or nginx in front of the gateway. The cleanest pattern in Docker is a sidecar that shares the gateway’s network namespace so both processes see 127.0.0.1:

# docker-compose.prod.yml (sketch, not shipped; adapt to taste)
services:
  gateway:
    extends:
      file: docker-compose.yml
      service: gateway
    # Drop network_mode: host so the gateway is reachable only via the
    # sidecar. The gateway still binds 127.0.0.1 inside its namespace.
    network_mode: ""
    # Publish ports on the namespace owner: Docker rejects `ports:` on a
    # service that joins another service's network namespace.
    ports:
      - "443:443"

  caddy:
    image: caddy:2
    network_mode: "service:gateway"   # share the gateway's namespace
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data
      - caddy-config:/config

volumes:
  caddy-data:
  caddy-config:

With a minimal Caddyfile:

gateway.example.com {
    reverse_proxy 127.0.0.1:8422
}

Caddy terminates TLS on 0.0.0.0:443 and forwards plaintext to the gateway’s loopback inside the shared namespace.

Keystore rotation

Wave 10 (gateway.md §11) replaces the v1 manual procedure with metis gateway rotate-key:

docker compose run --rm gateway rotate-key gk_01HXYZ... --grace-period 24h

The predecessor stays active for the grace window so the client team can roll the new token without downtime; after the boundary it auto-revokes on the next admin sweep. See Key management above for the full recipe, including revoke-key for immediate cutoffs and list-keys for the post-rotation reconciliation view.

No TTL / scheduled rotation in v1.1 either — for scheduled rotation, run rotate-key from cron and pipe the new token to your secrets broker the same way you handled issue-key distribution.
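For cron-driven rotation, the new token has to be lifted out of rotate-key's output. A sketch, assuming the one-field-per-line name: value format shown in Key management (the same format the kind smoke test greps for); handing the token to a secrets broker is left as a placeholder because that step depends on your broker:

```python
import subprocess
import sys
from typing import Dict


def parse_cli_fields(output: str) -> Dict[str, str]:
    """Parse 'name: value' lines, keeping only the first token of each value."""
    fields = {}
    for line in output.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():  # skip lines without a colon or a value
            fields[name.strip()] = rest.split()[0]
    return fields


def rotate(key_id: str, grace: str = "24h") -> Dict[str, str]:
    """Run rotate-key via compose (-T: no TTY, clean output) and parse it."""
    out = subprocess.run(
        ["docker", "compose", "run", "--rm", "-T", "gateway",
         "rotate-key", key_id, "--grace-period", grace],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_cli_fields(out)


# Runs only when a key id is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    fields = rotate(sys.argv[1])
    # Placeholder: send fields["new_token"] to your secrets broker; never log it.
    print(fields["new_key_id"], fields["grace_period_until"])
```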

Trace DB size management

The trace DB grows linearly with traffic. SQLite WAL mode means writes are append-mostly. Two knobs, in this order:

  1. Prune old events before they bloat the DB:

    DELETE FROM events WHERE timestamp < '2026-04-01T00:00:00Z';

    Wrap it in BEGIN; … COMMIT; if you batch several deletes and want them atomic.
  2. Periodically VACUUM to reclaim the free pages that deleted rows leave behind: docker compose exec gateway sqlite3 /var/lib/metis/metis.db 'VACUUM;'. VACUUM cannot run inside a transaction, so run it after the COMMIT, and prefer a quiet window because it blocks writers while it runs.
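The same prune-then-vacuum sequence through Python's sqlite3 module, as a sketch. It assumes the events table has an ISO-8601 timestamp column in one consistent format (as the DELETE above implies), so string comparison orders events chronologically:

```python
import sqlite3


def prune_and_vacuum(db_path: str, cutoff_iso: str) -> int:
    """Delete events older than cutoff_iso, then VACUUM; return rows deleted."""
    # isolation_level=None: transactions are managed explicitly, because
    # VACUUM fails if a transaction is still open.
    con = sqlite3.connect(db_path, isolation_level=None)
    try:
        con.execute("BEGIN")
        cur = con.execute("DELETE FROM events WHERE timestamp < ?", (cutoff_iso,))
        deleted = cur.rowcount
        con.execute("COMMIT")
        con.execute("VACUUM")  # outside BEGIN ... COMMIT, as SQLite requires
        return deleted
    finally:
        con.close()
```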

The trace DB is the source of truth for cost attribution — delete only after you’ve rolled the data into whatever billing system you actually charge from.

Backup & restore

The trace DB lives in a single SQLite file (default ~/.metis/metis.db for a local metis install, /var/lib/metis/metis.db in the gateway image, and on the gateway's PVC in the helm chart). For a risky upgrade or a planned migration, take a snapshot before the change and keep the restore command at hand.

The shipped recipe uses metis backup (which calls SQLite’s VACUUM INTO) rather than cp: WAL mode keeps a -wal file alongside the live DB until checkpoint, so a naive cp metis.db /backup/ will miss in-flight events. VACUUM INTO is the SQLite-blessed hot-backup path — atomic, WAL-safe, single-file output, and the source DB does not need to be closed first.
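The WAL pitfall can be reproduced on a throwaway database (not the trace DB). This sketch checkpoints the schema into the main file, commits ten rows that stay in the -wal file, then compares a raw file copy with a VACUUM INTO copy. It assumes SQLite's default auto-checkpoint threshold (1000 pages), which ten small inserts do not reach:

```python
import os
import shutil
import sqlite3
from typing import Tuple


def count_events(path: str) -> int:
    """Count rows in the events table of the database at path."""
    con = sqlite3.connect(path)
    try:
        return con.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    finally:
        con.close()


def copy_vs_vacuum_into(workdir: str) -> Tuple[int, int]:
    """Return (rows in a raw file copy, rows in a VACUUM INTO copy)."""
    live = os.path.join(workdir, "metis.db")
    con = sqlite3.connect(live, isolation_level=None)  # autocommit
    con.execute("PRAGMA journal_mode=WAL")
    con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, type TEXT)")
    # Move the schema into metis.db so the raw copy is at least readable.
    con.execute("PRAGMA wal_checkpoint(TRUNCATE)")
    # These committed rows live only in metis.db-wal for now.
    con.executemany("INSERT INTO events (type) VALUES (?)",
                    [("llm.call_completed",)] * 10)

    naive = os.path.join(workdir, "naive.db")
    shutil.copyfile(live, naive)  # copies metis.db only, not metis.db-wal

    hot = os.path.join(workdir, "hot.db")
    con.execute("VACUUM INTO ?", (hot,))  # reads all committed data, WAL included
    con.close()
    return count_events(naive), count_events(hot)
```

Called on an empty temporary directory, it returns (0, 10): the raw copy misses every row still in the WAL, while the VACUUM INTO copy has them all.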

Take a backup.

# Defaults to source = ~/.metis/metis.db. Backup is a single file.
metis backup /backup/metis.$(date -u +%Y%m%dT%H%M%SZ).db

# Explicit source path (e.g., inside the gateway container against
# the PVC mount).
docker compose exec gateway \
    metis backup /backup/metis.db --db-path /var/lib/metis/metis.db

Output is a deterministic block: source path, dest path, byte count, schema version, event count, oldest/newest event timestamp. Save it alongside the backup as a paper trail.
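To cross-check a backup file against that paper trail, a sketch that recomputes the byte count, event count and oldest/newest timestamps from the file itself. Schema version is left out because where it is stored is not documented here; the sketch also assumes events.timestamp holds ISO-8601 strings, as the DELETE recipe in Trace DB size management implies:

```python
import os
import sqlite3


def snapshot_summary(path: str) -> dict:
    """Recompute byte count, event count and event time range for a backup."""
    # Read-only open: verifying a backup should never modify it.
    con = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        count, oldest, newest = con.execute(
            "SELECT COUNT(*), MIN(timestamp), MAX(timestamp) FROM events"
        ).fetchone()
    finally:
        con.close()
    return {"bytes": os.path.getsize(path), "events": count,
            "oldest": oldest, "newest": newest}
```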

Restore a backup. Schema-version checked; refuses to clobber an existing live DB unless --force is passed.

# 1. Stop the writer first (the gateway / serve process). VACUUM INTO
#    snapshots are crash-consistent, but restoring under an active
#    writer is not — pause the writer, restore, then start it.
docker compose stop gateway

# 2. Restore. --force lets you replace a corrupt or downgraded DB.
metis restore /backup/metis.20260514T030000Z.db \
    --db-path ~/.metis/metis.db --force

# 3. Start the writer.
docker compose start gateway

If the backup’s schema version doesn’t match the running binary, restore refuses with a diagnostic naming both versions. Downgrade the binary to one that wrote that schema, run any forward-migration script (none exist in v1 — there’s only one schema version so far), then re-restore.

Rotation policy. A sensible default for buyers without a separate backup system is a daily snapshot with about a week of retention.

Example crontab (run inside the container, or on the host against the mounted PVC path):

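# Daily snapshot at 03:00; delete snapshots older than a week.
# cron does not support backslash line continuations, so the entry
# must stay on one line.
0 3 * * * metis backup /backup/daily/metis.$(date -u +\%Y\%m\%d).db --db-path /var/lib/metis/metis.db && find /backup/daily -name 'metis.*.db' -mtime +7 -delete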
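find -mtime +7 prunes by age, so a skipped day or a clock change alters how many files survive. If "keep the last 7" should mean exactly seven files, a count-based pass can run after the backup instead. A sketch, assuming the metis.YYYYMMDD.db naming from the crontab entry above:

```python
import os
import re
from typing import List

# Matches the metis.YYYYMMDD.db names produced by the crontab entry.
_SNAPSHOT = re.compile(r"metis\.(\d{8})\.db")


def prune_snapshots(directory: str, keep: int = 7) -> List[str]:
    """Delete all but the newest `keep` daily snapshots; return deleted names."""
    snapshots = sorted(
        (name for name in os.listdir(directory) if _SNAPSHOT.fullmatch(name)),
        reverse=True,  # YYYYMMDD sorts chronologically as a plain string
    )
    doomed = snapshots[keep:]
    for name in doomed:
        os.remove(os.path.join(directory, name))
    return doomed
```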

Restore drill. Test the restore path on a non-production workspace before you need it for real: take a backup, point a fresh metis serve at a different --db-path, restore the backup there, and confirm /analytics/cost returns the same numbers as the live DB at backup time.

Helm + PVC. If the PVC’s StorageClass supports volume snapshots (EBS, GCE PD, RBD, Longhorn, etc.), those compose cleanly with metis backup for application-consistent point-in-time recovery:

  1. Run kubectl exec deploy/<release>-gateway -c gateway -- metis backup /var/lib/metis/snapshots/metis-$(date -u +%s).db --db-path /var/lib/metis/metis.db to land a crash-consistent single-file backup on the PVC.
  2. Trigger a VolumeSnapshot against the PVC. The snapshot now contains both the live DB (which may have new writes since step 1) and the application-consistent backup file (which is a frozen crash-consistent image of the DB at step-1 time).
  3. To restore, mount the snapshot’s volume on a sibling pod and metis restore /var/lib/metis/snapshots/metis-…db --db-path … — you get the application-consistent file even if the live DB on the restored volume was mid-flight. Volume snapshots alone don’t give you this property; they’re crash-consistent at the filesystem level but not necessarily at the SQLite-application level.

Cost attribution conventions

Every llm.call_completed and turn.completed event carries the gateway_key_id that authorized the request and the inbound_shape (openai or anthropic) the client used. Recommended tagging:

| Attribution dimension | Where it lives |
|---|---|
| Per-tenant / per-customer | Issue one key per tenant; the key's name is your free-text label. |
| Per-environment (dev/staging/prod) | Issue separate keys and encode the environment in the name. |
| Per-application-feature | Issue separate keys per feature surface. Aggregating across features is a SQL GROUP BY. |

These dimensions roll up through /analytics/by_key (one row per key, with a per-inbound-shape sub-array) and /analytics/cost?group_by=gateway_key (plain cost rows keyed by gateway_key_id). For a buyer-facing visual view, point a browser at the metis serve dashboard’s Spend by identity tab — same DB, same numbers, no extra wiring. The tab ships three rollups in one place: Per-team (/analytics/by_team, with an expand-on-click per-user breakdown), Per-user (/analytics/cost?group_by=user), and Per-key (the original Wave-6 view, /analytics/by_key). Click-through filters the Cost and Activity views to that identity via ?team=<id> / ?user=<id> / ?gateway_key=<id> — letting an operator monitor spend per tenant (team), per developer (user), or per credential (key) with the same chrome and no separate report tooling.

Per-team and per-user attribution requires --user / --team on issue-key (multi-user.md §4.2); pre-multi-user keys roll up under the untagged bucket in those tiles. The per-key tile works on every key regardless of whether it carries identity tags.

Non-loopback bind (deferred)

The gateway does not accept --host 0.0.0.0 in v1: the value is rewritten to 127.0.0.1 and a warning is logged. This is the documented v1 safety posture (auth / rate-limiting / audit hardening lands before the gateway accepts non-loopback binds). If you need an externally reachable listener, terminate TLS in front of the gateway as described in TLS termination; do not patch the bind check.
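The rewrite behaviour can be pictured as a small guard. This is a sketch only: the function name and the exact rules (any 127.0.0.0/8 or ::1 address passes, hostnames such as localhost are rewritten) are assumptions, not the gateway's code:

```python
import ipaddress
import logging

log = logging.getLogger("gateway.bind")


def clamp_bind_host(host: str) -> str:
    """Return host if it is a loopback IP address, else 127.0.0.1 with a warning."""
    try:
        if ipaddress.ip_address(host).is_loopback:
            return host
    except ValueError:
        pass  # not an IP literal (e.g. "localhost"): treat as non-loopback
    log.warning("non-loopback bind host %r rewritten to 127.0.0.1", host)
    return "127.0.0.1"
```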


Smoke test recipe

For a client buyer to verify the gateway end-to-end:

# 1. Build the image and issue a key. The keystore must exist before the
#    server starts, so issue-key runs first. -T disables pseudo-TTY
#    allocation so the captured token carries no stray carriage return.
docker compose build gateway
TOKEN=$(docker compose run --rm -T gateway issue-key \
    --name "smoke" --workspace /workspace \
    2>/dev/null | awk '/^token:/ {print $2}')

# 2. Start the gateway and check liveness.
docker compose up -d gateway
sleep 2
curl --fail http://127.0.0.1:8422/healthz

# 3. Hit the OpenAI shape with the canonical model id (a bare
#    "claude-haiku-4-5" would fall through to global_default).
curl http://127.0.0.1:8422/v1/chat/completions \
    -H "Authorization: Bearer $TOKEN" \
    -H "content-type: application/json" \
    -d '{
      "model": "anthropic:claude-haiku-4-5",
      "messages": [{"role": "user", "content": "respond with the word OK"}],
      "max_tokens": 16
    }'

# 4. Confirm the spend was attributed to the key.
docker compose exec gateway sqlite3 /var/lib/metis/metis.db \
    "SELECT key_id, COUNT(*) AS calls, ROUND(SUM(cost_usd),6) AS cost_usd
     FROM (
       SELECT json_extract(payload, '\$.gateway_key_id') AS key_id,
              json_extract(payload, '\$.usage.cost_usd') AS cost_usd
       FROM events
       WHERE type = 'llm.call_completed'
     )
     GROUP BY key_id;"

Step 3 should return an OpenAI-shape chat.completion body; step 4 should report calls=1 against the key issued in step 1.


Kubernetes via helm

The Docker quickstart above is single-node by design. For buyers running in-cluster, the chart at infra/gateway/helm/ packages the same image into a deployable bundle. The chart is single-tenant v1 (one workspace per gateway key) and keeps the same posture as the Docker shape: loopback bind inside the pod, with TLS termination left to the buyer. See the Production-readiness audit below before any non-laptop deployment.

What the chart ships

| Resource | Default |
|---|---|
| Deployment | 1 replica, 250m CPU / 256Mi memory requested, RollingUpdate (maxSurge 1, maxUnavailable 0). |
| Service | ClusterIP on 8422, targeting the proxy sidecar's http port. |
| Deployment.proxy | Sidecar (alpine/socat:1.8.0.0) listening on 0.0.0.0:8423 inside the pod and forwarding to 127.0.0.1:8422. This bridges the gateway's loopback bind (v1 safety guarantee) so the Service can reach it. |
| Ingress | Off. Enable it explicitly and provide a TLS cert (cert-manager / cloud LB / sealed Secret). |
| Secret (providers) | Chart-managed by default with inline keys, or set provider.existingSecret to consume one you manage. Either way, the chart fails to install if no provider key is provided (the gateway refuses to start without one). |
| ConfigMap (keystore) | Seeded empty: { "keys": [] }. The gateway rejects an empty keystore at startup, so the seed exists only for helm template rendering. For a real install, issue at least one key out-of-band and pass it via keystore.existingSecret (recipe in Quickstart). |
| PersistentVolumeClaim | 1Gi ReadWriteOnce for the trace DB. Uses the cluster's default StorageClass unless persistence.storageClass is set. |
| HorizontalPodAutoscaler | Off. When enabled, scales 1→3 on CPU (see the caveat below on the shared trace DB). |
| PodDisruptionBudget | minAvailable: 1, so cluster autoscalers / upgrade tools wait for a replacement before evicting. |
| NetworkPolicy | Deny-by-default. Allows ingress from any pod in the namespace; allows egress to cluster DNS and to TCP 443 on any IP (NetworkPolicy cannot match destinations by DNS name, so provider APIs cannot be allow-listed by hostname). |
| ServiceAccount | Chart-managed, no extra RBAC. Reuse an existing one via serviceAccount.name + serviceAccount.create=false. |

Quickstart

The gateway refuses to start with an empty keys.json (auth.py rejects {"keys": []}), so the working order is: issue a key out-of-band, then install. The chart's seed ConfigMap is fine for helm template rendering but is not a valid install-time keystore.

# 1. Build + push the gateway image to a registry your cluster can pull.
docker build -t your-registry.example.com/metis-gateway:0.1.0 \
    -f infra/gateway/Dockerfile .
docker push your-registry.example.com/metis-gateway:0.1.0

# 2. Create a namespace.
kubectl create namespace metis-gateway

# 3. Issue your first gateway key BEFORE the install. Save the printed
#    token — only the SHA-256 hash is persisted. Either run the CLI
#    locally against a uv workspace…
mkdir -p ./.metis-gateway
uv run metis gateway issue-key \
    --keystore ./.metis-gateway/keys.json \
    --name "my-client" --workspace /workspace
# → prints `token: gw_…` once.

#    …or use the gateway image's issue-key subcommand if you don't have
#    uv installed:
# docker run --rm -v "$PWD/.metis-gateway:/etc/metis" \
#     your-registry.example.com/metis-gateway:0.1.0 issue-key \
#         --name "my-client" --workspace /workspace

# 4. Wrap the keystore in a Secret. (A ConfigMap also works, but Secret
#    matches how keys.json is treated in production paths.)
kubectl -n metis-gateway create secret generic metis-gateway-keystore \
    --from-file=keys.json=./.metis-gateway/keys.json

# 5. Install the chart. Pin a real image tag, NOT `latest` (the chart
#    ships with `latest` as a placeholder).
helm install metis-gateway ./infra/gateway/helm/ \
    --namespace metis-gateway \
    --set image.repository=your-registry.example.com/metis-gateway \
    --set image.tag=0.1.0 \
    --set provider.anthropicApiKey="${ANTHROPIC_API_KEY}" \
    --set keystore.existingSecret=metis-gateway-keystore

# 6. Wait for the pod to come up.
kubectl -n metis-gateway wait deploy/metis-gateway --for=condition=Available --timeout=120s

# 7. Smoke-test over a port-forward.
kubectl -n metis-gateway port-forward svc/metis-gateway 8422:8422 &
curl http://127.0.0.1:8422/healthz

To rotate or add keys later, regenerate keys.json with metis gateway issue-key, recreate the Secret (kubectl create secret ... --dry-run=client -o yaml | kubectl apply -f -), and kubectl rollout restart deploy/metis-gateway. See Keystore rotation without a restart.
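To catch the empty-keystore failure before helm install rather than as a CrashLoopBackOff, keys.json can be checked locally before it goes into the Secret. A sketch that checks only the shape behind the startup error quoted in Pitfalls a buyer will hit; it does not validate individual records, and check_keystore.py is a hypothetical file name:

```python
import json
import sys


def check_keystore(path: str) -> int:
    """Return the number of keys; raise ValueError if startup would reject the file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    keys = data.get("keys") if isinstance(data, dict) else None
    # Mirrors the startup error: "keystore must contain a non-empty 'keys' array".
    if not isinstance(keys, list) or not keys:
        raise ValueError("keystore must contain a non-empty 'keys' array")
    return len(keys)


# Runs only when a path is passed, e.g. python3 check_keystore.py ./.metis-gateway/keys.json
if __name__ == "__main__" and len(sys.argv) > 1:
    print(f"{check_keystore(sys.argv[1])} key(s) in {sys.argv[1]}")
```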

Common values.yaml overrides

Private registry with pull secrets:

image:
  repository: ghcr.io/your-org/metis-gateway
  tag: "0.1.0"
  pullSecrets:
    - name: ghcr-pull-secret

TLS via nginx-ingress + cert-manager:

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: 10m
  hosts:
    - host: gateway.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: metis-gateway-tls
      hosts:
        - gateway.example.com

Provider keys from External Secrets Operator:

provider:
  existingSecret: metis-gateway-providers
# (out-of-band) create an ExternalSecret that materializes a Secret
# named metis-gateway-providers with keys ANTHROPIC_API_KEY / etc.

Keystore from a sealed-Secret bundle:

keystore:
  existingSecret: metis-gateway-keystore
# Secret must have a key named "keys.json" whose value is the keystore JSON.
# kubectl create secret generic metis-gateway-keystore \
#     --from-file=keys.json=./keys.json

LoadBalancer (only with TLS in front):

service:
  type: LoadBalancer
  annotations:
    # AWS NLB with ACM cert + TLS termination at LB:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:...
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "8422"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp

Tightening NetworkPolicy ingress to a specific client namespace:

networkPolicy:
  ingressFromSelector:
    matchLabels:
      app.kubernetes.io/name: my-client-app

Validation

The chart was validated with helm 4.2.0:

helm lint infra/gateway/helm/
# → 1 chart(s) linted, 0 chart(s) failed

helm template test infra/gateway/helm/ \
    --set provider.anthropicApiKey=sk-ant-stub
# → 8 manifests rendered (NetworkPolicy / PDB / ServiceAccount / Secret /
#   ConfigMap / PVC / Service / Deployment)

helm template test infra/gateway/helm/ \
    --set provider.anthropicApiKey=sk-ant-stub \
    --set keystore.existingSecret=metis-gateway-keystore
# → 7 manifests rendered (the seed ConfigMap drops out when a Secret
#   keystore is supplied)

Run these checks before merging chart changes. End-to-end install validation is captured in First production smoke below.

First production smoke (kind, 2026-05-15)

The chart was deployed end-to-end against a kind 0.31.0 cluster (kindest/node:v1.35.0) on macOS / Docker Desktop 29.2.0, using helm 4.2.0 and kubectl v1.34.1. Cluster spinup → first 200 OK on /healthz took ~3 minutes after the first image build (Docker layer cache cold). Full transcript:

# 1. Create the cluster + load the locally-built image.
kind create cluster --name metis-gateway-smoke --wait 2m
docker build -t metis-gateway:dev -f infra/gateway/Dockerfile .
kind load docker-image metis-gateway:dev --name metis-gateway-smoke

# 2. Issue a key out-of-band, wrap it in a Secret.
kubectl create namespace metis-gateway
mkdir -p /tmp/metis-gateway-smoke
uv run metis gateway issue-key \
    --keystore /tmp/metis-gateway-smoke/keys.json \
    --name "smoke-client" --workspace /workspace \
  | grep -E "^(key_id|token):" > /tmp/metis-gateway-smoke/issue.out
TOKEN=$(awk '/^token:/ {print $2}' /tmp/metis-gateway-smoke/issue.out)
kubectl -n metis-gateway create secret generic metis-gateway-keystore \
    --from-file=keys.json=/tmp/metis-gateway-smoke/keys.json

# 3. Install with dev overrides.
helm install metis-gateway ./infra/gateway/helm/ \
    --namespace metis-gateway \
    --set image.repository=metis-gateway \
    --set image.tag=dev \
    --set image.pullPolicy=Never \
    --set provider.anthropicApiKey="$ANTHROPIC_API_KEY" \
    --set keystore.existingSecret=metis-gateway-keystore

# 4. Wait + port-forward.
kubectl -n metis-gateway wait deploy/metis-gateway \
    --for=condition=Available --timeout=120s
kubectl -n metis-gateway port-forward svc/metis-gateway 18422:8422 &
curl --silent http://127.0.0.1:18422/healthz
# → {"status":"ok","uptime_seconds":…}

# 5. Real-API smoke: 4 calls (OpenAI sync + SSE, Anthropic sync + SSE)
#    against the canonical haiku id so routing slot 1 actually wins
#    (see "Bare model names route to global_default" pitfall below).
for shape in chat messages; do
  for stream in false true; do
    case "$shape:$stream" in
      chat:false)
        curl -sf http://127.0.0.1:18422/v1/chat/completions \
          -H "Authorization: Bearer $TOKEN" -H "content-type: application/json" \
          -d '{"model":"anthropic:claude-haiku-4-5","max_tokens":16,
               "messages":[{"role":"user","content":"respond with OK"}]}' ;;
      chat:true)
        curl -sNf http://127.0.0.1:18422/v1/chat/completions \
          -H "Authorization: Bearer $TOKEN" -H "content-type: application/json" \
          -d '{"model":"anthropic:claude-haiku-4-5","max_tokens":16,"stream":true,
               "messages":[{"role":"user","content":"respond with OK"}]}' ;;
      messages:false)
        curl -sf http://127.0.0.1:18422/v1/messages \
          -H "x-api-key: $TOKEN" -H "anthropic-version: 2023-06-01" \
          -H "content-type: application/json" \
          -d '{"model":"anthropic:claude-haiku-4-5","max_tokens":16,
               "messages":[{"role":"user","content":"respond with OK"}]}' ;;
      messages:true)
        curl -sNf http://127.0.0.1:18422/v1/messages \
          -H "x-api-key: $TOKEN" -H "anthropic-version: 2023-06-01" \
          -H "content-type: application/json" \
          -d '{"model":"anthropic:claude-haiku-4-5","max_tokens":16,"stream":true,
               "messages":[{"role":"user","content":"respond with OK"}]}' ;;
    esac
    echo
  done
done

# 6. Per-key spend rollup (the gateway image does not expose /analytics/*;
#    point `metis serve` at a VACUUM INTO snapshot of the same trace DB).
POD=$(kubectl -n metis-gateway get pod -o name | head -1 | sed 's|pod/||')
kubectl -n metis-gateway exec $POD -c gateway -- \
    python3 -c "import sqlite3; con=sqlite3.connect('/var/lib/metis/metis.db');
con.execute('PRAGMA wal_checkpoint(TRUNCATE)');
con.execute('VACUUM INTO ?', ('/tmp/snapshot.db',))"
kubectl -n metis-gateway cp metis-gateway/$POD:/tmp/snapshot.db \
    /tmp/metis-gateway-smoke/metis.db -c gateway
uv run metis serve /tmp/metis-gateway-smoke \
    --port 18430 --db-path /tmp/metis-gateway-smoke/metis.db &
sleep 3
curl -sf http://127.0.0.1:18430/analytics/by_key | python3 -m json.tool

Measured outcome:

Two chart changes landed during this validation:

Pitfalls a buyer will hit

These are the rough edges to expect when doing the install yourself. None of them require source changes; they’re a function of v1 gateway semantics and chart defaults.

Pitfall: Empty seed keystore blocks first start
What happens: CrashLoopBackOff with keystore must contain a non-empty 'keys' array. The chart’s default ConfigMap is {"keys": []}, and the gateway refuses to start on it.
Workaround: Issue at least one key out-of-band (uv run metis gateway issue-key … or docker run … issue-key), wrap it in a Secret, and install with --set keystore.existingSecret=metis-gateway-keystore.

Pitfall: Bare model names route to global_default
What happens: A client sending model: "claude-haiku-4-5" (the public Anthropic name) lands in slot 7 (global_default = anthropic:claude-sonnet-4-6). The response body still echoes claude-haiku-4-5, because translators echo the client’s requested_model, so the discrepancy is invisible client-side. The upstream call (and the billed cost) is Sonnet.
Workaround: Send the canonical id (anthropic:claude-haiku-4-5) or one of the gateway-side aliases (haiku, fast, sonnet, balanced, opus, deep, gpt5, mini); both win routing slot 1. Alternatively, set a workspace_default in .metis/routing.yaml to choose the model that bare names fall through to.

Pitfall: /analytics/* is not on the gateway image
What happens: curl http://gateway/analytics/by_key returns 404. The gateway is per-request stateless (gateway.md §2) and does not host the analytics surface.
Workaround: Run metis serve against the same DB to expose /analytics/cost and /analytics/by_key. The easiest path: kubectl exec … "PRAGMA wal_checkpoint(TRUNCATE); VACUUM INTO '/tmp/snapshot.db'", kubectl cp the snapshot out, then run metis serve --db-path against it.

Pitfall: Raw kubectl cp of metis.db returns stale data
What happens: If you kubectl cp the trace DB while the gateway is taking traffic, SQLite WAL writes that haven’t been checkpointed yet stay in metis.db-wal, so the copied .db shows older events than the live one.
Workaround: Force a checkpoint and snapshot first: kubectl exec … python3 -c "import sqlite3; sqlite3.connect('/var/lib/metis/metis.db').execute('PRAGMA wal_checkpoint(TRUNCATE)')", then VACUUM INTO '/tmp/snapshot.db', then kubectl cp the snapshot.

Pitfall: PVC ReadWriteOnce blocks horizontal scaling
What happens: The default PVC is ReadWriteOnce, which only one node can mount at a time. Setting replicaCount > 1 or enabling autoscaling leaves any replica scheduled onto a different node stuck Pending (“volume already attached to a node”).
Workaround: Stay at replicaCount: 1 (the documented v1 shape), or switch to a storage class that supports ReadWriteMany and accept the caveats of SQLite on a network filesystem. The better long-term fix is to externalize the trace DB, tracked under Observability.

Pitfall: NetworkPolicy is silently ignored on plain kind
What happens: kind’s default CNI (kindnet) does not enforce NetworkPolicy egress. The chart’s deny-by-default policy renders but has no effect; calls still reach Anthropic.
Workaround: Test the NetworkPolicy on a cluster running Calico, Cilium, or Antrea. On kind, create the cluster with the default CNI disabled (networking.disableDefaultCNI: true) and install Calico (kubectl apply -f …) before relying on the policy. The egress rule itself is correct (TCP 443 to any IP, plus DNS to kube-system).
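The checkpoint-then-snapshot workaround for the /analytics and kubectl cp pitfalls can be sketched with Python’s standard-library sqlite3 module, the same module the python3 -c one-liner uses. This is a minimal sketch and not a Metis tool: snapshot_trace_db is a hypothetical name, and the assumption is that you run it inside the gateway container (for example via kubectl exec … python3) against /var/lib/metis/metis.db, then kubectl cp the snapshot out.

```python
import os
import sqlite3


def snapshot_trace_db(live_path: str, snapshot_path: str) -> int:
    """Fold the WAL into the main file, then write a consistent copy.

    Hypothetical helper, not part of Metis. Returns the "busy" column of
    PRAGMA wal_checkpoint: 0 means the checkpoint ran to completion.
    """
    # VACUUM INTO fails if the target already exists and is non-empty.
    if os.path.exists(snapshot_path):
        os.remove(snapshot_path)
    conn = sqlite3.connect(live_path)
    try:
        # The result row is (busy, wal_frames, checkpointed_frames).
        busy, _, _ = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
        # VACUUM INTO reads through the WAL, so the copy is consistent even
        # if a concurrent reader kept the checkpoint from completing.
        conn.execute("VACUUM INTO ?", (snapshot_path,))
    finally:
        conn.close()
    return busy
```

The checkpoint keeps the live metis.db file itself current; the VACUUM INTO copy is what you actually ship to metis serve --db-path, since it is a single self-contained file with no -wal sidecar to forget.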

Cleanup

# Tear down the helm release + namespace.
helm uninstall metis-gateway --namespace metis-gateway
kubectl delete namespace metis-gateway

# Drop the kind cluster.
kind delete cluster --name metis-gateway-smoke

# Local files.
rm -rf /tmp/metis-gateway-smoke ./.metis-gateway

The loopback-bind tax in Kubernetes

The gateway forces host=127.0.0.1 (v1 safety guarantee per specs/server-api.md §3.1). In Kubernetes that means a Service cannot route to the gateway directly: targetPort hits the pod IP, which the gateway does not listen on. Two consequences the chart bridges automatically:

  1. A socat sidecar runs in the gateway pod by default, listening on 0.0.0.0:8423 and forwarding to 127.0.0.1:8422. All containers in a pod share one network namespace, so the proxy’s loopback is the gateway’s loopback. The Service targets the sidecar’s port. To swap socat for Caddy / nginx (e.g. to add TLS at the pod boundary), override proxy.image, proxy.command, and proxy.args, and mount the config via extraVolumes / extraVolumeMounts.
  2. Liveness / readiness probes use exec curl 127.0.0.1, not HTTP probes against the pod IP — kubelet HTTP probes run from the node’s network namespace and cannot reach the gateway’s loopback. The image already includes curl for the Docker healthcheck.
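To make the sidecar’s job concrete, here is a minimal asyncio sketch of a TCP relay that accepts on one address and forwards every connection to a loopback upstream. It is an illustration of the technique, not what the chart runs: the chart uses socat, and a command like socat TCP-LISTEN:8423,fork,reuseaddr TCP:127.0.0.1:8422 does roughly the same thing (the chart’s exact socat arguments are not shown here). The function names start_relay and _pipe are hypothetical.

```python
import asyncio


async def _pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF, then half-close the far side."""
    try:
        while True:
            chunk = await reader.read(65536)
            if not chunk:
                break
            writer.write(chunk)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()
    except OSError:
        # Peer reset or vanished; the handler's finally block closes both ends.
        pass


async def start_relay(listen_host: str, listen_port: int,
                      upstream_host: str, upstream_port: int) -> asyncio.AbstractServer:
    """Accept on listen_host:listen_port and forward each connection upstream."""

    async def handle(client_reader: asyncio.StreamReader,
                     client_writer: asyncio.StreamWriter) -> None:
        try:
            up_reader, up_writer = await asyncio.open_connection(upstream_host, upstream_port)
        except OSError:
            # Upstream (the gateway) is not listening yet: drop the client.
            client_writer.close()
            return
        try:
            # Shuttle bytes both ways concurrently until both directions hit EOF.
            await asyncio.gather(_pipe(client_reader, up_writer),
                                 _pipe(up_reader, client_writer))
        finally:
            up_writer.close()
            client_writer.close()

    return await asyncio.start_server(handle, listen_host, listen_port)
```

In the pod shape described above you would call start_relay("0.0.0.0", 8423, "127.0.0.1", 8422) and then serve_forever() on the returned server. The relay only works because it shares the gateway’s network namespace; run it in a separate pod and 127.0.0.1 would point at the wrong loopback.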

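If you turn off the socat sidecar (proxy.enabled=false) without providing your own proxy via extraContainers-style customization, the Service will not reach the gateway. The probes will keep passing, because they exec inside the pod, so the pod still reports Ready while no Service traffic gets through.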

Failure modes worth knowing

Symptom: helm install fails with “set provider.existingSecret OR at least one of provider.*ApiKey”
Cause / fix: None of the three inline provider keys is set, and no existing Secret is referenced. The gateway refuses to start without a key (runtime.py:84), so the chart fails the install early. Set one of the provider.*ApiKey values, or point provider.existingSecret at a Secret that holds the key.

Symptom: Pod stuck in CrashLoopBackOff with “gateway keystore not found”
Cause / fix: keystore.existingSecret is set, but the Secret doesn’t have a key named keys.json. Recreate it with --from-file=keys.json=./keys.json.

Symptom: Pod stuck in CrashLoopBackOff with keystore must contain a non-empty 'keys' array
Cause / fix: The chart’s seed keystore is { "keys": [] }, and the gateway rejects an empty array at startup. Issue a key out-of-band, wrap it in a Secret, and reinstall with --set keystore.existingSecret=… (see the Quickstart).

Symptom: Service connects, but every request returns 401 from the gateway
Cause / fix: The keystore Secret you bundled has no entry matching the bearer token the client is sending. Re-issue the key against the same keystore file you bundled, recreate the Secret (--dry-run=client -o yaml | kubectl apply -f -), and run kubectl rollout restart deploy/metis-gateway.

Symptom: Port collision: gateway pod stuck in Error with “address already in use”
Cause / fix: You set proxy.listenPort equal to gatewayPort. The proxy binds 0.0.0.0:listenPort and the gateway binds 127.0.0.1:gatewayPort in the same pod network namespace, and a wildcard bind claims the port on every interface, loopback included. Keep the two ports different (defaults are 8423 / 8422).

Symptom: NetworkPolicy does not restrict egress (the pod can still reach hosts the policy should block)
Cause / fix: Your cluster CNI does not enforce NetworkPolicy egress (some enforce ingress only). kubectl describe networkpolicy metis-gateway confirms the rules rendered, but not that they are enforced; if your CNI ignores egress rules, the policy is advisory only. Calico and Cilium enforce both ingress and egress.
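The port-collision failure is easy to reproduce outside Kubernetes. The sketch below binds a “gateway” socket on loopback and then a “proxy” socket on the wildcard address, the way the two containers in the pod bind, and reports the error. The function name bind_like_the_pod is hypothetical, ports are ephemeral rather than 8422 / 8423, and the EADDRINUSE result assumes Linux socket semantics without SO_REUSEADDR / SO_REUSEPORT.

```python
import socket
from typing import Optional


def bind_like_the_pod(gateway_port: int = 0,
                      proxy_port: Optional[int] = None) -> Optional[int]:
    """Bind the 'gateway' on 127.0.0.1, then the 'proxy' on 0.0.0.0.

    proxy_port=None reuses whatever port the gateway got, which mimics
    proxy.listenPort == gatewayPort. Returns the errno of the failed proxy
    bind, or None if both binds succeed.
    """
    gateway = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    proxy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        gateway.bind(("127.0.0.1", gateway_port))
        gateway.listen()
        if proxy_port is None:
            proxy_port = gateway.getsockname()[1]
        try:
            # The wildcard bind needs the port on every interface, loopback included.
            proxy.bind(("0.0.0.0", proxy_port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        gateway.close()
        proxy.close()
```

With equal ports the second bind fails with errno.EADDRINUSE; with distinct ports both succeed, which is why keeping listenPort and gatewayPort different is enough.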

Production-readiness audit

The single-tenant gateway plus this helm chart is appropriate for: one buyer running their own devs through the gateway, one internal team with trusted-network access to the cluster, or a pre-pilot deployment that attributes cost back to a known set of gateway keys you issue manually.

It is not yet appropriate for: shared SaaS multi-tenancy, exposed public ingress without a TLS terminator, team-level cost rollups, or any deployment where a key compromise must trigger automated rotation.

The list below catalogs what the chart inherits from gateway v1, what needs to be the operator’s responsibility, and what’s tracked for future spec work.

TLS termination

What the gateway provides: plaintext HTTP on a loopback bind inside the pod, exposed on the pod IP (inside the cluster network) via the socat sidecar.

What the operator must add: TLS termination in front of the gateway. The chart does not ship a TLS terminator because the choice is deployment-shape-specific. Three good options:

  1. Ingress controller + cert-manager: ingress.enabled=true with a cert-manager.io/cluster-issuer annotation. nginx-ingress, Traefik, and the AWS load-balancer controller all work; the chart’s Ingress resource is shape-compatible.
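  2. Cloud load balancer with a managed cert: service.type=LoadBalancer with the cloud provider’s TLS annotations (for example, AWS NLB + ACM). Where the managed cert attaches to an L7 load balancer instead (GCP external HTTP(S) LB + Google-managed cert, Azure Application Gateway), wire it through the provider’s Ingress integration. Either way, the LB terminates TLS and forwards plaintext to the Service.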
  3. Caddy or nginx sidecar inside the gateway pod — replace the socat sidecar via proxy.image + proxy.args + a mounted config file. Lowest blast radius; the TLS terminator and the gateway share a network namespace and the gateway cannot be reached by skipping the sidecar. Pattern documented under Production checklist for the Docker shape.

Do not point untrusted clients at a plaintext Service. The gateway has no auth on the wire other than the Authorization: Bearer gw_… header, and the bearer token is transmitted in plaintext if TLS is missing.

Observability

What’s shipped:

What’s missing (Phase 3 work, flagged here):

Keystore rotation without a restart

Today’s behavior. The gateway reads keys.json at startup. There is no live-reload watcher. A rotation needs:

  1. metis gateway issue-key … to produce the new key entry.
  2. The updated keys.json deployed back into the source of truth (ConfigMap or existingSecret).
  3. A pod restart so the gateway re-reads it.

With the helm chart:

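# Issue the new key inside a running pod. The chart's mounted keystore is
# read-only inside the pod (ConfigMap- or Secret-backed), so issue against a
# writable path like /tmp, then re-bundle. If /tmp/keys.json does not exist
# yet, issue-key creates it holding only the new entry: existing keys are not
# carried over unless you seed the file first.
kubectl -n metis-gateway exec deploy/metis-gateway -c gateway -- \
    metis gateway issue-key --keystore /tmp/keys.json \
        --name "client-v2" --workspace /workspace

# Copy the resulting keys.json out. kubectl cp takes a pod name, not
# deploy/…, so stream the file through exec instead (with replicaCount: 1,
# both exec calls land on the same pod).
kubectl -n metis-gateway exec deploy/metis-gateway -c gateway -- \
    cat /tmp/keys.json > ./keys.json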

# Roll the new keystore into the chart-managed Secret.
kubectl -n metis-gateway create secret generic metis-gateway-keystore \
    --from-file=keys.json=./keys.json \
    --dry-run=client -o yaml | kubectl apply -f -

# Tell the chart to mount the Secret instead of the seed ConfigMap.
helm upgrade metis-gateway ./infra/gateway/helm/ \
    --namespace metis-gateway --reuse-values \
    --set keystore.existingSecret=metis-gateway-keystore

# Roll the pods so the new keystore is read.
kubectl -n metis-gateway rollout restart deploy/metis-gateway

Caveat. During the rollout there’s a brief window (typically seconds) where old pods accept the old key and new pods accept the new key. To revoke a compromised key cleanly: remove the old entry from keys.json first, roll, then add the new entry and roll again. The chart uses maxUnavailable: 0 / maxSurge: 1 so there’s always at least one Ready pod throughout.
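Phase 1 of that revoke (remove the old entry) can be done on the local keys.json before recreating the Secret. The sketch below assumes the {"keys": [...]} layout shown above and, as a hypothetical schema detail, that each entry carries the "name" passed to issue-key --name; check your actual keys.json before relying on it. If the compromised key is the only entry, removing it would leave an empty array that the gateway rejects at startup, so the sketch refuses rather than writing a keystore that puts the pod into CrashLoopBackOff.

```python
import json
import os
from typing import Iterable


def drop_keys_by_name(keystore_path: str, names: Iterable[str]) -> int:
    """Remove entries whose "name" is in names; return how many were removed.

    Hypothetical helper: assumes {"keys": [{"name": ...}, ...]}. Refuses to
    leave the keys array empty, because the gateway will not start on one.
    """
    doomed = set(names)
    with open(keystore_path, encoding="utf-8") as f:
        data = json.load(f)
    keys = data.get("keys") if isinstance(data, dict) else None
    if not isinstance(keys, list):
        raise ValueError("keystore has no 'keys' array")
    kept = [k for k in keys if not (isinstance(k, dict) and k.get("name") in doomed)]
    if not kept:
        raise ValueError("refusing to write an empty 'keys' array; "
                         "the gateway would not start")
    data["keys"] = kept
    # Write a sibling temp file and swap it in, so a crash never leaves a
    # half-written keys.json behind.
    tmp_path = keystore_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
        f.write("\n")
    os.replace(tmp_path, keystore_path)
    return len(keys) - len(kept)
```

After it succeeds, recreate the Secret from the trimmed file and roll the pods as in the commands above; the second phase (add the new entry, roll again) uses issue-key as before.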

Future spec work. Live keystore reload (file-watch or HTTP control plane) is not specified yet. Tracked against the multi-user follow-on in specs/multi-user.md, which will define how key issuance and revocation work in a team / SaaS context.

Multi-tenant safety

Gateway v1 is single-tenant in shape: one gateway key maps to one workspace, and “tenancy” is whatever convention the operator encodes in the key name (provider.existingSecret is one Secret for the whole deployment; provider API keys are not per-tenant).

What’s safe today:

What’s NOT safe today (operator must compensate or wait for multi-user.md):

Phase-3+ work. The multi-user upgrade path — team-level secrets, RBAC on gateway keys, per-tenant analytics rollups, live keystore rotation — is being drafted in parallel as specs/multi-user.md. The chart’s parameterization (provider.existingSecret, keystore.existingSecret, the optional sidecar slot) is deliberately shaped so the same chart can adopt those features without breaking existing deployments. Once multi-user.md lands, expect new chart values for team: / tenant: blocks and a chart-managed CRD or controller for key lifecycle. Until then, this chart targets the single-tenant shipping shape.


See also