Status: Ratified — §5.5.4 open-core gateway + per-seat Pro + enterprise %-of-savings add-on (2026-05-16) Last updated: 2026-05-16
Frames the commercial pricing model for the Metis product itself: gateway access, the analytics dashboard, multi-user identity, the upgrade-tier agent. Surveys the candidate models, names the trade-offs each one creates, and proposes one for the owner to ratify or reject. The spec lays out the choice; the owner closes the project strategy (private).
This spec depends on:
../the project strategy (private)— buyer ≠ user framing; the buyer is the budget owner.../the project strategy (private)— the open question this spec exists to close.deployment-shape.md— hybrid (gateway-first → agent-upgrade); pricing must accommodate “trial without payment” → “convert to paid.”multi-user.md §5— per-user / per-team identity layer; pricing must compose with the shipped per-team budget cap primitive.analytics-api.md §4.7—actual_repriced_usd/baseline_repriced_usd/savings_pct; the measurement substrate any “% of savings” model would depend on.gateway.md— the OSS surface that’s the foot-in-the-door.This spec is product / commercial design, not an engineering contract. It does not add wire fields, event types, or HTTP endpoints. The implementation lands only after the buyer signs off on the model — which means none of §5–§7 below should be read as “to-build.”
Today there is no specced pricing model. Every GTM conversation invents one on the spot. That is fine while the project is one part-time owner with zero paying buyers, but it is the missing piece that blocks the first commercial conversation: the buyer asks “what does this cost?” and the answer needs to be a single sentence backed by a defensible rationale, not a real-time deliberation.
The job of this spec is to:
multi-user.md §5) and the analytics surface (analytics-api.md) so whichever model wins is enforceable via primitives already shipped.What this spec deliberately does not do: settle list prices, name competitors’ price points (the project does not yet have the buyer signal to triangulate against), pick a tiering structure for the paid plan (that’s downstream of the model choice), or commit Metis to any billing infrastructure (Stripe, metering, invoicing). Those are commercial / operational decisions, not engineering ones.
deployment-shape.md §1). Pricing must not break that floor: the path from “trial” to “convert to paid” needs a deliberate point of conversion, not a paywall at the front door.Metis sells three artifacts to the same buyer:
gateway.md). Provides routing, lossless canonical IR, cost attribution, and per-(key / user / team) rollups. Per-request stateless. This is the foot-in-the-door (§4.1).analytics-api.md) that turns trace events into per-(user / team / model / inbound shape) cost rollups, cache-effectiveness panels, the savings counterfactual. The artifact that closes the deal per the project strategy (private). This is the evidence the gateway tier surfaces — the thing that makes the upgrade worth buying.These three are one product from the buyer’s perspective (“Metis”), priced under one umbrella. The pricing model is what determines which of the three each tier actually charges for.
Metis does not meter or resell provider tokens. The buyer brings their own provider API key (BYO-keys per gateway.md §2); Metis routes through it, but the bill for the LLM call itself goes from Anthropic / OpenAI / OpenRouter to the buyer directly. There is no transit margin. This is load-bearing for §5.2 below: a per-call Metis price stacks on top of, not in place of, the provider bill.
Per the project strategy (private), three levers compose the savings:
The gateway is the cheaper lever in the hybrid. The agent is the deeper lever. The pricing model has to make sense for both forms of value — otherwise the gateway-to-agent upsell breaks at the price tag.
The whole point of the gateway is that a buyer flips one env var and savings show up on their dashboard within hours, before they have signed a contract. This requires a usable free tier — paywalling the gateway at request #1 kills the GTM motion.
The pricing model must accept “buyer evaluates Metis for some bounded period at $0” as a first-class state. The exit from that state (to a paid plan) is the conversion event the rest of pricing exists to motivate. The natural conversion triggers are:
Any of these can be the conversion trigger; the model choice in §5 picks one or composes more than one.
The buyer (CTO, eng leader) does not invoice themselves for the dev tool the devs use. The buyer pays a vendor invoice and expects:
multi-user.md §5) is the mechanism; the pricing model has to align its unit with that mechanism so the rollup answers the right question.The user — the dev running Claude Code through the gateway — has different preferences (“don’t interrupt me with a quota wall mid-debug-session”), and the routing-rule soft caps from multi-user.md §6 are the mechanism that lets the buyer’s caps land softly on the user. Pricing should not require the buyer to explain billing to every dev; the gateway abstracts it.
The identity layer ships with:
cost_usd, input_tokens, output_tokens, cached_input_tokens, cache_creation_input_tokens, call_count per dimension per window (multi-user.md §5.2).daily_cap_usd, monthly_cap_usd) enforced at the gateway boundary (multi-user.md §6.3). Request short-circuits with 429 + a typed scope.team_cost_today_exceeds_usd, team_cost_month_exceeds_usd) that redirect to cheaper models (multi-user.md §6.1).actual_repriced_usd / baseline_repriced_usd per window) projected from the same trace store (analytics-api.md §4.7).Every candidate in §5 is measured against “does this compose with the primitives above without inventing new ones?” Models that require new metering surfaces lose the composability test.
The default buyer profile is a 10–50-dev startup CTO, not a 200-dev enterprise eng leader. This profile:
A model that targets this profile will look different from one designed for the enterprise. The recommendation in §7 picks the startup-CTO target; the enterprise tier is a follow-on once buyer evidence accumulates.
Each candidate is evaluated across six dimensions:
Unit: active users per month. An “active user” is a usr_<ulid> whose keys made at least one llm.call_completed in the billing window (definition from multi-user.md §3.1). The keystore already enumerates users.
Incentive alignment:
Buyer friction at first contact:
Buyer friction at scale:
Composability: Excellent. The shipped /analytics/by_user rollup already counts active users per window; gating “active” on at least one event in the window is a single SQL predicate. The team / user records are the seat list.
Billing complexity: Lowest of the four. Monthly count + multiply by rate; the metering is already specced.
Net: Per-seat is the boring, defensible default. It is the model the buyer most easily accepts and the model the eng team least has to build for. Its weakness is that it does not visibly couple Metis’s revenue to the savings Metis delivers — a problem only at the high-spend end.
Unit: llm.call_completed events per billing window, optionally weighted by tokens. Anchored to the same event the analytics surface aggregates against; metering is the call count Metis already records.
Incentive alignment:
llm.call_completed; cost is low but call still counted).Buyer friction at first contact:
Buyer friction at scale:
Composability: Excellent. call_count is already the headline metric on every analytics rollup. Per-key call counts already work; per-user / per-team counts ship with multi-user.md.
Billing complexity: Higher than per-seat — requires daily metering pipeline, idempotency on call retries, refund handling for failed-but-billed calls (do you bill a 5xx? a 429? edge cases multiply).
Net: Per-call is the model with the worst optics. It is technically the easiest to instrument (because Metis already counts calls) and the most usage-coupled, but it asks the buyer to swallow a tax shape they have learned to refuse. Not recommended as the primary surface, though it has a narrow role inside hybrid models (§5.5).
Unit: (baseline_repriced_usd - actual_repriced_usd) per billing window, where the two values are projected via analytics-api.md §4.7’s shipped savings endpoint. Metis charges Y% of that delta.
Incentive alignment:
Buyer friction at first contact:
baseline_repriced_usd is what the buyer would have spent if every routed turn ran on the baseline model). The counterfactual is reproducible from the trace store, but the buyer has to trust it. This is the friction-at-contract-time analog of per-call’s friction-at-pricing-time problem.Buyer friction at scale:
Composability: Strong but requires the counterfactual to be contractually trusted, not just technically computed. The AnalyticsStore.savings() method already produces the delta; the per-team filter from multi-user.md §5.4 makes the cut sensible. What’s missing in v1: an audit-export surface that captures the counterfactual in a tamper-evident form (specced as a follow-on in multi-user.md §7.3). Without that surface, every percentage-of-savings billing dispute is a manual reconciliation.
Billing complexity: Highest. Requires (a) a contractually frozen counterfactual definition the buyer signed off on, (b) an export surface that produces the same number every time it’s run (re-priceability per canonical-message-format.md §6.4’s pricing_version is load-bearing here), (c) credit / dispute handling, (d) a way to apportion savings across teams when one buyer has multiple teams under one contract.
Net: Best alignment, highest auditability cost. Most viable as a secondary model (§5.5 hybrid) once a buyer trusts Metis’s numbers from a primary per-seat relationship, or as the enterprise tier where contract velocity is slower and the audit overhead is acceptable.
Unit: binary — buyer is on the free tier (gateway + basic dashboard) or the paid tier (everything else). Within the paid tier, secondary pricing is one of the three models above.
Incentive alignment:
deployment-shape.md §5.2.Buyer friction at first contact:
Buyer friction at scale:
Composability: Orthogonal — open-core composes with any of §5.1 / §5.2 / §5.3 as the paid-tier pricing model. It is meta to the others, not a competitor to them. The paid-tier model picks among the three.
Billing complexity: Adds the OSS-distribution surface (license, contributor docs, release engineering) but does not itself add billing surface; that’s still whichever model §5.1–§5.3 the paid tier uses.
Net: Open-core is not a pricing model — it is the shape the model wraps around. The recommendation in §7 takes open-core as foundational and answers the “what does the paid tier charge for?” question separately.
The three primary models (§5.1 / §5.2 / §5.3) compose in three credible shapes:
| Model | First-contact friction | At-scale predictability | Incentive alignment | Composability | Billing complexity | Best fit |
|---|---|---|---|---|---|---|
| Per-seat (§5.1) | Very low | High | Medium (rewards seat-count, not usage-coupled) | Excellent | Low | Default; startup CTO |
| Per-call (§5.2) | High (optical) | Low | Strong on usage, but punishes buyer for using more | Excellent | Medium | Not recommended primary |
| % of savings (§5.3) | Medium (auditability) | Medium (scales with savings; needs cap) | Strongest (revenue iff value delivered) | Strong w/ counterfactual | Highest | Secondary; enterprise tier |
| Open-core (§5.4) | Lowest possible (OSS) | N/A (meta) | High (distribution) | Orthogonal | Wraps the paid-tier model | Foundational shape |
| Per-seat + per-call cap (§5.5.1) | Medium | High | Medium-high | Excellent | Medium | Enterprise mid-market |
| Per-seat + capped %-of-savings (§5.5.2) | Medium | High | High | Strong | High | Mid-market |
| Open-core + per-seat Pro (§5.5.3) | Very low | High | Medium-high | Excellent | Low–medium | Recommended — startup CTO default |
| Open-core + per-seat Pro + % enterprise add-on (§5.5.4) | Very low | High | High | Excellent | Medium (Enterprise tier only) | Recommended evolution |
The matrix’s load-bearing rows are the bottom two: open-core + per-seat for the v1 commercial offer, with the enterprise %-of-savings add-on reserved for when the buyer evidence supports building the audit surface.
Adopt §5.5.3: open-core gateway + per-seat paid tier, with §5.5.4’s enterprise add-on reserved as the upgrade path.
The proposed structure is:
Price: $0.
What’s included:
apps/gateway/) — full routing, lossless canonical IR, sync + SSE on both OpenAI- and Anthropic-shapes, all shipped per-key analytics.metis chat, metis tui) — self-hosted, single-user.What’s not included:
User / Team records, per-user analytics, per-team caps). This is the headline paid feature.multi-user.md §7.3) — Pro.Why this slice: the free tier must be usable — a CTO running it on their laptop or a small VPC must see the savings dashboard work on their own data. A gateway without the dashboard is a curiosity, not a tool. But the free tier must not include the things teams need to deploy at team scale: identity, caps, audit, hosted operations. Drawing the line at “single-user works for free; team use upgrades” matches the startup-CTO conversion trigger from §4.1.
Price: per-seat, $X/active-user/month. (Number is the commercial decision; spec frames the model.)
Definition of “active user”: a usr_<ulid> whose gateway keys produced at least one llm.call_completed in the billing window. This composes with the shipped /analytics/by_user rollup directly — no new metering needed. Users with zero calls in the window are not billed; this protects the buyer from paying for stale accounts.
What’s included beyond Free:
users.json / teams.json, per-user / per-team analytics, hard caps, soft-cap routing predicates (multi-user.md §5 / §6)./analytics/quality dashboard.multi-user.md §7.3; Pro-tier feature out of the gate.metis chat-and-beyond surface with skills, context-assembler, memory, learned routing.Why per-seat for Pro:
Why not per-call for Pro: §5.2’s optical problem. Even at low rates, “you charge me per Anthropic call” is an objection the sales conversation should not have to win.
Why not %-of-savings as Pro default: §5.3’s auditability burden is too heavy for the v1 paid relationship. The audit-export surface (multi-user.md §7.3) is not yet built; building it on the first paying contract converts every billing question into a contract negotiation. Reserve it for the enterprise tier where the contract velocity supports the build.
Price: custom contract; baseline is per-seat from §7.2 plus optional %-of-savings add-on.
What’s included beyond Pro:
multi-user.md §8 explicit non-goals become goals at this tier).Why save %-of-savings for here:
This tier is deliberately under-specified in v1. Its existence is more important than its shape; the shape will be the second buyer conversation, not the first.
Open-core gateway maximizes adoption (the deployment-shape gain), per-seat Pro pricing maps cleanly to the shipped multi-user identity layer (the composability gain), and reserving %-of-savings for the enterprise tier defers the audit-export surface buildout until a buyer is paying for the surface to exist.
This is an owner-decision item. The spec frames the choice with rationale; the owner closes it. The two paths forward:
In either case, the project strategy (private) stays open until the owner closes it. This spec adds a pointer (“specced; awaiting commercial decision”) but does not retire the question. The act of choosing among §5/§7 is the commercial decision the project does not yet have evidence to automate.
Tracing through each pricing-relevant surface to confirm no new metering / event / endpoint is needed for §7.2:
Source of truth: users.json enumerates User records; gateway keys carry user_id (multi-user.md §4.1). The “active user in this window” predicate is EXISTS (SELECT 1 FROM events WHERE payload.user_id = ? AND event_type = 'llm.call_completed' AND ts BETWEEN ? AND ?) — same shape as the /analytics/by_user rollup already executes.
No new event type needed. No new HTTP endpoint needed for billing — the analytics surface already returns the user list with call_count; billing reads the count of users with call_count > 0.
Source of truth: a tier flag on the deployment configuration. The simplest shape — out of scope for this spec, named for completeness — is a pricing_tier field on the deployment config (free / pro / enterprise) that the analytics-server reads at boot. Pro-gated endpoints return 402 payment_required (or 404, owner choice) on a free deployment.
This is not a per-request enforcement — it is a deployment-level gate. A Pro deployment is a Pro deployment; a free deployment cannot opt into Pro endpoints without changing the config. This avoids per-call licensing checks (which would themselves be a metering surface).
The per-team hard caps from multi-user.md §6.3 are operational, not commercial. A buyer setting Team.daily_cap_usd = $50 is constraining their provider spend, not their Metis bill. The two are independent dimensions: a team can hit its provider cap while still being under its Metis seat allotment, and vice versa.
If the owner wants to add a Metis-spend cap (e.g. “auto-downgrade to Free tier if Pro seat count exceeds N”), that’s a follow-on. The v1 recommendation keeps the two surfaces separate.
The actual_repriced_usd / baseline_repriced_usd / savings_pct endpoint (analytics-api.md §4.7) ships in v1; whether it surfaces on the Free dashboard or only the Pro dashboard is a product decision the spec doesn’t close.
Spec position: the savings counterfactual ships on Free at per-deployment granularity (one number for everything routed through this Metis instance). The Pro tier unlocks per-team / per-user / per-time-window slicing of the same number. This way Free buyers see “you saved $X” but cannot answer “who saved what” — which is the buyer question the multi-user identity layer answers.
When §7.3 is built, the surface needed is exactly the multi-user.md §7.3 audit-export surface plus a contractual freeze on the pricing_version field (canonical-message-format.md §6.4). The two together produce a re-priceable savings number the buyer can verify against their own records. No additional metering primitive is needed — the savings counterfactual already specced is the answer; what’s missing is the export surface and the contract language, both of which are commercial concerns not engineering ones.
These are deliberately deferred. The list is what differentiates v1 from a complete commercial-operations playbook.
metis-pro repo. See §12 decision log and the repo-split plan (private).multi-user.md §7.4 audit posture; deferred.These are live. AI agents working in the repo should surface them, not pick.
users.json / teams.json get archived? Does Pro tier soft-degrade to Free with rollups still visible? Standard SaaS handling — but the spec doesn’t pin the policy.pricing.event audit events on tier transitions? A future event-catalog addition; would let analytics show “deployment moved from Free to Pro on date X.” Out of scope for v1 but flagged as a clean extension once the model is ratified.Promises this spec makes that downstream specs and implementation must preserve.
actual_repriced_usd / baseline_repriced_usd must use the same pricing_version-stamped computation the analytics dashboard shows. Two different numbers for “did we save money” — one for the bill, one for the dashboard — is the failure mode the audit-export surface exists to prevent.| Date | Decision | Rationale |
|---|---|---|
| 2026-05-14 | Open-core foundation (free OSS gateway + agent CLI/TUI; Pro for team-scale features) | Maximizes adoption per deployment-shape.md hybrid; matches the “trial without payment” floor in §4.1. |
| 2026-05-14 | Per-seat as the Pro tier’s pricing unit (not per-call, not %-of-savings) | Per-seat is the buyer’s expected SaaS shape (§4.4); per-call has the optical problem of §5.2; %-of-savings requires the audit-export surface (multi-user.md §7.3) which is not yet built. |
| 2026-05-14 | “Active user” as the seat metering unit, not “provisioned seat” | Protects the buyer from paying for stale accounts; composes directly with the shipped /analytics/by_user rollup. Trade-off: harder to forecast. |
| 2026-05-14 | Enterprise tier reserves %-of-savings as an optional add-on, not the baseline | Reserves auditability burden for contracts where procurement velocity supports it. |
| 2026-05-14 | Multi-user identity layer is the headline Pro feature | Maps the conversion trigger (“single-user works free; team use upgrades”) directly to the most distinctive paid primitive. |
| 2026-05-14 | Savings counterfactual visible on Free at deployment-aggregate; Pro unlocks slicing | Free buyers see the savings story (marketing); Pro buyers can attribute it (operational). Owner can revisit per §10.2. |
| 2026-05-14 | Pricing.md does not close the project strategy (private) | The spec frames the recommendation; the commercial decision is the owner’s. §6.8 stays open with a pointer to this spec until the owner ratifies. |
| 2026-05-17 | OSS license + repo strategy: Apache-2.0 single OSS repo (metis); private metis-pro repo for paid-tier code |
Closes §9.5 (was “out of scope”). Two-repo “thin Pro repo” pattern: the OSS substrate is genuinely standalone-usable (gateway + canonical IR + adapters + routing + pattern store + bounded memory + tools + skills + heuristic evaluator + per-key analytics + agent CLI/TUI). metis-pro holds the operationally-sensitive surfaces (billing, signup, accounts store, hosted dashboard UI, curated LLM-judge rubric library, enterprise SAML/OIDC/SCIM glue). OSS defines extension Protocols (BillingBackend, SignupBackend, AnalyticsExtension, JudgeRubricProvider) with noop defaults; Pro overlays implement them at boot. Apache-2.0 chosen over BUSL/AGPL for the OSS substrate because (a) buyer trust signal is load-bearing pre-revenue, (b) the four-leg moat (strategic context, private) is operational/compounding, not source-level, (c) reversible — can switch to BUSL later if a real fork-and-SaaS threat materializes, (d) matches the playbook of comparable projects (LiteLLM Apache-2.0; Supabase Apache-2.0; PostHog Apache-2.0 + selective BUSL). Concrete migration plan in the repo-split plan (private). |
../the project strategy (private) — buyer ≠ user; B2B framing; the budget owner is the buyer.../the project strategy (private) — startup-CTO default for the v1 buyer profile.../the project strategy (private) — local-first vs. SaaS deployment posture (still open).../the project strategy (private) — the open question this spec exists to surface for closure (kept open until owner ratifies).../the project strategy (private) — what changes about the build if §3 lands one way or the other; the “If hybrid” branch is the deployment substrate pricing assumes.deployment-shape.md — hybrid (gateway-first → agent-upgrade); the deployment posture pricing composes with.gateway.md — the OSS gateway surface that is the free-tier foot-in-the-door.multi-user.md — the identity layer that enforces per-seat / per-team pricing.multi-user.md §5 — the analytics surface that meters seats.multi-user.md §6.3 — per-team hard caps; orthogonal to pricing but operationally adjacent.multi-user.md §7.3 — audit-export surface; the missing precondition for an enterprise-tier %-of-savings line.multi-user.md §8 — explicit non-goals (SSO / OIDC / SAML / SCIM / RBAC / multi-org) that the enterprise tier eventually inherits as goals.analytics-api.md §4.1 — per-user / per-team / per-key rollups; the metering substrate.analytics-api.md §4.7 — the savings counterfactual.analytics-api.md §4.9 — /analytics/by_team; the per-team rollup pricing composes with.canonical-message-format.md §6.4 — pricing_version stamping; load-bearing for re-priceable savings counterfactual.event-bus-and-trace-catalog.md §6 — the event catalog; pricing is read-only against this surface.benchmark.md — workload-suite methodology that backs the savings counterfactual’s credibility.Ratified 2026-05-16 — owner accepted the §5.5.4 recommendation: open-core gateway + per-seat Pro + reserved enterprise %-of-savings add-on. Price points (per-seat $/month, %-of-savings rate, Free-tier spend cap floor) remain commercial decisions deferred to first-buyer triangulation; this spec ratifies the model shape, not the numbers. Implementation lands as Wave 15.
the project strategy (private) edits landed alongside ratification:
pricing.md).”pricing.md.”