Status: Draft v1 Last updated: 2026-05-12
v1 revisions (same day):
- First pass: corrected cache hit-rate denominator to include cache writes (§4.2); hard-failure
route.decidedevents bucketed underhard_failures(§4.3); sessions list sort and display reference one source field (§4.5); turn drill-down handles in-flight turns and includessession_idin SQL (§4.6); SQL injection guard via whitelist mapping documented forgroup_by/order(§4.1, §4.5); dropped speculative window-size cap (§6); compositecurrent_pricing_versionstrings flagged as opaque (§5).- Second pass: added
invalid_ordererror code symmetric withinvalid_group_by(§6);wins_by_policyalways emits all seven policy slots so the SPA doesn’t need to know the enum (§4.3);/reliabilityscope clarified with cross-reference to/routing.hard_failures(§4.4);group_by=noneaggregate behavior made explicit (§4.1); SPA UX rule for partialsavings_pctwhenrows_missing_from_price_table > 0(§4.7).- Third pass: row schemas tabulated for all six
group_byvalues (§4.1); Decimal-through-aggregate / JSON-number-at-boundary convention pinned (§5.1);actual_stamped_usdclarified as unconditional across missing-model rows (§4.7); negative-savings semantics called out as a valid SPA case (§4.7).
Extends server-api.md with a read-only
/analytics/*namespace that powers the dashboard SPA. All metrics are derived from the existing trace store (eventstable) and session store (sessions,messagestables) — no new persistent state, no new bus events, no new write paths.
The dashboard surfaces the LLM-usage metrics Metis already captures: cost over time, tokens by model, routing-decision breakdown, prompt-cache effectiveness, reliability, and a savings counterfactual vs. a baseline model. This spec defines the HTTP shape that backs those views.
Two design constraints shape the surface:
This spec depends on:
canonical-message-format.md for Message, MessageMetadata, Usage, RoutingDecisionRecord, and the SQLite schema in §9.1.event-bus-and-trace-catalog.md for the llm.call_completed, llm.call_failed, route.decided, turn.completed, tool.completed, and memory.updated payloads.server-api.md for the HTTP surface conventions (Starlette, JSON, loopback-only).GET. No bus events, no DB writes, no cache invalidation surface. Re-running the same request returns the same answer (modulo new events landing).llm.call_completed rows in the table. Existing indexes on (type, timestamp_us) and (session_id, id) carry every query.pricing_version on each llm.call_completed event (matches the bill). The savings counterfactual re-prices both numerator and denominator under the current PriceTable (apples-to-apples).from/to already — only the UI is constrained./sessions/{id}/stream already carries the live signal for clients that want it.Every endpoint that aggregates over time accepts:
| Parameter | Type | Required | Default |
|---|---|---|---|
from |
ISO 8601 UTC | no | now - 7d |
to |
ISO 8601 UTC | no | now |
The API speaks UTC end-to-end. The SPA is responsible for computing UTC bounds for “today / 7d / 30d / all” from the user’s local timezone. This keeps the server pure and means flipping to a custom date-picker UI later is a frontend-only change.
Predicate mismatch. The
cost_today_exceeds_usdrouting predicate (routing-engine.md §5.3.1) uses UTC midnight, not local. A user looking at “today on the dashboard” (local TZ) may therefore see a slightly different number than the breaker sees. Documented as a known asymmetry; not worth aligning until evidence it confuses people.
Every endpoint returns:
{
"window": {"start": "2026-05-05T00:00:00Z", "end": "2026-05-12T00:00:00Z"},
"current_pricing_version": "2026-05-08",
"data": ...
}
window echoes the resolved bounds (so the SPA can label “Today (PST)” with full provenance).current_pricing_version is the version of the PriceTable active right now. Useful when a view mixes stamped and re-priced values: the SPA can show “actuals at write-time prices; counterfactual at data is endpoint-specific.All endpoints live under /analytics/ on the same Starlette app as the rest of the server. All are loopback-only.
GET /analytics/costAggregated cost and token counts. Combines what would otherwise be a separate /tokens endpoint — same source data, same SQL shape.
Query parameters:
| Parameter | Type | Required | Default |
|---|---|---|---|
from,to |
ISO 8601 UTC | no | last 7d |
group_by |
model | provider | session | day | hour | gateway_key | user | team | parent_session | is_worker | none |
no | model |
gateway_key |
exact-match filter on payload.gateway_key_id |
no | (all) |
user |
exact-match filter on payload.user_id |
no | (all) |
team |
exact-match filter on payload.team_id |
no | (all) |
include_workers |
bool — when false, excludes rows whose payload.parent_session_id is set (delegation.md §8.2) |
no | true |
The three filter parameters (gateway_key / user / team) are passed via SQL placeholders and additionally regex-validated at the HTTP boundary (^[A-Za-z0-9_-]{1,200}$); malformed values return 400 invalid_gateway_key / invalid_user / invalid_team. Combinations AND together — ?user=alice&team=eng narrows to events stamped with both. v1 of multi-user.md treats names → ids resolution as a future step; the implementation accepts whatever stable id (usr_<ulid> / team_<ulid>) the trace event carries.
Source: events table, type = 'llm.call_completed'. Cost is the stamped value (matches invoice).
SQL (group_by=model):
SELECT
json_extract(payload_json, '$.model') AS model,
json_extract(payload_json, '$.provider') AS provider,
SUM(json_extract(payload_json, '$.cost_usd')) AS cost_usd,
SUM(json_extract(payload_json, '$.input_tokens')) AS input_tokens,
SUM(json_extract(payload_json, '$.output_tokens')) AS output_tokens,
SUM(json_extract(payload_json, '$.cached_input_tokens')) AS cached_input_tokens,
SUM(json_extract(payload_json, '$.cache_creation_input_tokens')) AS cache_creation_input_tokens,
AVG(json_extract(payload_json, '$.latency_ms')) AS avg_latency_ms,
COUNT(*) AS call_count
FROM events
WHERE type = 'llm.call_completed'
AND timestamp_us >= ? AND timestamp_us < ?
GROUP BY model, provider
ORDER BY cost_usd DESC;
For group_by=day and group_by=hour, use SQLite’s date(timestamp_us/1000000, 'unixepoch') and strftime('%Y-%m-%dT%H', timestamp_us/1000000, 'unixepoch') respectively (UTC buckets). For these bucketed views the result is ORDER BY <bucket> ASC so the SPA can render a time series without re-sorting. For group_by=session, group on the events table’s session_id column (covered by the idx_events_session_id index), not on json_extract(payload_json, '$.session_id') — LLMCallCompleted payloads carry no session_id field.
The group_by parameter is whitelist-mapped to a literal GROUP BY clause in the handler. The raw request string is never interpolated.
For group_by=none, the GROUP BY clause is omitted entirely and the response carries a single-row aggregate over the full window — useful for the headline “total spend this period” tile.
Row schemas per group_by value:
Every row carries the same numeric columns (cost_usd, input_tokens, output_tokens, cached_input_tokens, cache_creation_input_tokens, avg_latency_ms, call_count). What differs is the grouping key, summarized below:
group_by |
Key columns | Shape | Order |
|---|---|---|---|
model |
model, provider |
array | cost_usd DESC |
provider |
provider |
array | cost_usd DESC |
session |
session_id (from events column) |
array | cost_usd DESC |
day |
bucket (UTC date, YYYY-MM-DD) |
array | bucket ASC (time series) |
hour |
bucket (UTC hour, YYYY-MM-DDTHH) |
array | bucket ASC (time series) |
gateway_key |
gateway_key_id (nullable; in-process agent traffic rolls up under null) |
array | cost_usd DESC |
user |
user_id (nullable; agent-loop + pre-v1 keys roll up under null) |
array | cost_usd DESC |
team |
team_id (nullable; same null convention) |
array | cost_usd DESC |
parent_session |
parent_session_id (COALESCE(payload.parent_session_id, events.session_id) — workers roll up under their planner, top-level sessions are their own key) |
array | cost_usd DESC |
is_worker |
is_worker (string "planner" or "worker"; partitions by whether the call came from a worker session — delegation.md §8.2) |
array | cost_usd DESC |
none |
(no key) | object | n/a (single aggregate) |
day/hour are single-dimension time buckets — they do not also split by model. A future group_by=day,model (multi-key) is non-breaking but out of scope for v1. none returns a single JSON object as data rather than an array — the response envelope is otherwise unchanged.
Example (group_by=day):
{
"window": {...},
"current_pricing_version": "...",
"data": [
{"bucket": "2026-05-10", "cost_usd": 0.42, "input_tokens": 18402, ...},
{"bucket": "2026-05-11", "cost_usd": 0.31, "input_tokens": 12808, ...}
]
}
Example (group_by=none):
{
"window": {...},
"current_pricing_version": "...",
"data": {
"cost_usd": 1.2347, "input_tokens": 410230, "output_tokens": 18402,
"cached_input_tokens": 0, "cache_creation_input_tokens": 0,
"avg_latency_ms": 1820, "call_count": 48
}
}
Response (group_by=model):
{
"window": {"start": "...", "end": "..."},
"current_pricing_version": "2026-05-08",
"data": [
{
"model": "anthropic:claude-sonnet-4-6",
"provider": "anthropic",
"cost_usd": 1.2347,
"input_tokens": 410230,
"output_tokens": 18402,
"cached_input_tokens": 0,
"cache_creation_input_tokens": 0,
"avg_latency_ms": 1820,
"call_count": 48
}
]
}
GET /analytics/cache_effectivenessCache-read share, cache-creation share, and hit rate per model.
Source: llm.call_completed events.
SQL:
SELECT
json_extract(payload_json, '$.model') AS model,
SUM(json_extract(payload_json, '$.input_tokens')) AS uncached_input_tokens,
SUM(json_extract(payload_json, '$.cached_input_tokens')) AS cached_input_tokens,
SUM(json_extract(payload_json, '$.cache_creation_input_tokens')) AS cache_creation_tokens,
COUNT(*) AS call_count
FROM events
WHERE type = 'llm.call_completed'
AND timestamp_us >= ? AND timestamp_us < ?
GROUP BY model;
Computed in the handler:
total_input = uncached_input_tokens + cached_input_tokens + cache_creation_tokens
hit_rate = cached_input_tokens / total_input
cache_write_share = cache_creation_tokens / total_input
Returns null for both ratios when total_input is zero (no input tokens recorded).
Why include cache_creation_tokens in the denominator. Cache writes are billed at roughly 1.25× the standard input rate, so they aren’t free. Pretending they aren’t there inflates hit_rate in any session with non-trivial cache turnover (e.g. a session whose system prompt has just changed and is rebuilding the cache). The honest formulation is: of all input tokens, what fraction were served from cache? The two ratios sum to ≤ 1 with the remainder being uncached input. cache_write_share is broken out separately so operators can see “am I rebuilding the cache too often?” as a distinct signal.
Note: at the time of writing, no adapter emits cache_control markers (KNOWN_ISSUES.md — “No prompt-caching strategy”). All hit rates will read 0 until that lands. The view is therefore both diagnostic and a forcing function.
Response:
{
"window": {...},
"current_pricing_version": "...",
"data": [
{
"model": "anthropic:claude-sonnet-4-6",
"uncached_input_tokens": 410230,
"cached_input_tokens": 0,
"cache_creation_tokens": 0,
"hit_rate": 0.0,
"cache_write_share": 0.0,
"call_count": 48
}
]
}
GET /analytics/routingWhich routing-chain slot is winning, and what’s being rejected.
Source: events table, type = 'route.decided'.
Implementation note: the routing chain lives inside payload_json.chain as a JSON array. SQLite’s json_each() can iterate it in SQL, but Python-side traversal is clearer and at single-user scale doesn’t cost meaningfully. The handler fetches the rows in the window and walks each chain in Python:
SELECT payload_json
FROM events
WHERE type = 'route.decided'
AND timestamp_us >= ? AND timestamp_us < ?;
Then aggregates three counters:
winner_index >= 0 AND winner_index < len(chain), pick chain[winner_index].policy; group.winner_index == -1 (or otherwise out of range) count toward hard_failures, not toward any policy’s win count. Per routing-engine.md §7.2 and the engine’s behavior in routing/engine.py, every turn emits exactly one route.decided including on hard failure; the failure rows have winner_index = -1 and the response would otherwise miscount or index-error.chain entry with verdict='rejected', group by policy and validation_failure. Hard-failure events contribute their rejected chain entries here too — they’re the most interesting source of rejection data.This is the one endpoint where aggregation deliberately doesn’t push to SQL. The trade-off (per §2.1.2): with 10K turns the handler walks ~70K chain entries on the all-time view, which is well under the budget; if it becomes a bottleneck, push to SQL with json_each().
wins_by_policy always emits all seven policy slots (per routing-engine.md §4.1), even when count is zero. This keeps the SPA from needing to know the enum to render tiles in a stable order. rejections and wins_by_model are sparse — only present rows appear.
Response:
{
"window": {...},
"current_pricing_version": "...",
"data": {
"wins_by_policy": [
{"policy": "per_message_override", "count": 12},
{"policy": "manual_sticky", "count": 84},
{"policy": "rule", "count": 0},
{"policy": "pattern", "count": 0},
{"policy": "delegate_request", "count": 0},
{"policy": "workspace_default", "count": 0},
{"policy": "global_default", "count": 312}
],
"hard_failures": 2,
"rejections": [
{"policy": "manual_sticky", "validation_failure": "exceeds_context_window", "count": 3}
],
"wins_by_model": [
{"chosen_model": "anthropic:claude-sonnet-4-6", "count": 312}
]
}
}
GET /analytics/reliabilityError class breakdown and latency percentiles per model. Scope: errors that reached the adapter and came back classified — llm.call_failed events. Turns that never reached the LLM because routing exhausted the chain (hard failures) appear in /analytics/routing.hard_failures, not here. An operator seeing “reliability all green but everything is broken” should check the routing endpoint.
Source: llm.call_failed and llm.call_completed events.
SQL for errors:
SELECT
json_extract(payload_json, '$.model') AS model,
json_extract(payload_json, '$.provider') AS provider,
json_extract(payload_json, '$.error_class') AS error_class,
COUNT(*) AS count
FROM events
WHERE type = 'llm.call_failed'
AND timestamp_us >= ? AND timestamp_us < ?
GROUP BY model, provider, error_class;
Latency percentiles: SQLite has no native PERCENTILE_*. The handler fetches latency_ms ordered ascending per model and computes p50 / p95 in Python:
SELECT
json_extract(payload_json, '$.model') AS model,
json_extract(payload_json, '$.latency_ms') AS latency_ms
FROM events
WHERE type = 'llm.call_completed'
AND timestamp_us >= ? AND timestamp_us < ?
ORDER BY model, latency_ms;
For 10K rows this is ~10K integers in memory per request — trivial.
Response:
{
"window": {...},
"current_pricing_version": "...",
"data": {
"errors_by_class": [
{"model": "anthropic:claude-opus-4-7", "provider": "anthropic",
"error_class": "rate_limit", "count": 4}
],
"latency_ms_by_model": [
{"model": "anthropic:claude-sonnet-4-6",
"p50": 1620, "p95": 4830, "sample_size": 312}
]
}
}
GET /analytics/sessionsSession list with rollups. The Cost view sorts by spend; the Activity view sorts by recency.
Query parameters:
| Parameter | Type | Required | Default |
|---|---|---|---|
limit |
int | no | 25 |
order |
cost | recency |
no | recency |
Source: sessions table only. The displayed updated_at is bumped by update_session() after every turn (sessions/sqlite_store.py), which is the same surface used to sort by recency — display and ordering reference one field.
SQL:
SELECT
s.id, s.workspace_path, s.active_model,
s.cost_so_far_usd, s.turn_count,
s.created_at, s.updated_at
FROM sessions s
ORDER BY <order_column> DESC
LIMIT ?;
<order_column> is whitelist-mapped from the order query parameter (cost → s.cost_so_far_usd, recency → s.updated_at). The handler does not interpolate the raw parameter string into SQL.
Response field rename: the SQL column cost_so_far_usd is exposed as cost_usd in the response body, matching the naming used elsewhere on the surface. The updated_at field is the same value the SQL sorts by — it tracks the last post-turn write, which is within microseconds of the last message timestamp in practice.
{
"window": {"start": null, "end": null},
"current_pricing_version": "...",
"data": [
{
"id": "sess_01HZ...",
"workspace_path": "/Users/.../my-project",
"active_model": "anthropic:claude-sonnet-4-6",
"cost_usd": 0.42,
"turn_count": 17,
"created_at": "2026-05-12T10:14:23Z",
"updated_at": "2026-05-12T11:02:08Z"
}
]
}
window is null/null because session list is not time-windowed in v1. Future iteration may add filtering.
GET /analytics/turns/{turn_id}Drill-down into a single turn. Returns the full event timeline plus the message bodies.
Source: events table by turn_id, messages table by session_id filtered to the turn’s window.
SQL:
SELECT id, timestamp_us, session_id, type, actor, payload_json, parent_event_id
FROM events
WHERE turn_id = ?
ORDER BY id;
The handler reads session_id from the first event row (all rows in a turn share one session); 404 if no rows match. Plus a follow-up query for the canonical messages spanned by the turn:
SELECT id, role, content_json, metadata_json, created_at
FROM messages
WHERE session_id = ?
AND created_at BETWEEN ? AND ?
ORDER BY created_at, id;
Bounds:
turn.started event (always present).turn.completed or turn.cancelled if present; otherwise now() at request time. The response carries in_flight: true in the latter case so the SPA knows the message list may grow on re-poll.This lets the SPA render an in-flight turn (matching the live /sessions/{id}/stream view); a turn with no terminator event is not a 404.
Response:
{
"window": {...},
"current_pricing_version": "...",
"data": {
"turn_id": "01HZ...",
"session_id": "sess_01HZ...",
"in_flight": false,
"events": [
{"id": "...", "timestamp": "...", "type": "turn.started",
"actor": "user", "payload": {...}}
],
"messages": [
{"id": "...", "role": "user", "content": [...],
"metadata": {...}, "created_at": "..."}
]
}
}
This endpoint is the link target from every “top-N expensive turns” / “most-recent activity” tile in the SPA.
GET /analytics/savingsThe counterfactual: what would this window have cost if every turn had run on a single baseline model? Both numerator and denominator are re-priced under the current PriceTable to keep the comparison meaningful when prices change.
Query parameters:
| Parameter | Type | Required | Default |
|---|---|---|---|
baseline |
canonical model id | no | anthropic:claude-sonnet-4-6 |
from,to |
ISO 8601 UTC | no | last 7d |
Algorithm:
llm.call_completed events in the window. Each carries model, input_tokens, output_tokens, cached_input_tokens, cache_creation_input_tokens.model in the current PriceTable and compute cost. Sum.PriceTable. Sum.Errors:
400 if baseline isn’t in the current PriceTable.model is missing from the current price table is included in baseline but excluded from actual_repriced_usd, with rows_missing_from_price_table incremented so the SPA can flag the discrepancy. This is a deliberate convention to keep baseline_repriced_usd - actual_repriced_usd honest in the dominant case (every model in the actual mix is currently priced); the alternative — silently dropping rows from both sides — would let pricing-table gaps inflate apparent savings. actual_stamped_usd covers the “what I actually paid” view independently for missing-model rows.Response:
{
"window": {...},
"current_pricing_version": "2026-05-08",
"data": {
"baseline_model": "anthropic:claude-sonnet-4-6",
"actual_repriced_usd": 1.42,
"baseline_repriced_usd": 4.18,
"savings_usd": 2.76,
"savings_pct": 0.66,
"actual_stamped_usd": 1.39,
"rows_total": 412,
"rows_missing_from_price_table": 0
}
}
Both actual_repriced_usd and actual_stamped_usd are returned so the SPA can show the actual invoice number and the apples-to-apples comparison without a second round-trip.
SPA rule when rows_missing_from_price_table > 0: display a warning beside savings_pct indicating the comparison is partial. The percentage is structurally overstated in this case (baseline counts every row; actual_repriced excludes the missing ones), so showing it without the caveat misleads. The numerator/denominator are still both useful — they just aren’t quite the same shape.
actual_stamped_usd is unconditional. It sums every row’s stamped cost_usd regardless of whether the row’s model is in the current price table — stamped values were computed at write time, so they’re correct independent of PriceTable state. rows_missing_from_price_table affects only actual_repriced_usd (and therefore savings_usd / savings_pct).
Negative savings are a valid result. If the user runs with an unusually expensive baseline or genuinely consumed more than the baseline would have, savings_usd and savings_pct can be negative — meaning “you spent 26% more than baseline.” The SPA must render this case explicitly (e.g. red label, “26% over baseline”), not absolute-value, hide, or floor it to zero. The math is correct; suppressing the sign would lie about the workload.
GET /analytics/by_keyPer-(gateway-key) cost / token / call-count rollup with an inbound-shape breakdown per key. The companion view to gateway.md §6 — the gateway stamps gateway_key_id and inbound_shape onto every llm.call_completed; this endpoint is where operators consume the rollup.
Query parameters:
| Parameter | Type | Required | Default |
|---|---|---|---|
from,to |
ISO 8601 UTC | no | last 7d |
gateway_key |
exact-match filter | no | (all keys) |
The gateway_key filter is passed via parameterized SQL placeholder; the HTTP layer additionally rejects values that don’t match ^[A-Za-z0-9_-]{1,200}$ with a 400 invalid_gateway_key — defense in depth even though the SQL itself is safe by construction.
Source: events table, type = 'llm.call_completed'. Costs are stamped (matches the invoice).
Algorithm: the SQL fetches one row per call within the window, including gateway_key_id and inbound_shape from the payload. The handler aggregates in Python by gateway_key_id (using Decimal per §5.1) and tracks a per-shape sub-aggregate. Rows that originated from the in-process agent loop (CLI / TUI / metis serve) carry gateway_key_id: null and roll up under the null key.
Response:
{
"window": {"start": "...", "end": "..."},
"current_pricing_version": "2026-05-08",
"data": [
{
"gateway_key_id": "gk_01HZ...",
"cost_usd": 0.4231,
"input_tokens": 14820,
"output_tokens": 612,
"cached_input_tokens": 0,
"cache_creation_input_tokens": 0,
"call_count": 12,
"by_inbound_shape": [
{"inbound_shape": "openai", "call_count": 8, "cost_usd": 0.3010},
{"inbound_shape": "anthropic", "call_count": 4, "cost_usd": 0.1221}
]
},
{
"gateway_key_id": null,
"cost_usd": 1.0512,
"input_tokens": 51220,
"output_tokens": 1842,
"cached_input_tokens": 0,
"cache_creation_input_tokens": 0,
"call_count": 30,
"by_inbound_shape": [{"inbound_shape": null, "call_count": 30, "cost_usd": 1.0512}]
}
]
}
Rows are sorted by cost_usd DESC. The by_inbound_shape sub-array is also cost_usd DESC. inbound_shape: null is the natural shape for in-process agent traffic (no inbound translator ran).
GET /analytics/by_teamPer-(team) cost / token / call-count rollup with a per-user breakdown. Companion to /analytics/by_key; the buyer surface for the multi-user.md §5.2 contract. The gateway stamps team_id (and user_id) onto every llm.call_completed it serves (multi-user.md §4.4); this endpoint is where the budget owner consumes the rollup.
Query parameters:
| Parameter | Type | Required | Default |
|---|---|---|---|
from,to |
ISO 8601 UTC | no | last 7d |
team |
exact-match filter on payload.team_id |
no | (all teams) |
The team filter follows the same regex guard as /analytics/by_key’s gateway_key (^[A-Za-z0-9_-]{1,200}$); malformed values return 400 invalid_team. The id is bound via SQL placeholder.
Source: events table, type = 'llm.call_completed'. Cost is the stamped value (matches invoice). Per multi-user.md §3.4 the agent-loop path emits team_id: null / user_id: null, so that traffic rolls up under a single team_id: null row whose by_user array contains the matching user_id: null sub-row.
Algorithm: the SQL fetches one row per call within the window with the stamped team_id / user_id. The handler aggregates by team_id (Python, Decimal per §5.1) and tracks a per-user sub-aggregate inside each team. user_count is the number of distinct non-null user_id values seen in the team — the null bucket represents un-tagged traffic, not a separate identity.
Response:
{
"window": {"start": "...", "end": "..."},
"current_pricing_version": "2026-05-08",
"data": [
{
"team_id": "team_01HZ...",
"cost_usd": 12.4231,
"input_tokens": 4910220,
"output_tokens": 81502,
"cached_input_tokens": 2500000,
"cache_creation_input_tokens": 410220,
"call_count": 412,
"user_count": 2,
"by_user": [
{"user_id": "usr_01HZ...", "cost_usd": 8.1010, "call_count": 281},
{"user_id": "usr_01HZ...", "cost_usd": 4.3221, "call_count": 131}
]
},
{
"team_id": null,
"cost_usd": 1.0512,
"input_tokens": 51220,
"output_tokens": 1842,
"cached_input_tokens": 0,
"cache_creation_input_tokens": 0,
"call_count": 30,
"user_count": 0,
"by_user": [{"user_id": null, "cost_usd": 1.0512, "call_count": 30}]
}
]
}
Rows sorted by cost_usd DESC; by_user sub-array is also cost_usd DESC.
v1 scope. The implementation ships the rollup shape; the team-record join (team_name, daily_cap_usd, monthly_cap_usd from teams.json) and the partial_coverage flag described in multi-user.md §5.2 / §5.4 are deferred until the keystore-side user/team records are wired in (Agent 8a’s downstream wave). When that lands, the join is additive — these fields appear next to team_id without changing the existing shape.
The “buyer can hand a departing user all their data” surface, paired with the
“buyer can honor a forget request” surface. Both are stable id-keyed by
user_id and compose with the redaction policy owned by redaction.md:
metis_core.redaction.Redactor protocol; the analytics surface owns
the audit-event emission and the HTTP error mapping.Both endpoints close the gap multi-user.md §11.5 surfaced as deferred and §7.4.4 sketched as the “right-to-delete pathway.”
GET /analytics/user/{user_id}/exportStreams every trace event stamped with user_id as JSONL (one JSON
object per line). Suitable for piping to grep / jq or saving as the
artifact a buyer hands the data subject.
Path parameter:
| Parameter | Type | Shape guard |
|---|---|---|
user_id |
usr_<ulid> or human alias |
^[A-Za-z0-9_-]{1,200}$; 400 invalid_user_id on miss |
Query parameters:
| Parameter | Type | Required | Default |
|---|---|---|---|
from,to |
ISO 8601 UTC | no | all-time (no window) |
Source: events table, every row whose json_extract(payload_json,
'$.user_id') = ?. The matcher is agnostic to event type so future
additive user_id-bearing payloads (e.g. new gateway audit events) land
in the export automatically without code changes.
Streaming: the response uses Starlette’s StreamingResponse. The
handler iterates the SQLite cursor row-by-row and emits one JSONL line
at a time; the full result set is never materialized in Python memory.
A 10k-event export costs O(1) RAM.
Sensitivity-tier handling: PRIVATE-tier fields are passed through
verbatim. This is the subject’s own data — they have, by definition,
already consented to its capture (the data is about them). Audit
exports for SOC2 / SIEM consumption use a separate surface
(redaction.md §6) that pseudonymizes / drops PRIVATE
fields; do not conflate the two.
Deterministic ordering: rows are returned in events.id (ULID)
order. Two consecutive exports of the same window produce byte-identical
output (modulo any audit events written between calls).
Response:
200 with Content-Type: application/jsonl and one event per line:
{"id":"01HZ...","timestamp":"2026-05-12T12:00:00+00:00","session_id":"sess_a","turn_id":"turn_a","parent_event_id":null,"type":"llm.call_completed","actor":"agent","sensitivity":"pseudonymous","payload":{"model":"anthropic:claude-sonnet-4-6",...,"user_id":"usr_alice"}}
{"id":"01HZ...","timestamp":"2026-05-12T12:00:00+00:00","session_id":"sess_a","turn_id":"turn_b","parent_event_id":null,"type":"turn.completed","actor":"agent","sensitivity":"pseudonymous","payload":{...,"user_id":"usr_alice"}}
Content-Disposition: attachment; filename="{user_id}.jsonl" so
curl -OJ / browser saves land on a sensible filename.X-Metis-Row-Count header with the count at request-arrival time
(computed via a cheap pre-stream SELECT COUNT(*); the stream itself
may diverge if a concurrent client writes events during the response).Audit event: after the body has been fully drained, one
analytics.user_exported event is emitted onto the bus carrying the
realized row count, byte count, and window bounds. A client that
disconnects mid-stream sees the audit event reflect what was actually
delivered, not what was requested.
Cost is Decimal-faithful. llm.call_completed events stamp
cost_usd as the string-form of a Decimal; the export passes the
JSON column through unchanged, so cost values land as JSON strings (e.g.
"0.0143") and a downstream JSONL consumer can round-trip back to
Decimal without float drift. The convention matches analytics-api.md
§5.1.
POST /analytics/user/{user_id}/forgetTriggers redaction.md’s pseudonymization flow against every event
stamped with user_id. The request body is intentionally empty — the
URL identifies the subject; there is no other input.
Path parameter: same shape guard as §4.10.1.
Algorithm:
Redactor on request.app.state.app_state.redactor,
falling back to PseudonymizingRedactor(db_path) if none is
configured. The default impl performs UPDATE events SET payload_json
= json_set(payload_json, '$.user_id', :pseudonym) WHERE
json_extract(payload_json, '$.user_id') = :user_id. redaction.md §5
defines the canonical policy (free-text scrubbing, rationale-redacted
opt-in fields, etc.) that replaces this default impl when wired.pseudonym_for(user_id)
(SHA-256 → 12-hex with redacted_ prefix).analytics.user_forgotten event onto the bus carrying the
subject, pseudonym, and rowcount.Response (200):
{
"user_id": "usr_alice",
"pseudonym": "redacted_a1b2c3d4e5f6",
"pseudonymized_rows": 42,
"completed_at": "2026-05-15T12:00:00+00:00"
}
Idempotent: a second call for the same user_id returns
pseudonymized_rows = 0 (the original id is no longer present after the
first call). The audit event fires either way so the audit trail
records every request, not just the first one.
Append-only safety: no row is deleted; only user_id is rewritten
in place. The append-only trace-store invariant
(event-bus-and-trace-catalog.md §7)
holds.
Errors:
| Code | HTTP | When |
|---|---|---|
invalid_user_id |
400 | Path parameter fails the shape guard. |
Both endpoints have local-FS CLI mirrors for admin shells:
metis analytics user-export <user_id> [--from] [--to] [--out file.jsonl] [--db-path …]metis user forget <user_id> --confirm [--db-path …]The CLI forget command refuses without --confirm (returns exit code 2
and prints the warning). The export command streams to stdout by default
so metis analytics user-export usr_alice | jq works without an
intermediate file. Both commands operate on the trace DB directly via
the same AnalyticsStore / Redactor primitives the HTTP surface uses,
and produce the same analytics.user_exported / analytics.user_forgotten
audit events (so a buyer’s compliance review sees a uniform audit trail
regardless of entry point).
Loopback-only inherits from the rest of /analytics/* (§2.1.4). The
HTTP surface trusts the local operator. Multi-user authenticated
dashboards (which would let a non-admin user request their own export)
are downstream of the the project strategy (private) fork and out of
scope for v1.
When per-key gateway auth lands on the dashboard surface (future
spec), the export endpoint will reject any caller whose principal
neither (a) matches user_id nor (b) holds an admin / workspace-owner
scope. The forget endpoint will reject any non-admin caller outright.
Until then, the CLI is the only sanctioned non-localhost surface.
This spec defines the endpoint shape; redaction.md defines the policy. The two compose:
forget’s Redactor injection point lets redaction.md ship a richer
impl (rationale-redacted opt-in fields, free-text scrubbing) without
changing the HTTP contract.export’s default is “verbatim” because the subject is the data owner;
redaction.md’s EventRedactor modes (pseudonymize, redact_private,
aggregate_only) are intended for the audit-export surface, not the
portability surface.Recapping §2.1.3 in concrete terms:
| Field on response | Source | When to display |
|---|---|---|
cost_usd on /analytics/cost |
Stamped on the llm.call_completed event |
“What I spent” views; matches the invoice. |
actual_stamped_usd on /analytics/savings |
Stamped on the events | Reconciliation with the bill. |
actual_repriced_usd, baseline_repriced_usd on /analytics/savings |
Re-computed in-handler against current PriceTable |
Counterfactual / “vs naive baseline” panel. |
Re-pricing logic lives in PriceTable.compute_cost (pricing/table.py). The handler calls it directly; no new pricing code is introduced by this spec.
current_pricing_version is opaque. PriceTable.with_overlay() produces composite version strings of the form "2026-05-08+<overlay_version>" (used by the OpenRouter adapter, which fetches rates at startup and overlays them on the base table). The SPA must treat the version field as an opaque string — for display, equality comparison, and join keys against historical stamps — and must not parse it.
JSON has no Decimal type, but Usage.cost_usd is Decimal throughout the core (canonical-message-format.md §6.4) precisely to avoid float drift on cent-level math. The convention here:
Decimal. Every handler that sums cost across rows holds the running total as Decimal until the response is serialized.format(d, '.6f'), then parsed back to a JSON number). 6 places sit ~4 orders of magnitude below cent precision at $10K spend, well under any drift the SPA could introduce; cost-by-model summed over 10K rows still rounds cleanly to cents.stamped values are passed through, not re-rounded. A stamped cost_usd is whatever the source event recorded — typically already short of 6 decimal places. The 6-place convention applies to handler-side aggregates and re-priced values; pre-existing stamps are emitted as-is.Token counts and latency_ms are integers; this convention applies only to cost fields.
Follows server-api.md §5 conventions. New error codes specific to analytics:
| Code | HTTP | When |
|---|---|---|
invalid_time_window |
400 | from > to or malformed ISO 8601. |
invalid_group_by |
400 | group_by is not in the allowed set for that endpoint. |
invalid_order |
400 | order is not in the allowed set (cost | recency). |
invalid_limit |
400 | limit is not an integer or is less than 1. |
unknown_baseline_model |
400 | Savings baseline isn’t registered in the current PriceTable. |
invalid_gateway_key |
400 | gateway_key filter value violates the ^[A-Za-z0-9_-]{1,200}$ shape guard. |
invalid_user |
400 | user filter value violates the shape guard. |
invalid_team |
400 | team filter value violates the shape guard. |
invalid_user_id |
400 | {user_id} path parameter on /analytics/user/... fails the shape guard (§4.10). |
turn_not_found |
404 | No events match the given turn_id. |
Naming follows the symmetric convention invalid_<param> for value-rejection errors and unknown_<resource> / <resource>_not_found for lookup failures, matching server-api.md §5.
events, messages, or sessions. No bus events emitted.window.start/window.end (or null/null for non-time-windowed endpoints).current_pricing_version always reflects the table active at request time; in-row stamped values are unaffected.sensitivity filtering. v1 is single-user local; the local user is the data owner. A future remote-deployment shape may need to filter private-sensitivity payloads from responses, but that’s downstream of the the project strategy (private) fork.?limit and ?since_event_id parameters — additive, non-breaking.data: [] (or zeroed totals) and the correct echoed window.llm.call_completed events with known costs; assert /analytics/cost?group_by=model sums per model.uncached=1000, cached=400, cache_creation=600: hit_rate == 0.2 (400 / 2000), cache_write_share == 0.3 (600 / 2000). A row with all-uncached input has both ratios at 0.route.decided event with winner_index=2; verify it counts under rule in the response.route.decided with winner_index=-1 and three rejected chain entries; verify hard_failures == 1, every win counter is unchanged, and the three rejections show up under rejections.route.decided with two rejected chain entries contributes two rows to rejections.[100, 200, ..., 1000] produces p50 ~= 500, p95 ~= 950.?order=cost and ?order=recency produce different orderings; the displayed updated_at matches the field used for sort.cost_usd, not cost_so_far_usd.turn_id, verify all expected event types appear in order plus user and assistant messages. Separately: fetch a turn that has turn.started but no turn.completed/turn.cancelled; verify in_flight == true and the response upper-bounds at request time.?baseline=does-not-exist → 400 unknown_baseline_model.baseline_repriced_usd = sum(token_counts × opus_rates) and actual_repriced_usd uses each row’s actual model.baseline_repriced_usd, is excluded from actual_repriced_usd, and increments rows_missing_from_price_table./analytics/cost reflects A; /analytics/savings.actual_repriced_usd reflects B.current_pricing_version matches the PriceTable’s version field at request time, including composite overlay versions (e.g. "2026-05-08+openrouter-2026-05-11").from > to → 400. Malformed ISO → 400.group_by=DROP TABLE is rejected as 400 invalid_group_by; order=; DELETE is rejected as 400 invalid_order. Neither reaches SQL.group_by. For each of model/provider/session/day/hour, data is an array with the right key columns. group_by=none returns data as a single object, not an array. Time buckets are returned in ascending order.llm.call_completed events whose stamped costs end in odd long decimal expansions; assert the aggregated response value matches the sum-of-Decimals to within 1e-9. (Catches a regression to float aggregation.)actual_stamped_usd and baseline_repriced_usd, but not actual_repriced_usd. rows_missing_from_price_table == 1.savings_usd and negative savings_pct — not zero, not absolute-valued.to) never decreases summed cost_usd.actual_repriced_usd <= baseline_repriced_usd when baseline is the most expensive model in the catalog and every actual row used a cheaper one. (Construct the fixture to make this trivially true.)Deferred from this spec; revisit when the matching evidence shows up.
json_extract slow enough that we need to populate a usage_rollups table from the bus? Tentative threshold: 100K llm.call_completed rows or sustained p95 > 1s on any endpoint.from/to. When does the UI need a date picker? When users start asking “what did I spend in March.”/sessions/{id}/stream is the obvious extension point.cost_today_exceeds_usd move to local TZ to match the dashboard, or should the dashboard expose a “UTC midnight” toggle? Wait for confusion before deciding.| Date | Decision | Rationale |
|---|---|---|
| 2026-05-12 | Read-only, projection over trace store; no rollup table | At single-user scale json_extract is plenty fast; rollups add drift + fast-path-budget concerns. |
| 2026-05-12 | Hybrid pricing: stamped for actuals, re-priced for the counterfactual | “What I spent” must reconcile with the bill; counterfactual is meaningless unless num/denom share a table. |
| 2026-05-12 | UTC at the API; SPA converts to local TZ | Keeps the server pure; flipping to custom date picker later is a frontend-only change. |
| 2026-05-12 | Loopback-only inherits; no new auth | Matches v1 server posture; multi-user/remote is downstream of the unresolved the project strategy (private) fork. |
| 2026-05-12 | Routing chain aggregation in Python, not SQL | Clearer at this scale; SQLite json_each() is the escape hatch if it ever bottlenecks. |
| 2026-05-12 | Fold /tokens into /cost |
Same source rows; separate endpoints would duplicate the SQL and the response envelope. |
| 2026-05-12 | Audience toggle lives in the SPA, not the API | Same endpoints serve both Cost and Activity views; the toggle just reorders tiles. |
| 2026-05-12 | Cache hit-rate denominator includes cache_creation_tokens |
Cache writes are billed (~1.25× input rate); excluding them inflates hit rate during cache rebuild. cache_write_share broken out separately so the cache-rebuild signal stays visible. |
| 2026-05-12 | Hard-failure route.decided events bucketed under hard_failures, not policy wins |
The engine emits exactly one route.decided per turn including on hard failure with winner_index = -1; indexing into chain[winner_index] would error or miscount. Rejections within the chain are still counted in the rejections breakdown. |
| 2026-05-12 | Sessions list sort key and displayed field are one source (updated_at) |
The earlier draft sorted on sessions.updated_at but displayed MAX(messages.created_at). Same source for both removes a possible ordering surprise and an N-subquery cost. The two fields are within microseconds of each other in practice. |
| 2026-05-12 | Whitelist-map group_by and order parameters; never interpolate |
Request strings entering SQL must be mapped to literal column names by the handler, even with no untrusted caller in v1. Keeps the surface safe by construction. |
| 2026-05-12 | Turn drill-down upper bound falls back to now() for in-flight turns |
Lets the SPA render a live turn the same way /sessions/{id}/stream does. in_flight: true in the response signals the message list may grow on re-poll. |
| 2026-05-12 | No window-size cap in v1 | Cap was speculative; json_extract over indexed (type, timestamp_us) is fast enough at single-user scale and the p95 target is the load-bearing constraint. |
| 2026-05-12 | Decimal aggregated end-to-end; serialized as JSON number with 6-place precision | Pricing is Decimal throughout the core (AGENTS.md). Float-typed aggregates over 10K rows can drift cents; 6 decimal places is well below cent precision and parses safely. |
| 2026-05-12 | Time buckets are single-dimension | group_by=day,model (multi-key) is a future enhancement, not v1. Splitting time series by model bloats the row count and the SPA renders fine without it for v1. |
| 2026-05-12 | actual_stamped_usd is unconditional across missing-model rows |
Stamped values are correct at write-time; the current PriceTable doesn’t affect them. Only re-priced fields care about the current table. |
| 2026-05-12 | Negative savings_usd / savings_pct are valid results |
“You spent more than baseline” is a real workload outcome; suppressing the sign would lie. SPA renders the case explicitly. |
canonical-message-format.md — Message, Usage, MessageMetadata, persistence schema.event-bus-and-trace-catalog.md — llm.call_completed, llm.call_failed, route.decided, turn.completed payloads.server-api.md — base HTTP surface conventions this spec extends.memory-store.md — sibling spec drafted retrospectively from existing code; shape-reference for this doc.../the project strategy (private) — buyer ≠ user framing, savings-as-the-headline thesis, unresolved replacement-agent-vs-gateway fork.../KNOWN_ISSUES.md — prompt-caching gap (5–10× left on the table); cache-effectiveness view doubles as forcing function.packages/metis-core/src/metis_core/trace/store.py — the table this spec reads from.packages/metis-core/src/metis_core/pricing/table.py — compute_cost, used directly by /analytics/savings.