Status: Draft v3.1
Last updated: 2026-05-14
Owner: your name
v3.1 changes (2026-05-14):
eval domain added to closed list (§4.5) with three new event types: eval.started, eval.completed, eval.failed (§6.12), per evaluator.md §8. Three new pattern event types added under the existing pattern domain (§6.5b): pattern.recorded, pattern.matched, pattern.evicted, per pattern-store.md §10. All six payloads landed in events/payloads.py and PAYLOAD_REGISTRY. Sensitivity floor is pseudonymous for five; eval.completed’s floor is user_controlled (the worst case, when signals.rationale_redacted is populated) and downgrades to pseudonymous when the rationale field is absent, a move toward less private, which §4.4.1 explicitly allows.
v3 changes: Streaming events explicitly excluded from catalog (§4.5); streaming server removed from bus subscriber table (§5.4), since it receives events directly from the agent loop, not via the bus. Cross-reference to EventFrame (§5.4 note). Error class enums extended in llm.call_failed (§6.3) and tool.failed (§6.4) to match adapter and dispatcher contracts. tool.confirmation_requested and tool.confirmation_resolved added (§6.4). block_dropped confirmed as log-only, not catalog event.
v2 changes:
route.decided exactly-once preserved by introducing route.overridden as a distinct type (§6.5b). Pattern domain lifted into its own §6.5b. bus.gap_detected defined (§6.10). Bus diagnostics (bus.overflow, bus.handler_error) moved out of the catalog and into structured logs to avoid recursion and chicken-and-egg failures (§3.5, §5.2, §6.10). Redelivery claim narrowed to “across restarts, not in-process” (§5.1). Trace store committed to SQLite WAL + synchronous=NORMAL (§3.4, §7.2). Memory snapshotter moved off fast path (§5.4). Dynamic sensitivity on opt-in (§4.4). Delegation phase asymmetry documented (§6.8). bus.gap_detected test rewritten (§9.1). Various nits.
This document specifies the event bus — the in-process pub/sub spine that decouples the agent loop from observability, persistence, and analytics — and the closed catalog of event types that flow through it. Every meaningful action in the system emits an event; subscribers consume events; the trace store persists them.
The catalog is the contract. Adding a new event type is a deliberate spec change. Removing or renaming a type is a breaking change with a major version bump.
This spec is referenced by:
- canonical-message-format.md (events that touch messages)
- routing-engine.md (the route.decided event and routing auxiliaries)
- streaming-protocol.md (forthcoming; defines client subscriptions)
- skill-format.md (forthcoming; events for skill load/create/modify)

parent_event_id; “why did this happen?” is a walk back to the root cause.
event.id if they care.

class EventBus:
    def emit(self, event: Event) -> None:
        """Synchronous from caller's perspective. Appends to the dispatch queue,
        which is drained asynchronously. Validates payload against the type's
        schema; raises EventValidationError on bad payload (dev mode) or logs
        and drops (prod mode)."""

    def subscribe(self, sub: Subscription) -> SubscriptionHandle:
        """Register a handler for events matching the filter. Returns a handle
        used to unsubscribe."""

    def unsubscribe(self, handle: SubscriptionHandle) -> None:
        """Remove a subscription. Idempotent."""

class Subscription:
    filter: EventFilter
    handler: Callable[[Event], Awaitable[None]]  # async
    name: str                                    # for diagnostics
    fast_path: bool = False                      # see §3.4

class SubscriptionHandle:
    id: str                     # opaque, returned by subscribe()
    subscription: Subscription  # the original (for diagnostics)

class EventFilter:
    session_ids: set[str] | None = None  # None = all sessions
    event_types: set[str] | None = None  # None = all types
    actors: set[Actor] | None = None     # None = all actors
emit returns immediately after enqueueing. The dispatch worker drains the queue and fans out to matching subscribers. Handlers run on the asyncio event loop; slow handlers do not block other handlers or the emitter.
The bus runs in one of two modes:
- Strict: a bad payload raises EventValidationError. Tests must run in strict mode.
- Lenient: a bad payload is logged and the event is dropped.

Mode is set by environment variable. Default in development is strict; default in production is lenient.
The trace store is one subscriber, registered automatically by the server at startup with EventFilter() (all events). It writes each event to SQLite as it arrives.
The bus does not persist events itself. If the trace store subscriber is slow or crashes, events queue up in the dispatch worker’s buffer (bounded — see §3.5). Events that have been enqueued but not yet persisted are lost on hard process crash.
This is acceptable because:
If stronger guarantees are needed later (write-ahead log to disk before fanout), they can be added without changing the bus interface.
Subscribers are tagged fast_path: true or false. Fast-path subscribers run inside the dispatch worker, synchronous with each event:
synchronous=NORMAL; typically <1ms).
streaming-protocol.md §5.1).

Batch subscribers run on their own schedules and query the trace store directly:

memory.updated event).

The convention: anything that can’t reliably finish in <1ms per event must be batch, not fast-path. Adding a slow handler as fast-path is a bug: it stalls all event processing.
The trace store’s SQLite mode is committed (see §7.2 for full details and durability trade-off). The “writes one row, sub-millisecond” claim depends on this configuration; the default synchronous=FULL mode would not meet the fast-path budget.
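For reference, the committed SQLite configuration can be applied with two pragmas at connection time. The sketch below uses only the standard `sqlite3` module; the `events` table layout is a hypothetical placeholder, since the real schema lives in §7.2.

```python
import sqlite3


def open_trace_store(path: str) -> sqlite3.Connection:
    """Open the trace database with WAL journaling and synchronous=NORMAL."""
    conn = sqlite3.connect(path)
    # WAL is persistent once set on a database file; synchronous is per connection.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    # Hypothetical minimal table; the real columns are defined in §7.2.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id TEXT PRIMARY KEY, session_id TEXT NOT NULL, "
        "type TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn
```

WAL mode needs a file-backed database; an in-memory database reports its journal mode as `memory` instead.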
The dispatch queue is bounded (default 10,000 events). On overflow:
- emit raises EventBusOverflowError to the caller.
- A structured log entry is written (level ERROR, with the rejected event type and current queue depth). This goes to the application log, not through the bus: the bus is, by definition, the thing that’s full.

In v1 this should never happen at single-user scale. The bound is a safety net against runaway emit loops, not a backpressure-shaping mechanism.
Bus diagnostics (overflow, handler errors, gap detection) are written as structured logs rather than events. They describe the bus’s own health and would create chicken-and-egg problems if routed through the bus they describe (e.g., overflow events while overflowing). The catalog (§6) covers domain events; bus diagnostics are observability about the bus itself.
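The overflow behavior can be sketched with a bounded `asyncio.Queue`. `BoundedDispatchQueue` and its method names are hypothetical; only the default bound, the error type, and the log-instead-of-event rule come from the text above.

```python
import asyncio
import logging

log = logging.getLogger("eventbus")


class EventBusOverflowError(RuntimeError):
    """Raised to the emitter when the dispatch queue is full."""


class BoundedDispatchQueue:
    def __init__(self, max_events: int = 10_000) -> None:
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_events)

    def depth(self) -> int:
        return self._queue.qsize()

    def enqueue(self, event_type: str, event: object) -> None:
        try:
            self._queue.put_nowait(event)
        except asyncio.QueueFull:
            # Diagnostics go to the application log, never back through the bus.
            log.error(
                "event bus overflow: rejected_type=%s queue_depth=%d",
                event_type,
                self.depth(),
            )
            raise EventBusOverflowError(
                f"dispatch queue full, rejected {event_type}"
            ) from None
```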
Subscribers register at server startup or session creation. Unsubscribe at shutdown. The bus does not retain references to handlers across server restarts; subscribers re-register on startup.
Every event has the same envelope:
class Event:
    # Identity
    id: str                          # ULID, monotonic per process
    timestamp: datetime              # microsecond precision UTC
    # Scoping
    session_id: str
    turn_id: str | None              # null for session-level and system events
    # Causality
    parent_event_id: str | None      # the event that caused this one, if any
    # Type
    type: str                        # dotted lowercase, e.g. "llm.call_started"
    actor: Actor                     # USER | AGENT | SYSTEM | TOOL | WORKER
    # Payload
    payload: dict                    # validated against the type's schema
    # Sensitivity classification
    sensitivity: Sensitivity         # PRIVATE | USER_CONTROLLED | PSEUDONYMOUS | AGGREGATABLE

class Actor(StrEnum):
    USER = "user"
    AGENT = "agent"                  # the planner-role LLM
    SYSTEM = "system"                # the server itself (router, dispatcher, etc.)
    TOOL = "tool"                    # tool dispatcher
    WORKER = "worker"                # delegated sub-agent

class Sensitivity(StrEnum):
    PRIVATE = "private"                  # contains user prompt, file content, etc.
    USER_CONTROLLED = "user_controlled"  # skill bodies; user chose to make shareable
    PSEUDONYMOUS = "pseudonymous"        # structural metadata (file types, tags)
    AGGREGATABLE = "aggregatable"        # outcomes safe for cross-user aggregation
id is a ULID generated at emit time. ULIDs are sortable, monotonic per process, and globally unique.

parent_event_id is a single pointer to the event that directly caused this one. Chains are reconstructed by walking pointers backward. Examples:

turn.started
└─ llm.call_started (parent: turn.started)
   └─ llm.call_completed (parent: llm.call_started)
      └─ tool.called (parent: llm.call_completed; assistant emitted tool_use)
         └─ tool.completed (parent: tool.called)
            └─ llm.call_started (parent: tool.completed; agent loop made the next call)
               └─ ...
Branches in the chain (one event causing many) are represented by multiple events each pointing back to the same parent — not by a single event with multiple children. The chain is a tree of parent pointers.
Events without a parent (e.g., session.created, turn.started from user input) have parent_event_id: null.
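Answering “why did this happen?” then amounts to following parent_event_id until it is null. A minimal sketch, assuming events are already loaded into a dict keyed by id (the helper name `causal_chain` is hypothetical):

```python
def causal_chain(event_id: str, events_by_id: dict[str, dict]) -> list[str]:
    """Return event types from the given event back to its root cause."""
    chain: list[str] = []
    seen: set[str] = set()
    current: str | None = event_id
    while current is not None:
        if current in seen:
            # Parent pointers should form a tree; a cycle means corrupt data.
            raise ValueError(f"cycle in parent pointers at {current}")
        seen.add(current)
        event = events_by_id[current]
        chain.append(event["type"])
        current = event["parent_event_id"]
    return chain
```

Because each event stores one parent pointer, the walk is linear in the depth of the chain, and siblings that share a parent never appear in each other's chains.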
| Class | Meaning | Examples |
|---|---|---|
| private | Contains user prompts, file content, command outputs. | turn.started, tool.completed body |
| user_controlled | User explicitly chose to share/sync this. | skill.created (skill body) |
| pseudonymous | Structural metadata, no raw user content. | routing.policy_invalid, fingerprint tags |
| aggregatable | Safe to include in cross-user aggregations with k-anonymity. | Pattern outcome rollups, feedback.explicit |
Every event type declares its default sensitivity in the catalog. The default for any new type is private — opting up to less restrictive requires deliberate design.
This classification gates the future sync layer and any future cross-user features. No code in v1 acts on it (everything stays local). But the tag is recorded on every event from day one so future features have the data.
Some event types have payload fields that are populated only when the user opts in. The most important case is turn.started.user_message_text_redacted: nullable by default, populated only if the user enabled trace sharing.
When such a field is populated, the event’s recorded sensitivity may move to a less restrictive class than the catalog default. The rule:

- The catalog sensitivity is the floor; the recorded sensitivity value is computed at emit time based on which optional fields are populated.
- The recorded value may move only toward less private (private to user_controlled to pseudonymous to aggregatable), never toward more private than the floor. make_event enforces this: a sensitivity override more private than the catalog floor raises EventValidationError.

Concrete examples:

- turn.started floor is private. A turn.started event with user_message_text_redacted: null is recorded as the floor private. The same event type with the field populated (because the user opted into trace sharing) is recorded as user_controlled, a downgrade toward less private, which the rule allows.
- eval.completed floor is user_controlled. With signals.rationale_redacted populated, recorded as the floor user_controlled. With the rationale field absent (heuristic verdict, or LLM verdict without rationale opt-in), the subscriber passes pseudonymous, again a downgrade toward less private.

This keeps the catalog contract honest: the floor is the most-private possible recording for that event type, and the actual sensitivity tag reflects what was actually included.
Convention: <domain>.<verb_phrase> in lowercase, dot-separated.
session.created
turn.started
llm.call_started
tool.called
route.decided
skill.loaded
delegate.started
memory.updated
feedback.explicit
Domains: session, turn, llm, tool, route, skill, memory, delegate, feedback, bus, pattern, provider, eval. Closed list. Adding a domain is a spec change.
The streaming protocol (streaming-protocol.md §5.3) defines a separate family of transient event types for live UI updates. Their domains are message, text, thinking, plus the tool.use_* sub-namespace (tool.use_start, tool.use_input_delta, tool.use_end).
These are not bus catalog events:
Message content and the usage totals on llm.call_completed.

The names message.*, text.*, thinking.*, tool.use_* are reserved for streaming use; bus catalog events MUST NOT use these prefixes. The tool domain’s bus events (tool.called, tool.completed, tool.failed, tool.input_invalid, tool.confirmation_*) are distinct from the streaming tool.use_* events: different verbs, no collision.
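A registration-time check for the naming convention and the reserved prefixes might look like the sketch below. The function name and the regular expression are assumptions; the domain set is copied from the closed list above and passed as a parameter so it can follow the catalog.

```python
import re

# Copied from the closed domain list in this section.
DOMAINS = frozenset({
    "session", "turn", "llm", "tool", "route", "skill", "memory",
    "delegate", "feedback", "bus", "pattern", "provider", "eval",
})
# Prefixes reserved for the streaming protocol.
RESERVED_PREFIXES = ("message.", "text.", "thinking.", "tool.use_")
# <domain>.<verb_phrase>, lowercase, one dot (assumed shape).
_TYPE_RE = re.compile(r"[a-z]+\.[a-z_]+")


def check_catalog_type(name: str, domains: frozenset = DOMAINS) -> list[str]:
    """Return a list of problems; an empty list means the name is acceptable."""
    problems = []
    if not _TYPE_RE.fullmatch(name):
        problems.append("shape: expected <domain>.<verb_phrase> in lowercase")
    if name.startswith(RESERVED_PREFIXES):
        problems.append("reserved: prefix belongs to the streaming protocol")
    if name.split(".", 1)[0] not in domains:
        problems.append("domain: not in the closed domain list")
    return problems
```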
async def handler(event: Event) -> None:
    ...

Handlers MUST:

fast_path: true; otherwise the handler should be fast_path: false and operate from the trace store.

Handlers SHOULD:

event.id.

In-process redelivery does not happen in v1. The dispatcher fans out each event once per matching subscriber, then moves on. A handler raising means the event is dropped for that subscriber and that’s it. Exactly-once across restarts would require per-subscriber acknowledgment tracking, which v1 doesn’t implement; it’s a Phase 3+ concern if it becomes necessary.
If a handler raises:
- The failure is logged at WARN with the subscription name, failed event id, event type, and the exception. This goes to the application log, not through the bus (see §3.5 rationale on bus diagnostics).

This avoids the recursive-failure problem where a subscriber that filters on all events would receive its own error notifications and fail again, generating more error notifications, indefinitely.
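The isolation described here can be sketched as a fan-out loop that catches per-handler failures. `fan_out` is a hypothetical helper, and it awaits handlers one after another for simplicity, which the real dispatcher may not do.

```python
import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("eventbus")

Handler = Callable[[dict], Awaitable[None]]


async def fan_out(event: dict, subscribers: list[tuple[str, Handler]]) -> int:
    """Deliver one event to each subscriber once; return how many succeeded."""
    delivered = 0
    for name, handler in subscribers:
        try:
            await handler(event)
            delivered += 1
        except Exception as exc:
            # Logged, not emitted: a failing catch-all subscriber must not
            # be handed its own failure as a new event. No redelivery.
            log.warning(
                "handler error: subscription=%s event_id=%s type=%s error=%r",
                name, event["id"], event["type"], exc,
            )
    return delivered
```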
Replay (used by streaming protocol on client reconnect) is not a bus operation. It’s a trace store query: SELECT * FROM events WHERE session_id = ? AND id > ? ORDER BY id. The bus only fans out live events; historical replay queries the persistent layer directly.
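Run against SQLite, the replay query relies on ULID ids sorting lexicographically as text. The sketch below uses a hypothetical three-column table and short placeholder ids; `make_demo_store` exists only to make the example self-contained.

```python
import sqlite3

REPLAY_SQL = "SELECT * FROM events WHERE session_id = ? AND id > ? ORDER BY id"


def make_demo_store() -> sqlite3.Connection:
    """In-memory store with a hypothetical minimal schema, for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, session_id TEXT, type TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [
            ("01C", "s1", "llm.call_completed"),
            ("01A", "s1", "turn.started"),
            ("01E", "s2", "turn.started"),
            ("01B", "s1", "llm.call_started"),
            ("01D", "s1", "turn.completed"),
        ],
    )
    return conn


def replay_since(conn: sqlite3.Connection, session_id: str, last_seen_id: str) -> list[tuple]:
    """Return this session's events that come after last_seen_id, in id order."""
    return conn.execute(REPLAY_SQL, (session_id, last_seen_id)).fetchall()
```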
Registered automatically at server startup:
| Subscriber | Filter | Fast path | Purpose |
|---|---|---|---|
| Trace store writer | All events | Yes | Append to SQLite (WAL mode; see §7.2). |
| Streaming bus bridge | Per-client filter | Yes | Forward bus events to attached WebSocket clients (wraps each event in an EventFrame, see streaming-protocol.md §4.2). |
| Cost accumulator | llm.call_completed | Yes | Update running session cost. |
| Pattern outcome | session.ended | No | Compute fingerprint + outcome. |
| Memory snapshotter | memory.updated | No | Capture before/after for diff display via batch read of the memory file. |
Additional subscribers may be registered by analytic plugins in Phase 3+.
The streaming server has two input channels: the bus bridge (above) for catalog events, and a direct channel from the agent loop for streaming-only events (per §4.5.1). On the wire, both are wrapped in EventFrame and sent to clients in a single ordered stream. Clients see one merged stream and don’t need to know about the internal split.
This section enumerates every event type, its payload, its sensitivity, its parent type (the typical event it descends from), and the phase it ships in.
For each event:
Type: dotted name
Sensitivity: classification
Phase: which project phase first emits this
Actor: typical emitter
Parent: typical parent event type
Payload:
session.created
Sensitivity: pseudonymous
Phase: 1
Actor: SYSTEM
Parent: none
{
"workspace_path": str, # absolute, ~ expanded
"workspace_hash": str, # SHA-256 of workspace_path, for joining without exposing path
"initial_active_model": str | None,
"routing_policy_version": str, # SHA-256 of routing.yaml contents at session start
# (not mtime — restore-from-backup can reuse mtimes
# across distinct files; content hash is unambiguous)
}
session.resumed
Sensitivity: pseudonymous
Phase: 1
Actor: SYSTEM
Parent: none
{
"workspace_hash": str,
"last_event_id_at_resume": str | None, # for replay
}
session.ended
Sensitivity: pseudonymous
Phase: 1
Actor: SYSTEM
Parent: none
{
"disposition": Literal["completed", "abandoned", "error"],
"turn_count": int,
"total_cost_usd": float,
"duration_seconds": float,
}
abandoned is emitted on a configurable inactivity timeout (default 24h). error is emitted on unrecoverable session failure.
turn.started
Sensitivity: private
Phase: 1
Actor: USER
Parent: none
{
"user_message_hash": str, # SHA-256 of message text, for dedup detection
"user_message_text_redacted": str | None, # populated only if user opted into trace sharing
"estimated_input_tokens": int,
"has_images": bool,
"has_tool_calls_in_history": bool,
}
The full user message text is not in the event payload — it’s persisted as part of the canonical Message in the session store. The event carries metadata sufficient for routing and analytics without duplicating content.
turn.completed
Sensitivity: pseudonymous
Phase: 1
Actor: AGENT
Parent: turn.started
{
"stop_reason": Literal["end_turn", "max_tokens", "stop_sequence", "tool_use"],
"llm_call_count": int,
"tool_call_count": int,
"total_input_tokens": int,
"total_output_tokens": int,
"total_cost_usd": float,
"wall_time_seconds": float,
# --- additive (default null; existing consumers ignore unknown fields) ---
"signals_extra": dict | None, # evaluator §5.1 supplementary keys (e.g. final_response_text)
"user_id": str | None, # multi-user.md §4.4; null on agent-loop and pre-multi-user keys
"team_id": str | None, # multi-user.md §4.4; same null convention as user_id
}
user_id and team_id are stable pseudonymous identifiers (usr_<ulid> / team_<ulid>) resolved from the gateway key at request entry per multi-user.md §3 and §4.4. They roll up under the null bucket in the analytics surface for agent-loop traffic and for keys issued before the multi-user fields were added. Plaintext PII (email, real name) lives in users.json only — the trace store carries the stable id only (multi-user.md §3.3).
turn.cancelled
Sensitivity: pseudonymous
Phase: 1
Actor: USER
Parent: turn.started
{
"reason": Literal["user_cancel", "client_disconnect", "timeout"],
"partial_llm_calls": int,
"partial_tool_calls": int,
}
llm.call_started
Sensitivity: private
Phase: 1
Actor: AGENT
Parent: turn.started (first call) or tool.completed (subsequent calls)
{
"model": str, # canonical "provider:name"
"provider": str,
"estimated_input_tokens": int,
"request_id": str, # adapter-issued, for cross-referencing logs
"is_worker": bool, # true if this is inside a delegated worker session
}
llm.call_completed
Sensitivity: pseudonymous
Phase: 1
Actor: AGENT
Parent: llm.call_started
{
"model": str,
"provider": str,
"input_tokens": int,
"output_tokens": int,
"cached_input_tokens": int,
"cache_creation_input_tokens": int,
"cost_usd": float,
"pricing_version": str,
"latency_ms": int,
"stop_reason": Literal["end_turn", "max_tokens", "stop_sequence", "tool_use"],
"produced_tool_calls": int, # number of tool_use blocks in the response
"produced_thinking_blocks": int,
# --- additive (default null; existing consumers ignore unknown fields) ---
"gateway_key_id": str | None, # gateway.md §6; null on agent-loop traffic
"inbound_shape": Literal["openai", "anthropic"] | None, # gateway.md §6
"user_id": str | None, # multi-user.md §4.4; null on agent-loop and pre-multi-user keys
"team_id": str | None, # multi-user.md §4.4; same null convention as user_id
}
gateway_key_id and inbound_shape are stamped at the gateway boundary per gateway.md §6; both are null when the call originated from the in-process agent loop (CLI / TUI / metis serve). user_id and team_id are stable pseudonymous identifiers (usr_<ulid> / team_<ulid>) resolved from the gateway key per multi-user.md §3 and §4.4 — they roll up under the null bucket for agent-loop traffic and for keys issued before the multi-user fields were added. Plaintext PII (email, real name) lives in users.json only — the trace store carries the stable id only (multi-user.md §3.3).
llm.call_failed
Sensitivity: pseudonymous
Phase: 1
Actor: AGENT
Parent: llm.call_started
{
"model": str,
"provider": str,
"error_class": str, # see ErrorClass enum in provider-adapter-contract.md §6.1:
# "rate_limit" | "auth" | "server_error" | "network"
# | "context_overflow" | "invalid_request" | "cancelled" | "other"
"error_message_redacted": str, # provider message with PII heuristically scrubbed
"retry_count": int, # how many retries the adapter attempted
"latency_ms": int,
}
tool.called
Sensitivity: private
Phase: 1
Actor: AGENT
Parent: llm.call_completed
{
"tool_use_id": str, # canonical id (tu_<ulid>)
"tool_name": str, # canonical name
"input_hash": str, # SHA-256 of canonical input JSON
"input_size_bytes": int,
"side_effects": Literal["none", "read", "write", "execute", "network"],
}
Input content is in the canonical ToolUseBlock, not in the event. Hash and size let us detect duplicate calls without storing the input twice.
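The hash and size fields could be derived as sketched below. The canonicalization rule used here (sorted keys, compact separators, UTF-8) is an assumption for illustration; the authoritative definition of “canonical input JSON” lives with the canonical message format, not in this sketch.

```python
import hashlib
import json


def canonical_input_json(tool_input: dict) -> bytes:
    """Serialize deterministically so equal inputs produce equal bytes (assumed rule)."""
    return json.dumps(
        tool_input, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def tool_called_fields(tool_input: dict) -> dict:
    """Compute the input_hash and input_size_bytes fields of tool.called."""
    raw = canonical_input_json(tool_input)
    return {
        "input_hash": hashlib.sha256(raw).hexdigest(),
        "input_size_bytes": len(raw),
    }
```

Two calls whose inputs differ only in key order then share an input_hash, which is what makes duplicate detection possible without storing the input twice.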
tool.completed
Sensitivity: private
Phase: 1
Actor: TOOL
Parent: tool.called
{
"tool_use_id": str,
"success": bool,
"output_size_bytes": int,
"latency_ms": int,
"files_modified": list[str] | None, # for write tools; null for others
"command_executed": str | None, # for execute tools; null for others
}
For execute and write tools, side-effect details are recorded (file paths, command strings) for audit. The actual command output is in the canonical ToolResultBlock.
tool.failed
Sensitivity: private
Phase: 1
Actor: TOOL
Parent: tool.called
{
"tool_use_id": str,
"error_class": Literal["timeout", "permission_denied", "not_found", "validation_error",
"execution_error", "cancelled", "user_denied", "confirmation_timeout"],
"error_message": str,
"latency_ms": int,
}
tool.input_invalid
Sensitivity: pseudonymous
Phase: 1
Actor: SYSTEM
Parent: llm.call_completed
{
"tool_name": str,
"validation_errors": list[str],
}
Emitted when a tool_use block’s input fails JSON Schema validation against the tool’s schema. The agent loop returns an error tool_result to the model.
tool.confirmation_requested
Sensitivity: private
Phase: 1
Actor: SYSTEM
Parent: tool.called (logically; the tool call is paused waiting for user response)
Emitted when a tool with WRITE/EXECUTE/NETWORK side effects requires user confirmation per tool-dispatcher.md §5.2. The tool’s execution is paused until a tool.confirmation_resolved event is emitted (or the confirmation times out).
{
"tool_use_id": str,
"tool_name": str,
"side_effects": Literal["write", "execute", "network"],
"confirmation_request_id": str, # ULID; used by the response endpoint
"input_summary": str, # human-readable, redacted of long content
"projected_modifications": list[str] | None, # for WRITE: paths to be modified
"command_summary": str | None, # for EXECUTE: the command line, possibly truncated
"expires_at": datetime, # when the confirmation request times out
}
The streaming server forwards this event to all attached clients of the session; clients render a UI prompt. The user’s response goes through HTTP per server-api.md §4.2.
tool.confirmation_resolved
Sensitivity: private
Phase: 1
Actor: USER
Parent: tool.confirmation_requested
{
"tool_use_id": str,
"confirmation_request_id": str,
"decision": Literal["allow", "deny", "timeout"],
"scope": Literal["once", "session"] | None, # null if decision is "timeout"
"responding_client_attach_token": str | None, # which client answered, if multiple attached
}
The dispatcher proceeds to execute (allow) or aborts (deny, timeout) based on the decision.
These are defined in detail in routing-engine.md §7. Repeated here with full payloads:
route.decided
Sensitivity: pseudonymous
Phase: 1
Actor: SYSTEM
Parent: turn.started
Exactly one route.decided event per turn (per routing-engine spec §7.2). User-driven overrides after the fact emit a separate route.overridden event under the Pattern domain (§6.5b).
{
"chosen_model": str,
"winner_index": int,
"elapsed_ms": float,
"chain": [
{
"policy": Literal["per_message_override", "manual_sticky", "rule",
"pattern", "delegate_request", "workspace_default", "global_default"],
"verdict": Literal["not_applicable", "deferred", "rejected", "chose"],
"candidate_model": str | None,
"reason": str,
"rule_name": str | None,
"confidence": float | None,
"pattern_alternatives": list[{"model": str, "score": float, "sample_size": int}] | None,
"validation_failure": Literal["no_vision_support", "exceeds_context_window",
"no_tool_support", "no_system_prompt_support",
"no_structured_output_support",
"provider_unavailable", "not_configured"] | None,
},
# ... one per policy in chain order
]
}
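A consumer might sanity-check a route.decided payload before trusting it. The sketch assumes, which this section does not state outright, that winner_index is the position of the single chain entry whose verdict is "chose" and that its candidate_model equals chosen_model; routing-engine.md §7 is the authority on both points. The function name and the model ids in the usage note are hypothetical.

```python
def check_route_decided(payload: dict) -> None:
    """Raise ValueError if the payload is internally inconsistent (assumed invariants)."""
    chain = payload["chain"]
    winners = [i for i, entry in enumerate(chain) if entry["verdict"] == "chose"]
    if winners != [payload["winner_index"]]:
        raise ValueError(
            f"expected exactly one 'chose' entry at winner_index, found {winners}"
        )
    winner = chain[payload["winner_index"]]
    if winner["candidate_model"] != payload["chosen_model"]:
        raise ValueError("winning entry's candidate_model differs from chosen_model")
```

For example, a chain whose first two entries are not_applicable and whose third entry has verdict "chose" with candidate_model "provider_a:model_x" passes when winner_index is 2 and chosen_model is "provider_a:model_x".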
routing.policy_invalid
Sensitivity: pseudonymous
Phase: 1
Actor: SYSTEM
Parent: none
{
"policy_path": str,
"errors": list[str],
"using_last_known_good": bool,
}
routing.provider_unavailable
Sensitivity: pseudonymous
Phase: 1
Actor: SYSTEM
Parent: llm.call_failed (typically the failure that crossed the threshold)
{
"provider": str,
"scope": Literal["model_specific", "provider_wide"],
"models_affected": list[str],
"trigger_reason": str, # "5_consecutive_failures" | "auth_error"
# | "dns_error" | "multi_model_failures"
}
routing.provider_recovered
Sensitivity: pseudonymous
Phase: 1
Actor: SYSTEM
Parent: none
{
"provider": str,
"scope": Literal["model_specific", "provider_wide"],
"models_recovered": list[str],
"downtime_seconds": float,
}
Pattern-domain events describe user actions on routing recommendations from the pattern store. They are distinct from route.decided (which describes the routing computation itself) — they describe what happened after the decision was surfaced to the user.
route.overridden
Sensitivity: pseudonymous
Phase: 3
Actor: USER
Parent: route.decided (the decision being overridden)
Emitted when the user runs /route override to apply a pattern recommendation that was deferred behind a rule. The original route.decided event remains intact (preserving history); this event records the swap.
{
"original_chosen_model": str, # what route.decided picked
"new_chosen_model": str, # what the user chose to use instead
"deferred_policy": str, # the policy that originally would have produced new_chosen_model
# (typically "pattern")
"rule_name": str | None, # the rule that won the original route.decided, if any
"pattern_confidence": float, # the confidence of the pattern recommendation
}
The session manager re-dispatches the turn to new_chosen_model after this event is emitted. Subsequent llm.call_started etc. carry the new model.
pattern.override_dismissed
Sensitivity: pseudonymous
Phase: 3
Actor: USER
Parent: route.decided
Emitted when the user runs /route ignore (or otherwise dismisses a pattern-disagreement suggestion).
{
"chosen_model": str, # what route.decided picked (and is keeping)
"dismissed_pattern_model": str,
"rule_name": str | None,
"pattern_confidence": float,
}
The session continues with the original routing decision; this event is purely informational (and feeds back into pattern learning to track which suggestions get dismissed).
pattern.recorded
Sensitivity: pseudonymous
Phase: 2.5
Actor: SYSTEM
Parent: session.ended
Emitted by the session-ended batch subscriber after computing the session’s contributing fingerprints + outcomes and calling PatternStore.record() for each. One event per (fingerprint, primary_model) write, not one per session. See pattern-store.md §10.1.
{
"fingerprint_id": str, # ULID
"fingerprint_kind": Literal["structural", "hybrid"],
"primary_model": str,
"sample_size_before": int,
"sample_size_after": int,
"was_new_fingerprint": bool,
"success_score": float | None, # this session's score (None if evaluator didn't run)
"cost_usd_at_record": str, # Decimal serialized as string; this session's contribution
"pricing_version": str,
"over_soft_cap": bool,
}
Field-name note: cost_usd_at_record (not cost_usd) — disambiguates from llm.call_completed.cost_usd and follows the Decimal serialization convention from canonical-message-format.md §6.4. The pattern-store draft (§10.1) currently names this field cost_usd; reconcile in the Wave 4 sweep.
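The string serialization matters because binary floats cannot represent most decimal fractions exactly. A minimal sketch of a round trip, assuming plain fixed-point notation is acceptable (the exact convention is defined in canonical-message-format.md §6.4; the helper names are hypothetical):

```python
import json
from decimal import Decimal


def encode_cost(cost: Decimal) -> str:
    """Render a Decimal as a plain fixed-point string, never in exponent form."""
    return format(cost, "f")


def decode_cost(text: str) -> Decimal:
    """Parse the string back into an exact Decimal."""
    return Decimal(text)


def record_cost_json(cost: Decimal) -> str:
    """Serialize just the cost field the way pattern.recorded carries it."""
    return json.dumps({"cost_usd_at_record": encode_cost(cost)})
```

Summing decoded values stays exact, whereas summing floats such as 0.1 and 0.2 does not give exactly 0.3.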
pattern.matched
Sensitivity: pseudonymous
Phase: 2.5
Actor: SYSTEM
Parent: route.decided
Emitted when the routing engine’s slot 4 wins (the pattern policy chose the model used for the turn). Distinct from route.decided, so consumers can query “how often does pattern routing fire?” without a JSON scan over route.decided.chain. Not emitted when the pattern policy deferred — the deferred recommendation is already captured in route.decided.chain[].verdict = "deferred". See pattern-store.md §10.2.
{
"fingerprint_id": str,
"fingerprint_kind": Literal["structural", "hybrid"],
"chosen_model": str, # mirrors route.decided.chosen_model
"confidence": float,
"sample_size": int, # neighbors backing chosen_model
"k_cluster_size": int, # total neighbors found (≤ K)
"alternatives_count": int, # how many distinct models scored
}
pattern.evicted
Sensitivity: pseudonymous
Phase: 2.5
Actor: SYSTEM
Parent: pattern.recorded (cap-triggered) or none (manual / scheduled trim)
Mirrors memory.eviction. Fired when (1) a write lands the store over soft_cap_rows (signal only; entries_evicted may be 0), (2) a write lands the store over hard_cap_rows and auto-evict removed rows, (3) the continuous age-trim removed stale rows, or (4) the operator ran /patterns clear. Counts and ages only; no content. See pattern-store.md §10.3.
{
"trigger": Literal["soft_cap_signal", "hard_cap_evict", "age_trim", "manual_clear"],
"fingerprints_before": int,
"fingerprints_after": int,
"outcomes_before": int,
"outcomes_after": int,
"entries_evicted": int, # outcomes removed; 0 for soft_cap_signal
"oldest_evicted_age_days": float | None, # for age_trim and hard_cap_evict
}
skill.loaded
Sensitivity: pseudonymous
Phase: 2
Actor: SYSTEM
Parent: session.started (pre-activation, load_reason="always") or llm.call_completed (on-demand load via skill_load tool)
{
"skill_id": str,
"skill_version": str,
"load_reason": Literal["always", "on_demand", "auto_suggested"],
"load_size_tokens": int,
"source": Literal["global", "workspace"], # which directory served the skill (additive 2026-05-12)
"triggered_by_tool_use_id": str | None, # for on_demand loads via skill_load tool
}
load_reason semantics (context-assembler.md v3 §5.2.1, wired 2026-05-14):

- "always": pre-activation. Emitted by SessionManager.create_session for every skill body inlined into the stable system prefix as v2 §5.1 padding. triggered_by_tool_use_id is None; turn_id is None (pre-activation stands outside any turn).
- "on_demand": explicit activation. Emitted by SkillLoadTool on a successful skill_load(name) call when the skill is not already pre-activated and not already explicitly activated in this session. triggered_by_tool_use_id is the ToolUseBlock.id.
- "auto_suggested": reserved for a later description-match-driven activation mechanism; not emitted in v3.

No skill.unloaded / skill.evicted event exists. v3 defers mid-session eviction (context-assembler.md v3 §5.2.5); budget exhaustion surfaces via tool.failed (§6.4) per §5.2.6.
skill.created
Sensitivity: user_controlled
Phase: 2.5
Actor: SYSTEM | USER
Parent: session.ended (auto-generation) or none (manual)
{
"skill_id": str,
"source": Literal["manual", "auto_generated", "imported"],
"source_session_id": str | None,
"size_tokens": int,
"security_scan_result": Literal["clean", "warning", "blocked"] | None,
"security_scan_findings": list[str],
}
skill.modified
Sensitivity: user_controlled
Phase: 2
Actor: SYSTEM | USER
Parent: varies
{
"skill_id": str,
"modification_type": Literal["edit", "version_bump", "rename"],
"before_hash": str,
"after_hash": str,
"diff_size_bytes": int,
"reason": str,
}
skill.search
Sensitivity: private
Phase: 2
Actor: AGENT
Parent: llm.call_completed
{
"query": str, # the agent's search query
"results_count": int,
"result_skill_ids": list[str],
}
memory.updated
Sensitivity: private
Phase: 2
Actor: AGENT
Parent: llm.call_completed
{
"file": Literal["MEMORY.md", "USER.md"],
"operation": Literal["add", "replace", "consolidate"],
"before_hash": str,
"after_hash": str,
"before_size_bytes": int,
"after_size_bytes": int,
}
memory.eviction
Sensitivity: private
Phase: 2
Actor: SYSTEM
Parent: memory.updated
{
"file": Literal["MEMORY.md", "USER.md"],
"trigger": Literal["size_cap_exceeded", "manual"],
"entries_evicted": int,
"size_before_bytes": int,
"size_after_bytes": int,
}
Status: v1 MVP shipped (Wave 10). The delegate() built-in tool, the worker-session lifecycle, and the three event types below are wired and tested. Streaming, cancellation cascade, recursive delegation, and structured-output schema validation remain deferred per delegation.md §2.2. The routing chain’s delegate_request policy slot has existed since Phase 1 and continues to report not_applicable for top-level sessions; it now reports chose: <tier model> inside worker re-entry per delegation.md §7.
delegate.started
Sensitivity: pseudonymous
Phase: 4 (v1 MVP, Wave 10)
Actor: SYSTEM
Parent: llm.call_completed (the planner call that emitted the delegate() tool_use)
{
"tool_use_id": str, # the delegate() call's tool_use_id
"worker_session_id": str,
"tier": Literal["fast", "balanced", "deep"],
"resolved_model": str,
"context_mode": Literal["minimal", "explicit"],
"context_reference_count": int, # for explicit mode
"task_size_tokens": int,
"allowed_tool_count": int, # tools the planner asked the worker to keep
"dropped_tools": list[str], # tools the planner asked for but worker forbids (§5.6)
}
delegate.completed
Sensitivity: pseudonymous Phase: 4 (v1 MVP, Wave 10) Actor: SYSTEM Parent: delegate.started
The cost summary is derived — analytics joins worker spend back via
llm.call_completed.parent_session_id (delegation.md §8.3).
{
"tool_use_id": str,
"worker_session_id": str,
"success": bool,
"output_size_bytes": int,
"worker_total_cost_usd": Decimal, # serialized as string per §6.4 convention
"pricing_version": str,
"turn_count": int,
"llm_call_count": int,
"tool_call_count": int,
"wall_time_seconds": float,
"model": str, # the resolved worker model
}
delegate.failed
Sensitivity: pseudonymous Phase: 4 (v1 MVP, Wave 10) Actor: SYSTEM Parent: delegate.started (when the worker session was created) or the planner’s llm.call_completed (when failure precedes session creation, e.g. no_model_available_for_tier)
{
"tool_use_id": str,
"worker_session_id": str | None, # None when failure precedes session creation
"failure_mode": Literal["worker_error", "max_tokens_exceeded", "insufficient_context",
"output_schema_validation_failed", "no_model_available_for_tier",
"cancelled_by_user"],
"error_message": str,
"worker_total_cost_usd": Decimal, # partial spend before failure; serialized as string
"pricing_version": str,
}
insufficient_context_request (typed InsufficientContextRequest from
routing-engine.md §6.6.1) is reserved for the streaming-back-to-planner
follow-up phase; v1 carries the structured ask only in error_message text.
feedback.explicit
Sensitivity: aggregatable Phase: 2 Actor: USER Parent: turn.completed or session.ended
{
"scope": Literal["turn", "session"],
"rating": Literal["thumbs_up", "thumbs_down"],
"comment": str | None,
"subject_turn_id": str | None,
"subject_session_id": str | None,
}
feedback.implicit
Sensitivity: pseudonymous Phase: 2 Actor: SYSTEM Parent: varies
{
"type": Literal["retry", "manual_swap", "edit_followup", "abandon", "accept"],
"confidence": float, # 0..1, system's confidence this signal is meaningful
"subject_turn_id": str | None,
"context": dict, # type-specific extras
}
retry is detected when a user message has high similarity to a recent prior user message in the same session. manual_swap is when the user runs /model after an unsatisfactory turn. edit_followup is when a user message starts with patterns like “no, actually…” or “that’s wrong, …”. These are heuristic; confidence reflects that.
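The retry and edit_followup heuristics above could be sketched as follows. This is a minimal illustration, not the shipped detector: the function name, the 0.85 threshold, the fixed 0.6 confidence, the regex, and the use of difflib for similarity are all assumptions.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher

# Hypothetical values: the spec only says "high similarity" and gives two
# example prefixes for edit_followup.
RETRY_SIMILARITY_THRESHOLD = 0.85
EDIT_FOLLOWUP_RE = re.compile(r"^\s*(no,\s*actually|that['’]?s wrong)", re.IGNORECASE)


def detect_implicit_feedback(message: str, recent_user_messages: list[str]) -> dict | None:
    """Return a feedback.implicit-shaped payload, or None if no heuristic fires."""
    if EDIT_FOLLOWUP_RE.match(message):
        # Fixed confidence is an assumption; a prefix match is a weak signal.
        return {"type": "edit_followup", "confidence": 0.6,
                "subject_turn_id": None, "context": {}}
    # Highest similarity against any recent prior user message in the session.
    best = max(
        (SequenceMatcher(None, message.lower(), prior.lower()).ratio()
         for prior in recent_user_messages),
        default=0.0,
    )
    if best >= RETRY_SIMILARITY_THRESHOLD:
        # Reusing the similarity as the confidence is a heuristic choice.
        return {"type": "retry", "confidence": round(best, 2),
                "subject_turn_id": None, "context": {"similarity": round(best, 2)}}
    return None
```

A caller would pass the new user message plus a short window of earlier user messages from the same session and emit feedback.implicit only when a payload comes back.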
This section is shorter than v1. bus.handler_error and bus.overflow were originally event types here. They’ve been moved to structured logs (see §3.5 and §5.2 rationale) — they describe bus health, and routing them through the bus they describe creates chicken-and-egg failures and recursive amplification.
What remains as actual events: subscriber lifecycle (helpful for debugging “did my subscriber actually attach?”) and gap detection (helpful for trace store consistency checks).
bus.subscriber_registered
Sensitivity: pseudonymous Phase: 1 Actor: SYSTEM Parent: none
{
"subscription_name": str,
"filter": dict,
"fast_path": bool,
}
bus.subscriber_unregistered
Sensitivity: pseudonymous Phase: 1 Actor: SYSTEM Parent: none
{
"subscription_name": str,
"reason": Literal["explicit", "client_disconnect", "shutdown", "removed_after_errors"],
}
bus.gap_detected
Sensitivity: pseudonymous Phase: 1 Actor: SYSTEM Parent: none
Emitted on server startup when the trace store detects a gap in the per-session monotonic event-id sequence. This indicates events were emitted but not persisted (typically because the trace store crashed while the dispatch worker still held buffered events).
The gap itself is not recoverable — those events are lost. The event documents the gap so consumers (replay, analytics) can flag affected sessions.
{
"session_id": str,
"gap_start_id": str, # last persisted event id before the gap
"gap_end_id": str, # first persisted event id after the gap
"estimated_missing_count": int, # ULID arithmetic estimate; not exact
"detected_at": datetime,
}
Bus diagnostics that go to logs only (not events): EventBusOverflowError rejections (§3.5) and handler exceptions (§5.2). Reasons are detailed in those sections.
provider.degraded
Sensitivity: pseudonymous Phase: 2 Actor: SYSTEM Parent: llm.call_failed
{
"provider": str,
"recent_failure_count": int,
"window_seconds": int,
}
Distinguished from routing.provider_unavailable: degraded is a soft state (Phase 2 refinement); unavailable is the hard state that causes routing to reject.
The evaluator (evaluator.md) emits one verdict per scored subject. Subjects are turns, tool cycles, sessions, and benchmark workloads. Verdicts are append-only — re-evaluating an older subject produces a new eval.completed event with a fresh eval_id; the prior verdict is preserved. The eval domain is closed (see §4.5).
eval.started
Sensitivity: pseudonymous Phase: 3 Actor: SYSTEM Parent: turn.completed / tool.completed / tool.failed / session.ended / feedback.explicit
Emitted when the evaluator begins scoring a subject. Pairs 1:1 with a later eval.completed or eval.failed carrying the same eval_id. See evaluator.md §8.1.
{
"eval_id": str, # monotonic ULID
"subject_kind": Literal["turn", "tool_cycle", "session", "workload"],
"subject_id": str,
"rubric_id": str,
"rubric_version": str,
"judge_kind_planned": Literal["heuristic", "llm", "hybrid"],
"trigger": Literal["bus", "batch", "feedback_arrived", "benchmark"],
}
eval.completed
Sensitivity: user_controlled (floor; downgrades to pseudonymous per §4.4.1 when signals.rationale_redacted is absent) Phase: 3 Actor: SYSTEM Parent: eval.started
{
"eval_id": str,
"subject_kind": Literal["turn", "tool_cycle", "session", "workload"],
"subject_id": str,
"score": float, # in [0.0, 1.0]; 1.0 = clear success
"confidence": float, # in [0.0, 1.0]; judge's confidence in `score`
"judge_kind": Literal["heuristic", "llm", "hybrid"],
"judge_model": str | None, # canonical id when llm/hybrid used the LLM tier
"judge_cost_usd": str, # Decimal serialized as string (same as Usage.cost_usd per canonical-format §6.4)
"judge_pricing_version": str | None, # set when judge_cost_usd > 0
"judge_latency_ms": int,
"rubric_id": str,
"rubric_version": str,
"signals": dict, # judge-specific evidence; see evaluator.md §4.4
"parent_eval_id": str | None, # for tool_cycle→turn / turn→session rollups
}
judge_cost_usd is Decimal("0") for heuristic verdicts and judge_pricing_version is None in that case — pricing semantics don’t apply to code that did no inference.
Sensitivity floor. The catalog floor is user_controlled — the worst case, when signals.rationale_redacted is populated and the event carries LLM-generated text the user opted into capturing. When the rationale field is absent (heuristic verdicts, opt-in disabled), the emitter passes Sensitivity.PSEUDONYMOUS to make_event — a move toward less private, which §4.4.1 allows.
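The downgrade rule can be illustrated with a small helper that picks the tag before the event is built. This is a sketch under assumptions: the helper name is hypothetical, the enum below carries only the two levels needed here, and the string values are guesses at the real Sensitivity enum.

```python
from enum import Enum


class Sensitivity(Enum):
    # Illustrative subset; the string values are assumed, not taken from the spec.
    PSEUDONYMOUS = "pseudonymous"
    USER_CONTROLLED = "user_controlled"


def eval_completed_sensitivity(payload: dict) -> Sensitivity:
    """Choose the sensitivity tag for an eval.completed payload.

    The catalog floor applies when signals.rationale_redacted is populated;
    otherwise the event is tagged pseudonymous.
    """
    signals = payload.get("signals") or {}
    if signals.get("rationale_redacted"):
        return Sensitivity.USER_CONTROLLED
    return Sensitivity.PSEUDONYMOUS
```

The emitter would then hand the returned value to make_event; how make_event accepts it is not shown here because its signature is not given in this section.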
eval.failed
Sensitivity: pseudonymous Phase: 3 Actor: SYSTEM Parent: eval.started
Emitted instead of eval.completed when the judge couldn’t produce a verdict. See evaluator.md §8.3.
{
"eval_id": str,
"subject_kind": Literal["turn", "tool_cycle", "session", "workload"],
"subject_id": str,
"failure_mode": Literal[
"judge_output_invalid", # LLM response didn't parse against the rubric schema
"judge_call_failed", # LLM call hit a hard error (provider down, auth, etc.)
"throttled_no_heuristic", # caps fired AND heuristic also unavailable (defensive; v1 unreachable)
"subject_not_found", # subject_id resolved to no events
"rubric_invalid", # rubric file failed to load
],
"error_message": str,
"judge_latency_ms": int,
}
The gateway admin operations (metis gateway issue-key / revoke-key /
rotate-key) emit one audit event per successful keystore mutation.
All three fire from the CLI, not from a running gateway process — the
session_id is the bus-lifecycle sentinel ("system") and turn_id
is null. See gateway.md §11 for the operator-facing contract; this
section is the catalog entry.
Audit emission is best-effort — failures don’t roll back the keystore mutation. The keystore JSON file is the durable record; these events are the operator’s trail.
gateway.key_issued
Sensitivity: pseudonymous Phase: 3 (Wave 10) Actor: SYSTEM Parent: none
Emitted by metis gateway issue-key after the keystore is written.
The plaintext token is never on the bus — the event only references
the stable gateway_key_id. Identity tags follow the multi-user.md
§3.4 null-bucket convention; pre-multi-user issuance leaves both
user_id and team_id as null.
{
"gateway_key_id": str,
"name": str,
"workspace_path": str,
"issued_at": datetime,
"user_id": str | None,
"team_id": str | None,
"allowed_models": list[str] | None,
"daily_cap_usd": str | None, # Decimal-as-string when set
"monthly_cap_usd": str | None, # Decimal-as-string when set
}
gateway.key_revoked
Sensitivity: pseudonymous Phase: 3 (Wave 10) Actor: SYSTEM Parent: none (one-shot CLI op)
Emitted on explicit metis gateway revoke-key invocation
(reason="admin_revoke") or when the next admin sweep persists a
grace-period lapse (reason="grace_period_expired"). The third enum
value "rotated" is reserved for a future “fail-fast revoke on
rotate” variant; it is not emitted in v1.
{
"gateway_key_id": str,
"revoked_at": datetime,
"reason": Literal["admin_revoke", "grace_period_expired", "rotated"],
}
gateway.key_rotated
Sensitivity: pseudonymous Phase: 3 (Wave 10) Actor: SYSTEM Parent: none
Emitted by metis gateway rotate-key. Carries both predecessor and
successor ids so dashboards can chart the migration; also stamps the
inherited identity dimensions so the rotation surfaces in
/analytics/by_key next to the pre-rotation rows.
{
"old_gateway_key_id": str,
"new_gateway_key_id": str,
"grace_period_until": datetime,
"workspace_path": str,
"user_id": str | None,
"team_id": str | None,
}
gateway.auth_failed
Sensitivity: pseudonymous Phase: 3 (Wave 14a) Actor: SYSTEM Parent: none (pre-routing rejection) Audit-flagged: true (preserved for brute-force / credential-stuffing forensics)
Emitted at the gateway’s auth gate when an inbound request is rejected
before reaching routing / adapters. Drives both
metis_gateway_auth_failures_total{reason} (observability.md §3.2) and
gives compliance / SIEM ingest a row per failed authentication. The
payload deliberately omits the raw bearer token — only the
SHA-256-hash prefix (8 hex chars) is persisted so operators can
correlate repeated attempts of the same leaked credential without
persisting the credential itself. gateway_key_id is set only on the
key_revoked reason path (the token matched a known but inactive key).
{
"reason": Literal["missing_token", "invalid_token", "key_revoked"],
"inbound_shape": Literal["openai", "anthropic"],
"token_hash_prefix": str | None, # SHA-256 first 8 hex chars
"gateway_key_id": str | None, # set on reason="key_revoked"
}
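A minimal sketch of how token_hash_prefix might be derived. The function name, the UTF-8 encoding of the token, and returning None for a missing token are assumptions; only "first 8 hex characters of the SHA-256 hash" comes from the text above.

```python
from __future__ import annotations

import hashlib


def token_hash_prefix(bearer_token: str | None) -> str | None:
    """Return the first 8 hex chars of SHA-256(token), or None when no token was sent."""
    if not bearer_token:
        return None
    # Hash the token so repeated attempts correlate without storing the credential.
    return hashlib.sha256(bearer_token.encode("utf-8")).hexdigest()[:8]
```

Because the digest is deterministic, repeated attempts with the same leaked credential yield the same prefix, which is what lets operators group them.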
trace.swept
Sensitivity: pseudonymous Phase: 3 (Wave 12) Actor: SYSTEM Parent: none (one-shot operator action) Audit-flagged: true (preserved by future sweeps; see §7.3)
Emitted once per metis trace prune invocation (or per CronJob
firing) after the DELETE statement returns. Reports the cutoff that
was used, how many rows were removed, how many audit-flagged rows were
exempted, and the oldest timestamp still in the DB so dashboards can
chart “retention floor over time.” session_id is the bus-lifecycle
sentinel ("system"); turn_id is null.
The event is itself audit-flagged in PAYLOAD_REGISTRY, so the
sweep-history audit trail survives subsequent sweeps. See
trace-retention.md for the operator contract and the retention
sweep mechanics.
{
"rows_deleted": int,
"rows_audit_exempt": int,
"cutoff_timestamp": datetime,
"oldest_kept_timestamp": datetime | None, # None if DB is empty after sweep
"dry_run": bool,
"swept_at": datetime,
}
In dry_run=True mode, the event is not emitted — only the
in-memory PurgeResult is returned to the caller. Operators who want
to record dry-run activity can grep CLI stdout or wire their own
audit-trail outside the bus.
CREATE TABLE events (
id TEXT PRIMARY KEY,
timestamp_us INTEGER NOT NULL, -- unix microseconds
session_id TEXT NOT NULL,
turn_id TEXT, -- nullable
parent_event_id TEXT, -- nullable
type TEXT NOT NULL,
actor TEXT NOT NULL,
sensitivity TEXT NOT NULL,
payload_json TEXT NOT NULL,
FOREIGN KEY (session_id) REFERENCES sessions(id)
);
CREATE INDEX idx_events_session_id ON events(session_id, id);
CREATE INDEX idx_events_type_timestamp ON events(type, timestamp_us);
CREATE INDEX idx_events_turn ON events(turn_id);
CREATE INDEX idx_events_parent ON events(parent_event_id);
The sessions table is defined in canonical-message-format.md §9.1 (sessions, messages, tool_calls). The events table shares the same SQLite database file in v1.
timestamp_us is microseconds for wall-clock accuracy. ULIDs already enforce per-process monotonic ordering within a millisecond (the random component increments on tie); microseconds in a separate column improve human-readable timing precision in dashboards and analytics, not ordering. Ordering is established by id.

SQLite mode commitment. The events database is opened with journal_mode=WAL and synchronous=NORMAL. This is necessary for the trace store writer to meet its <1ms fast-path budget: synchronous=FULL (the SQLite default) makes single-row inserts 5–20ms due to fsync.
Durability trade-off: in WAL + NORMAL, events committed since the last WAL sync (typically <1s of events) may be lost on an OS crash or power loss. A hard process crash alone does not roll back committed transactions; it loses only events still buffered ahead of the commit. On graceful shutdown, all events flush. This trade-off is acceptable because: (1) events are not the system of record for any user-visible state, since sessions and messages are stored separately and atomically; (2) clients can reconnect with replay if they were mid-stream; (3) the dispatch worker batches WAL commits opportunistically, reducing the practical loss window.
Higher durability is available later: switching to synchronous=FULL makes inserts safe at the cost of fast-path budget. Phase 3 may optionally add batched-write durability (group commits every 100ms) which combines durability and throughput at the cost of replay-window latency.
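As a sketch of the mode commitment, the connection setup might look like the following with Python's standard sqlite3 module. The function name open_trace_db is hypothetical; the two PRAGMAs are the ones named above.

```python
import sqlite3


def open_trace_db(path: str) -> sqlite3.Connection:
    """Open the events database with WAL journaling and synchronous=NORMAL."""
    conn = sqlite3.connect(path)
    # journal_mode is stored in the database file; the PRAGMA returns the mode in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        # In-memory and some network filesystems cannot switch to WAL.
        raise RuntimeError(f"could not enable WAL, got {mode!r}")
    # synchronous is a per-connection setting and must be applied on every open.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn
```

Switching to the higher-durability option later would be a one-line change to the second PRAGMA, at the latency cost described above.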
(session_id, id) covers session replay. (type, timestamp_us) covers cross-session analytics (“how many tool.failed events this week?”). (turn_id) covers turn detail. (parent_event_id) covers causal walk.

Wave 12 lands time-based retention with a single global retention_days knob (default 90). Full contract in trace-retention.md; the points relevant to this spec:

- Sweeps run through metis trace prune, invoked by an operator or a Kubernetes CronJob, never by an in-process background task in the gateway/server hot path.
- The sweep is a DELETE WHERE timestamp_us < ? AND type NOT IN (<audit_types>), riding the idx_events_timestamp_us index added in trace-retention.md §4.
- Events whose PAYLOAD_REGISTRY entry is flagged audit=True (Wave 12a-1) are never deleted, regardless of age. This is the bridge between the per-event sensitivity floor (§4.4) and the compliance posture in the project strategy (private): sensitivity controls how an event can be projected; the audit flag controls when it can be deleted.
- trace.swept records every sweep (§6.14) with row counts, the cutoff, and the oldest kept timestamp. The event is itself audit-flagged so the sweep history survives subsequent sweeps.
- Per-type retention (a by_type map) is deferred. v1 ships a single global cutoff; per-type overrides are revisited only if an operator names a specific compliance need.

Pattern-store outcomes are unaffected: PatternStore already has its own bounded eviction (pattern-store.md §6) and writes per-row aggregates that survive trace-event pruning. Session-message retention is a separate concern owned by SqliteSessionStore and is not touched by trace sweeps.
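The sweep described above can be sketched as a single parameterized DELETE. The function name, the return shape, and the two-entry AUDIT_TYPES tuple are illustrative assumptions; the real audit set comes from PAYLOAD_REGISTRY and the real sweep lives behind metis trace prune.

```python
import sqlite3

# Illustrative subset only; the actual audit-flagged set is defined elsewhere.
AUDIT_TYPES = ("trace.swept", "gateway.auth_failed")


def prune(conn: sqlite3.Connection, cutoff_us: int) -> tuple:
    """Delete non-audit events older than cutoff_us.

    Returns (rows_deleted, rows_audit_exempt).
    """
    placeholders = ",".join("?" for _ in AUDIT_TYPES)
    params = (cutoff_us, *AUDIT_TYPES)
    # Count old audit rows first so the sweep can report how many were exempted.
    exempt = conn.execute(
        f"SELECT COUNT(*) FROM events WHERE timestamp_us < ? AND type IN ({placeholders})",
        params,
    ).fetchone()[0]
    cur = conn.execute(
        f"DELETE FROM events WHERE timestamp_us < ? AND type NOT IN ({placeholders})",
        params,
    )
    conn.commit()
    return cur.rowcount, exempt
```

The two counts map onto the rows_deleted and rows_audit_exempt fields of trace.swept.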
In v1, no virtual columns are extracted from payload JSON. If specific queries get slow, add them via ALTER TABLE events ADD COLUMN ... AS (json_extract(...)) VIRTUAL. This is a non-breaking change.
Likely candidates for Phase 2: payload_model (extract from llm.call_completed for cost queries), payload_skill_id (for skill analytics).
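A sketch of what adding the payload_model candidate could look like, assuming the payload stores the model under a top-level "model" key (an assumption) and a SQLite build with generated-column and JSON support (3.31 or later). The function and index names are hypothetical.

```python
import sqlite3


def add_payload_model_column(conn: sqlite3.Connection) -> None:
    """Add a VIRTUAL generated column over payload_json and index it."""
    # VIRTUAL columns can be added with ALTER TABLE; STORED ones cannot.
    conn.execute(
        "ALTER TABLE events ADD COLUMN payload_model TEXT "
        "GENERATED ALWAYS AS (json_extract(payload_json, '$.model')) VIRTUAL"
    )
    # Indexing the generated column is what makes filtered queries fast.
    conn.execute("CREATE INDEX idx_events_payload_model ON events(payload_model)")
    conn.commit()
```

Existing rows need no rewrite because the value is computed on read, which is why the change is non-breaking.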
The trace DB is a single SQLite file. Backups use SQLite’s VACUUM INTO (atomic, WAL-safe, hot-snapshot — the source DB does not need to be closed). Restore is a file copy with a schema-version guard. The contract is wired by metis_core.trace.backup and exposed via the metis backup / metis restore subcommands; operator recipe lives in docs/gateway-deployment.md under “Backup & restore”.
Schema versioning. Every trace DB opened by TraceStore is stamped with PRAGMA user_version = TRACE_SCHEMA_VERSION (currently 1). Bump in lockstep with any breaking edit to the events-table schema in §7.1. restore() refuses a backup whose user_version doesn’t match the running code; the diagnostic names both versions and points at the migration path.
Clean-backup invariant. A backup is a single file. VACUUM INTO does not produce -wal / -shm companions. If restore() finds either alongside the source backup, it refuses — the file was not produced by metis backup (or was hand-edited), and proceeding risks losing in-flight writes.
Overwrite protection. Both backup and restore refuse to clobber an existing destination by default. The CLI exposes --force on metis restore for the “replace a corrupt DB” flow; the library’s restore(..., allow_overwrite=True) is the matching opt-in.
Backup metadata. BackupResult (returned by backup() and printed by the CLI) captures: source path, dest path, byte count, schema version, event count, oldest/newest event timestamps. The CLI output is deterministic — no random ids — so operators can checksum it alongside the backup as a paper trail.
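The backup and restore guards above could be sketched like this. It is not the metis_core.trace.backup implementation: the function names and error types are assumptions, and only the VACUUM INTO call, the user_version check, the companion-file check, and the no-clobber default come from the text.

```python
import os
import sqlite3

TRACE_SCHEMA_VERSION = 1  # current value stated above


def backup(src_path: str, dest_path: str) -> None:
    """Snapshot the trace DB into a single file without closing the source."""
    if os.path.exists(dest_path):
        raise FileExistsError(f"refusing to overwrite {dest_path}")
    conn = sqlite3.connect(src_path)
    try:
        # VACUUM INTO writes a consistent copy; it must run outside a transaction.
        conn.execute("VACUUM INTO ?", (dest_path,))
    finally:
        conn.close()


def check_restorable(backup_path: str) -> None:
    """Apply the clean-backup and schema-version guards before a restore."""
    for suffix in ("-wal", "-shm"):
        if os.path.exists(backup_path + suffix):
            raise RuntimeError(f"unexpected companion file {backup_path + suffix}")
    conn = sqlite3.connect(backup_path)
    try:
        found = conn.execute("PRAGMA user_version").fetchone()[0]
    finally:
        conn.close()
    if found != TRACE_SCHEMA_VERSION:
        raise RuntimeError(
            f"backup schema version {found} != running version {TRACE_SCHEMA_VERSION}"
        )
```

A restore would call check_restorable first and only then copy the file into place, with an explicit opt-in before replacing an existing database.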
The Wave 12 audit log (audit-log.md) is a filtered projection of this trace store. An audit event is a trace event whose type appears in metis_core.events.payloads.AUDIT_EVENT_TYPES — the security/compliance-relevant subset. The set is ten types after the Wave 14a gateway.auth_failed addition: gateway.key_issued, gateway.key_revoked, gateway.key_rotated, gateway.quota_exceeded, gateway.auth_failed, quota.alert, routing.policy_invalid, memory.eviction, pattern.evicted, tool.confirmation_resolved. (Other audit-flagged types — trace.swept, analytics.user_exported, analytics.user_forgotten — are documented in the §6.13 / §6.14 catalog entries and the audit-log spec; the count here is the catalog-domain subset for cross-reference.) See audit-log.md §4 for the per-type rationale.
Three properties bind the audit subset to this spec:
- Audit events live in the same events table as operational events; the (type, timestamp_us) index in §7.1 covers the audit query directly. No parallel write path, no migration.
- The retention sweep's DELETE filters type NOT IN (<AUDIT_EVENT_TYPES>). Audit events outlive every operational sweep.
- [...] pseudonymous-floor, but that’s an outcome of the v1 subset, not a rule.

Audit-export shape (JSONL / CSV) is owned by audit-log.md §7; the CLI is metis audit export (audit-log.md §9). Adding a type to AUDIT_EVENT_TYPES requires a deliberate spec change with a CHANGES.md entry. audit-log.md is the source of truth for the subset; this section is the cross-reference.
User asks “What time is it?” The agent calls a current_time tool and answers.
session.created (parent: none)
↓ time passes ↓
turn.started (parent: none)
↓
route.decided (parent: turn.started)
↓
llm.call_started model=sonnet (parent: turn.started)
↓
llm.call_completed produced_tool_calls=1 (parent: llm.call_started)
↓
tool.called tool_name=current_time (parent: llm.call_completed)
↓
tool.completed success=true (parent: tool.called)
↓
llm.call_started model=sonnet (parent: tool.completed)
↓
llm.call_completed stop_reason=end_turn (parent: llm.call_started)
↓
turn.completed (parent: turn.started)
Walking back from turn.completed via parent_event_id reconstructs the full causal chain.
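A causal walk over parent_event_id might be sketched as below. The in-memory dict of events keyed by id and the function name are assumptions for illustration; a real walk would read from the events table via the parent index.

```python
def causal_chain(events_by_id: dict, start_id: str) -> list:
    """Follow parent_event_id pointers from start_id to the root.

    Returns event types from the starting event back to the root event.
    """
    chain = []
    seen = set()
    current = start_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        if current not in events_by_id:
            raise KeyError(f"missing link: {current} is not in the store")
        seen.add(current)
        event = events_by_id[current]
        chain.append(event["type"])
        # A root event has no parent, which ends the walk.
        current = event.get("parent_event_id")
    return chain
```

With the parents annotated in the flow above, starting from the final llm.call_completed walks through the tool call and the first LLM call before ending at turn.started. A missing parent raises, which is the "no missing links" condition a consistency check would look for.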
User asks the agent to read a file that doesn’t exist.
turn.started
↓
route.decided
↓
llm.call_started
↓
llm.call_completed produced_tool_calls=1
↓
tool.called tool_name=read_file, input_hash=...
↓
tool.failed error_class=not_found
↓
llm.call_started (the agent loop tries again with the failure as context)
↓
llm.call_completed stop_reason=end_turn
↓
turn.completed
[planner session sess_42]
turn.started session_id=sess_42
↓
route.decided
↓
llm.call_started model=opus, is_worker=false
↓
llm.call_completed produced_tool_calls=1 (delegate)
↓
tool.called tool_name=delegate
↓
delegate.started worker_session_id=sess_43, tier=fast, resolved_model=haiku
↓
[worker session sess_43 — separate session_id, related via parent_session_id in session record]
session.created session_id=sess_43
↓
turn.started session_id=sess_43
↓
route.decided DELEGATE_REQUEST chose haiku
↓
llm.call_started model=haiku, is_worker=true
↓
llm.call_completed
↓
turn.completed
↓
session.ended session_id=sess_43
[back in planner session]
↓
delegate.completed worker_session_id=sess_43, success=true
↓
tool.completed tool_name=delegate
↓
llm.call_started (planner integrates worker output)
↓
llm.call_completed stop_reason=end_turn
↓
turn.completed session_id=sess_42
The worker session’s events have is_worker: true on llm.call_started. They are queryable independently and roll up into the parent session’s cost via the delegate.completed event.
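The rollup can be illustrated by summing worker_total_cost_usd over a planner session's delegate.completed events. This is a sketch: the event dict shape (session_id, type, payload) and the function name are assumptions, and partial spend carried on delegate.failed is deliberately left out.

```python
from decimal import Decimal


def worker_spend(events: list, planner_session_id: str) -> Decimal:
    """Total worker cost attributed to one planner session."""
    total = Decimal("0")
    for event in events:
        if event["session_id"] != planner_session_id:
            continue
        if event["type"] != "delegate.completed":
            continue
        # Costs travel as strings; parsing with Decimal avoids float rounding.
        total += Decimal(event["payload"]["worker_total_cost_usd"])
    return total
```

Adding the planner's own LLM spend on top of this figure gives the full session cost.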
User has a rule “fast for commits → haiku” but the pattern store suggests sonnet at high confidence. The TUI surfaces the disagreement; the user chooses to override.
turn.started
↓
route.decided (winner: rule chose haiku; pattern deferred sonnet at 0.87
— recorded in chain[].verdict = "deferred" for pattern policy)
↓
[TUI surfaces the disagreement; user runs /route override]
↓
route.overridden (parent: route.decided)
original_chosen_model=haiku
new_chosen_model=sonnet
deferred_policy=pattern
rule_name=fast_for_commits
pattern_confidence=0.87
↓
llm.call_started model=sonnet (note: parent is route.overridden, not route.decided)
↓
...
This shape preserves the routing-engine.md invariant of exactly one route.decided per turn (per routing-engine.md §7.2 and test §10.1.17). The original decision is intact; the override is a distinct event that records what changed.
If the user runs /route ignore instead:
route.decided (rule chose haiku, pattern deferred sonnet at 0.87)
↓
pattern.override_dismissed (purely informational; turn proceeds with haiku)
↓
llm.call_started model=haiku (parent: route.decided)
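The override shape in the two flows above can be checked mechanically. The helper below is a sketch with assumed names and an assumed event dict shape (id, type, parent_event_id); treating "at most one route.overridden per turn" as a rule is also an assumption.

```python
def check_routing_shape(turn_events: list) -> None:
    """Raise ValueError if a turn's events break the override shape shown above."""
    decided = [e for e in turn_events if e["type"] == "route.decided"]
    overridden = [e for e in turn_events if e["type"] == "route.overridden"]
    if len(decided) != 1:
        raise ValueError(f"expected exactly one route.decided, got {len(decided)}")
    if len(overridden) > 1:
        raise ValueError("expected at most one route.overridden")
    # The first LLM call should hang off whichever routing event came last.
    routing_tail = decided[0]
    if overridden:
        if overridden[0]["parent_event_id"] != decided[0]["id"]:
            raise ValueError("route.overridden must point at route.decided")
        routing_tail = overridden[0]
    first_call = next((e for e in turn_events if e["type"] == "llm.call_started"), None)
    if first_call is not None and first_call["parent_event_id"] != routing_tail["id"]:
        raise ValueError("first llm.call_started must point at the final routing event")
```

The same helper accepts the /route ignore flow, where no route.overridden exists and the LLM call points at route.decided.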
- Walk back from turn.completed via parent_event_id; verify the chain reaches turn.started with no missing links.
- A bus.gap_detected event is emitted on startup with gap_start_id and gap_end_id corresponding to the missing range.
- Register a subscriber with fast_path=true whose handler is annotated @slow (testing helper). The test passes when registration raises FastPathHandlerError; this enforces the convention that slow handlers cannot register on the fast path.
- Every emitted event carries a sensitivity consistent with the catalog. Specifically: (a) events at their default-fields-only state match the catalog’s declared default sensitivity; (b) opt-in events with their optional fields populated have sensitivity upgraded per §4.4.1.
- On bus overflow, EventBusOverflowError is raised at the emitter and a structured log entry at level ERROR is written. Verify no bus.overflow event is emitted (it is no longer a catalog type).
- With journal_mode=WAL and synchronous=NORMAL, measure single-row insert latency over 1,000 inserts on the test storage; verify p95 < 1ms. (Skipped if running on storage that can’t sustain this; documented in test output.)
- route.decided exactly-once. Run a turn that triggers a pattern override; verify exactly one route.decided is emitted, followed by exactly one route.overridden. The original decision’s payload is unchanged after the override.
- route.overridden causality. Verify the route.overridden event has parent_event_id pointing at the original route.decided, and subsequent llm.call_started has parent_event_id pointing at the route.overridden.
- Every event with a non-null parent_event_id has a parent that exists in the trace store (eventually; a replay window after a small delay accounts for the fast-path race).

- Worker sessions carry parent_session_id on the session record but no event-level pointer back to the planner’s delegate.started. Should delegate.started and the worker’s session.created reference each other? V1: only via session metadata. May refine in Phase 4.
- Runtime enforcement of the fast-path budget (e.g. a @fast_handler decorator that asserts wall-time per call) is plausible; deferred.
- error_message_redacted and user_message_text_redacted rely on heuristic scrubbing. The exact scrub algorithm (regex-based? LLM-based?) is undecided. V1: simple regex for emails, paths, common API key formats. Reviewable and improvable.

| Date | Decision | Rationale |
|---|---|---|
| 2026-05-08 | In-process bus; no Kafka, no IPC, no cross-machine | Single-user app; in-process latency is microseconds and sufficient. |
| 2026-05-08 | No in-process redelivery; subscribers dedupe by event id across restarts only | Exactly-once is distributed-systems work, wildly overkill for the use case. |
| 2026-05-08 | Closed type catalog enumerated in this doc | New types are deliberate spec changes; prevents type sprawl. |
| 2026-05-08 | Sensitivity tagging on every event; dynamic on opt-in payloads | Future sync and cross-user features need this from day one; opt-in upgrades the tag honestly. |
| 2026-05-08 | Fast-path vs. batch subscribers | Slow handlers as fast-path stall everything; convention enforces the split. |
| 2026-05-08 | Trace store as a subscriber, not the bus itself | Other subscribers don’t pay disk-write latency; clean separation of concerns. |
| 2026-05-08 | SQLite WAL + synchronous=NORMAL for the trace store | Required to meet fast-path budget; <1s durability window acceptable for trace data. |
| 2026-05-08 | Memory snapshotter on the batch path | Reading and diffing memory files isn’t <1ms; doesn’t belong on fast path. |
| 2026-05-08 | Causal chains via single parent_event_id pointer, not graphs | Trees are sufficient; graphs are over-engineered for the question “why did this happen?” |
| 2026-05-08 | Strict vs. lenient validation modes | Strict catches bugs in dev; lenient prevents production crashes from a malformed payload. |
| 2026-05-08 | No FTS5 on payloads in v1 | Query patterns don’t need it; canonical message store handles content search. |
| 2026-05-08 | Pattern override emits route.overridden, not a second route.decided | Preserves the routing-engine.md invariant of one route.decided per turn; override is its own observable action. |
| 2026-05-08 | Bus diagnostics (overflow, handler errors) go to logs, not events | Avoids chicken-and-egg failures and recursive amplification. |
| 2026-05-08 | routing_policy_version is content hash, not mtime | mtimes can collide across restore-from-backup; hash is unambiguous. |
| 2026-05-08 | Streaming events explicitly excluded from catalog | A 200-token message produces 200+ rows otherwise; reconstructible from persisted Message. |
| 2026-05-08 | Streaming server has two input channels (bus bridge + direct) | Different lifetimes (persisted vs. live); merged on the wire only. |
| 2026-05-08 | Error class enums in llm.call_failed and tool.failed extended | Reconciled with provider-adapter (8 values) and tool-dispatcher (8 values). |
| 2026-05-08 | tool.confirmation_* events added to catalog | Confirmation flow is observable history; needs persistence for analytics. |
| 2026-05-08 | block_dropped is log-only, not catalog event | Consistent with bus.overflow precedent; not a domain action worth persisting. |

- canonical-message-format.md: Message, ToolDefinition, content blocks referenced by tool events; sessions table referenced by §7.1’s foreign key.
- routing-engine.md: route.decided event consumer details; InsufficientContextRequest schema referenced by delegate.failed.
- streaming-protocol.md: how WebSocket clients subscribe to a filtered event stream and replay on reconnect.
- skill-format.md (planned): skill events emitted on load, create, modify.