Status: Draft v3.2 Last updated: 2026-05-08 Owner: your name
v3.2 changes: Cancellation event sequence cross-reference updated to point at streaming-protocol’s three-case model (§3.4).
v3.1 changes: Auxiliary event
pattern.override_acceptedrenamed toroute.overriddento align with event-bus-and-trace-catalog.md §6.5b (preserves the one-route.decided-per-turn invariant). Delegation phase asymmetry documented (§6 preamble): the chain slot exists from Phase 1 as a stub; thedelegate()tool itself ships in Phase 4.v3 changes: Provider availability state machine made consistent (§4.5). Capability validation extended to tools, system prompt, structured output (§4.4). Predicate snapshot points pinned (§5.3).
skills_loaded_includesrenamed toskills_matching_message_includes(§5.3). Cost-efficiency divide-by-zero defined (§5.5).insufficient_contextschema specified (§6.6). Worker memory/skill/visibility rules added (§6.2.1). Tier upgrade exhaustion behavior stated (§6.9). Workspace tiers require all three slots (§5.7). Mid-turn multiple swaps last-write-wins (§3.3). Various nits.Throughout: paths shown use
~/.yourtool/as a placeholder for the final config directory.
This document specifies the routing engine: the component that decides, for every turn in a session, which model handles it. The engine composes three modes — manual selection by the user, configured rules from a yaml policy, and agent-decided delegation — into a single ordered policy chain. It also defines the contract for delegate(), the tool by which a planner model invokes a sub-agent on a different tier.
Routing is the user-visible feature most likely to feel either magical or untrustworthy. A wrong choice wastes money or produces bad output; a silent override of user intent destroys trust. This spec aims to make every decision explainable, every override visible, and every failure mode predictable.
This spec depends on canonical-message-format.md for Message, ToolDefinition, and AdapterCapabilities.
A turn begins when a USER message is added to the session and ends when the session manager observes an assistant message with stop_reason: end_turn (no further tool calls pending), or the user cancels.
Within a turn there can be many LLM calls and tool invocations:
USER message ─┐
├─ LLM call #1 → ASSISTANT (text + tool_use)
│ └─ tool dispatch → TOOL message
├─ LLM call #2 → ASSISTANT (text + tool_use)
│ └─ tool dispatch → TOOL message
├─ LLM call #3 → ASSISTANT (text, end_turn) ◄── turn ends
│
(next USER message starts the next turn)
All LLM calls within one turn are part of the same turn_id.
The model chosen at turn start owns the entire turn. All LLM calls within the turn use that model, including tool-loop continuations. Re-routing happens only at turn boundaries.
Rationale:
estimated_input_tokens grew with tool results.The single exception is delegation: when a model calls delegate(), the worker runs in a separate session with its own routing decision (per §6). The parent turn’s lock is unaffected; it resumes on the parent model when delegate() returns.
/model swapsIf the user runs /model <id> while a turn is in flight, the swap is queued and applies to the next turn. The TUI surfaces this:
Model swap pending: anthropic:claude-opus-4-7. Applies to next turn.
This is a deliberate UX choice — the alternative (cancel the in-flight turn and restart on the new model) is more disruptive than waiting one turn boundary.
If the user runs multiple /model commands during a single turn, last-write-wins: only the most recent pending swap takes effect at the next turn boundary. Earlier pending swaps are silently superseded; the TUI banner updates to reflect the latest target.
User-cancellation (Ctrl-C) ends the turn early with status: cancelled. The next USER message starts a new turn and goes through fresh routing. Any queued model swap takes effect at that boundary.
The exact event sequence emitted on cancellation depends on where in the turn lifecycle the cancel arrives — see streaming-protocol.md §6.2 for the three cases (cancel during LLM streaming, cancel during tool dispatch, cancel at the seam). The routing engine itself does not emit cancellation events; it simply stops dispatching new LLM calls or tool calls for the cancelled turn and lets the session manager and adapters emit the canonical sequence.
For each turn, at turn start, the engine runs policies in fixed order. The first policy returning a non-None, validated RoutingDecision wins:
1. PER_MESSAGE_OVERRIDE — explicit @model syntax in the user's message
2. MANUAL_STICKY — session.active_model set explicitly via /model
3. CONFIGURED_RULES — first-match-wins over the rules list
4. PATTERN_RECOMMENDATION — pattern store result with confidence ≥ threshold
5. DELEGATE_REQUEST — only present in the chain during a delegation re-entry; see §6
6. WORKSPACE_DEFAULT — workspace-scoped default
7. GLOBAL_DEFAULT — hardcoded fallback
(5) is conditional — it’s in the chain only during a delegation. During normal turns it’s skipped.
User intent dominates. (1) is the most local user signal — “just this message.” (2) is the session-level user signal. (3) is pre-declared user policy. (4) is system inference, ranked below user-set things by design. (5) handles delegation. (6) and (7) are floors.
This order is deliberate and stable. Reordering it (e.g., putting pattern recommendations above rules) would let learned behavior silently override user choices — the failure mode that destroys trust.
MANUAL_STICKY is opt-inMANUAL_STICKY returns None unless the user has explicitly set a sticky model via /model <id> in the current session. A fresh session has no sticky and falls through to rules.
Setting a sticky is opt-out from rule-based routing for that session. Rationale: a user typing /model haiku is signaling “I want Haiku, ignore my rules.” That signal should be honored.
To return to rule-based routing within a session, the user runs /model - (clears sticky).
Every policy that returns a candidate model passes through validation before becoming the winner:
def validate(model: str, context: TurnContext) -> ValidationResult:
caps = adapter_registry.capabilities(model)
# Availability
if not adapter_registry.is_configured(model):
REJECT(reason="not_configured") # missing API key, etc.
if not adapter_registry.is_available(model):
REJECT(reason="provider_unavailable") # see §4.5
# Capability — gated on whether the turn actually needs the capability
if context.has_images and not caps.supports_images:
REJECT(reason="no_vision_support")
if context.estimated_input_tokens > caps.max_context_tokens:
REJECT(reason="exceeds_context_window")
if context.has_tool_definitions and not caps.supports_tools:
REJECT(reason="no_tool_support")
if context.has_system_prompt and not caps.supports_system_prompt:
REJECT(reason="no_system_prompt_support")
if context.requires_structured_output and not caps.supports_structured_output:
REJECT(reason="no_structured_output_support")
return OK
A rejected candidate causes the policy to be treated as if it returned None; the chain continues. Each rejection is recorded in the canonical route.decided event (§7) as part of the policy’s evaluation.
The capability gates follow the “only require what we’ll use” principle: a turn with no images doesn’t need supports_images; a worker with no output_schema doesn’t need supports_structured_output. This avoids spurious rejections that would force fallthrough when the model would actually have worked.
supports_thinking is not validated. Models that don’t support thinking simply have thinking blocks dropped at adapter serialization time (per §7.3 of the canonical format spec).
This requires
AdapterCapabilities(defined incanonical-message-format.md§7.2) to declaresupports_tools,supports_system_prompt, andsupports_structured_output. The canonical format spec must be updated to add these fields. Existing entries that don’t yet declare them default totrueforsupports_toolsandsupports_system_prompt(the common case for both Anthropic and OpenAI) andfalseforsupports_structured_output.
Availability is tracked at two granularities, both maintained by the adapter registry:
Each scope has the same three states:
provider_unavailable. Auto-clears as defined below.| Failure pattern | Scope marked Unavailable | Rationale |
|---|---|---|
≥5 consecutive failures on one (provider, model) within 2 minutes |
That (provider, model) |
Single-model issue (rate limit, deprecation). |
| ≥3 distinct models from one provider hit Unavailable within 2 minutes | The whole provider | Pattern points to a provider-wide issue. |
| Any auth error (401, 403) on any model from a provider | The whole provider | Misconfigured key affects everything. |
| ≥2 DNS / network errors reaching a provider’s host within 30 seconds | The whole provider | Sustained connectivity loss, not model-specific. |
| A single transient DNS / network error reaching a provider’s host | None (counts toward the per-(provider, model) 5-strike threshold below) |
One-off SSL renegotiation or TCP RST is not an outage. |
| Bounded exponential backoff exhausted inside a single adapter call | No state change | Per-call transient handling; not a signal of sustained outage. |
A successful call against a (provider, model) clears that scope’s Unavailable state immediately. A successful call against any model from a provider clears the provider-wide Unavailable state and the sliding NETWORK-failure window.
Why NETWORK is not immediate (refined 2026-05-16): an earlier revision blacked the whole provider out on a single NETWORK error, on the theory that DNS / connectivity issues affect every model identically. In practice transient SSL handshake errors (ssl.SSLError: SSLV3_ALERT_BAD_RECORD_MAC, httpx.ConnectError mid-TLS-renegotiation, one-off TCP RST) reach the adapter as ErrorClass.NETWORK but represent a single failed connection rather than a sustained provider-side outage. The 5-minute auto-clear made one transient hiccup look like a 5-minute provider blackout. The 2-within-30-seconds threshold filters the one-off from the real outage: a genuine DNS / regional-network failure will produce a second NETWORK error well inside 30 seconds, while a one-off TLS glitch resolves on the next call.
The thresholds (_NETWORK_PROVIDER_ESCALATION_THRESHOLD, _NETWORK_PROVIDER_ESCALATION_WINDOW_SECONDS) live in availability.py as module constants; AUTH still escalates immediately because a misconfigured key cannot be a one-off.
Without successful calls, Unavailable states auto-clear after 5 minutes of no attempts. This prevents the system from being permanently locked out of a provider that recovered while idle.
When a policy’s chosen model is m from provider p:
p is provider-wide Unavailable → reject with provider_unavailable (provider-wide).(p, m) is Unavailable → reject with provider_unavailable (model-specific).Both rejection cases use the same validation_failure value (provider_unavailable); the route.decided event’s reason field disambiguates (“anthropic:claude-opus-4-7 model-specific outage” vs. “all anthropic models temporarily unavailable”).
When a model-specific Unavailable causes fallthrough:
anthropic:claude-opus-4-7 currently unavailable. Routing fell through to anthropic:claude-sonnet-4-6.
When a provider-wide Unavailable causes fallthrough:
anthropic provider currently unavailable. Routing fell through to openai:gpt-5 (workspace default).
Banners clear when the corresponding state returns to Healthy.
Rules do not carry fallback lists (fallback: [model_a, model_b]). All fallback is handled by chain fallthrough.
Rationale: per-rule fallback breaks the predictability invariant. If rule_A has fallbacks [opus, sonnet, haiku] and rule_B has different fallbacks, debugging “why this model?” becomes a search through multiple lists. Chain fallthrough keeps the explanation linear: each policy gets one shot, and the chain is short.
If a user wants a specific fallback order for a workspace, they encode it as additional rules:
rules:
- name: "deep for architecture"
when: {message_matches: "architecture"}
use: anthropic:claude-opus-4-7
- name: "deep for architecture (sonnet fallback)"
when: {message_matches: "architecture"}
use: anthropic:claude-sonnet-4-6
The second rule fires only if the first’s model is unavailable (validation rejects it, chain continues, second rule’s predicate matches).
If every policy in the chain returns None or fails validation, the engine raises a hard error to the session manager. The TUI surfaces:
No model available for this turn.
Tried: anthropic:claude-opus-4-7 (unavailable), anthropic:claude-sonnet-4-6 (unavailable)
Run /model <id> to choose explicitly, or /rules check.
The turn is not started. The user must intervene (set a sticky, fix config, wait for provider recovery). Silently using a model the user didn’t authorize is never acceptable.
The configured policy file is read fresh at the start of every turn. Cost: a yaml parse and validation pass, ~1ms for a typical file. The router caches the parsed structure keyed by file mtime to avoid re-parsing when nothing changed.
If the file is invalid (yaml syntax error, unknown predicate, unknown model), the router uses the last-known-good version and surfaces this in the TUI. Users diagnose with /rules check.
# ~/.yourtool/routing.yaml
schema_version: 1
global_default: anthropic:claude-sonnet-4-6
# Tier mapping for delegation (§6.10)
tiers:
fast: anthropic:claude-haiku-4-5
balanced: anthropic:claude-sonnet-4-6
deep: anthropic:claude-opus-4-7
# Pattern store weighting (§5.5)
pattern:
cost_weight: 0.05 # 0.0 = pure quality, 1.0 = pure cost (default 0.05)
min_confidence: 0.05 # default 0.05 — scaled to match cost_weight=0.05 (see §5.5)
min_sample_size: 5
rules:
- name: "fast for commits"
when:
message_matches: "^/commit|write.*commit message"
use: anthropic:claude-haiku-4-5
- name: "deep for architecture"
when:
any_of:
- message_matches: "(architecture|design review|security review)"
- skills_matching_message_includes: "system_design"
use: anthropic:claude-opus-4-7
- name: "long context"
when:
estimated_input_tokens_gt: 80000
use: anthropic:claude-opus-4-7
- name: "budget circuit breaker"
when:
cost_today_exceeds_usd: 5.00
use: anthropic:claude-haiku-4-5
workspaces:
~/code/myproject:
default: openai:gpt-5
pattern:
cost_weight: 0.7 # this is the "ship reliable" workspace
tiers:
fast: openai:gpt-5-mini
balanced: openai:gpt-5
deep: openai:gpt-5 # no deeper option configured; deep == balanced here
rules:
- name: "this project uses gpt for SQL"
when:
file_extensions_in_context: [".sql"]
use: openai:gpt-5
Within rules, top-to-bottom, first match wins. Each rule has a unique name (used for tracing and the /rules UI). Rules without name get a synthetic name rule_<index>.
Within workspaces.{path}.rules, same semantics, but workspace rules run before global rules. Workspace default and pattern config replace the corresponding global section for that workspace (full replacement, not merge — v1 simplification).
Workspace tiers must define all three slots (fast, balanced, deep) or be omitted entirely. A partial workspace tier map is rejected at validation time. Rationale: silent gaps in the tier map cause delegate(tier="balanced") calls inside the workspace to fail with no_model_available_for_tier for non-obvious reasons. Forcing all-or-nothing makes the failure mode at config time, not at runtime. If a workspace truly only wants to override one tier, it must restate the others (typically by copying the global mapping).
Closed set. Adding a predicate is a deliberate spec change.
Predicates evaluate against state captured at routing time, which is the start of a turn. Each predicate has a defined input source:
| Predicate | Reads from |
|---|---|
message_matches, message_contains_any |
The new USER message of the current turn. |
estimated_input_tokens_* |
The active adapter’s estimate_input_tokens() against the messages that would be sent if the candidate model were chosen, including the assembled system prompt and tool definitions. |
has_images |
The new USER message of the current turn (image blocks). |
has_tool_calls_in_history |
The session’s canonical message store (any prior ASSISTANT message with tool_use blocks). |
skills_matching_message_includes |
The skill-description index, matched against the new USER message. The index is built at session start from frontmatter descriptions only — no skill bodies are loaded for this. |
file_extensions_in_context |
File extensions appearing in any tool input or tool result of the current session (the file paths the agent has actually touched). User-message text is not scanned — too noisy. At the first turn this is always empty. |
workspace_path_matches |
The session’s workspace path, set at session creation. |
time_of_day_between |
Wall clock at routing time, in the user’s local timezone. |
cost_today_exceeds_usd |
The accumulated cost across the user’s sessions since UTC midnight. |
| Predicate | Type | Description |
|---|---|---|
message_matches |
regex | Matches the user’s turn message (the new USER message). |
message_contains_any |
[string] | Any substring (case-insensitive) appears in the message. |
estimated_input_tokens_gt |
int | estimate_input_tokens() exceeds threshold. |
estimated_input_tokens_lt |
int | estimate_input_tokens() is under threshold. |
has_images |
bool | Any image block in this turn’s user message. |
has_tool_calls_in_history |
bool | Any prior assistant message has tool_use blocks. |
skills_matching_message_includes |
[string] | Skill name(s) whose description match the user message above threshold. |
file_extensions_in_context |
[string] | File types touched by tools in this session (case-insensitive). |
workspace_path_matches |
regex | Workspace’s absolute path matches. |
time_of_day_between |
[HH:MM,HH:MM] | Local time falls in window. Wraps midnight: [22:00, 06:00]. |
cost_today_exceeds_usd |
float | Sum of today’s session costs (UTC midnight). |
any_of |
[predicate] | Logical OR. |
all_of |
[predicate] | Logical AND. |
not |
predicate | Logical NOT. |
A when block with multiple top-level keys is implicitly all_of.
Note on
skills_matching_message_includes: prior drafts of this spec called this predicateskills_loaded_includes. That name was misleading because skills are loaded by the context assembler, which runs after routing — at routing time no skills are “loaded” yet. The current name reflects what’s actually checked: a fast match against the skill description index.
cost_today_exceeds_usd is a first-class predicate. When it fires, the TUI surfaces:
Daily budget $5.00 exceeded ($5.42 today). Routing per "budget circuit breaker" rule.
Banner clears at UTC midnight reset.
The pattern policy queries the pattern store for the K nearest fingerprints (default K=10) to the current turn’s fingerprint. Among the K neighbors, it groups by outcome.primary_model. For each model M in the cluster, it computes:
normalized_success_M = sample-size-weighted mean(success_score) for neighbors
with primary_model = M, computed as
Σ(success_score_i × sample_size_i) / Σ(sample_size_i)
(already in 0..1; a neighbor row with 50 contributing
sessions weights 50× a single-shot row, so well-evidenced
outcomes dominate noisy one-offs)
if max_avg_cost_in_cluster == min_avg_cost_in_cluster:
normalized_cost_efficiency_M = 0 for all models in the cluster
else:
normalized_cost_efficiency_M = (max_avg_cost_in_cluster - avg_cost_M)
/ (max_avg_cost_in_cluster - min_avg_cost_in_cluster)
(0..1; cheapest gets 1.0, most expensive gets 0.0)
score_M = (1 - cost_weight) × normalized_success_M
+ cost_weight × normalized_cost_efficiency_M
The degenerate case (all candidate models in the cluster have identical average cost) zeroes out the cost-efficiency term entirely, making the score reduce to (1 - cost_weight) × normalized_success_M — i.e., the decision falls to pure quality. This is the right behavior: there is no cost differentiation to weight.
cost_weight is configurable per workspace (default 0.05, lowered from 0.1 on 2026-05-15 per the §A3-rev5 benchmark finding, which itself succeeded the 0.3 → 0.1 migration on 2026-05-14 — see “Default rationale” below). cost_weight = 0 means “pure quality, ignore cost”; cost_weight = 1 means “pure cost, ignore quality”; values in between blend.
The model with the highest aggregate score is the recommendation. The runner-up appears in alternatives.
confidence = (top_score - runner_up_score) / top_score
If top_score == 0, confidence is 0. The pattern policy returns None if confidence < pattern.min_confidence or sample_size < pattern.min_sample_size. Both are configurable.
Default rationale. The cost_weight default of 0.05 (was 0.1 from 2026-05-14, was 0.3 prior to 2026-05-14) is itself a value judgment — but a documented one. A user prototyping wants higher; a user shipping production code may want lower (closer to pure quality). The point is that the tradeoff is visible, not hidden, and the user can see and override it.
The default was lowered from 0.3 → 0.1 after the §A3-rev benchmark run (see benchmarks/RESULTS.md). At 0.3 the cost-efficiency term required a success delta of ~0.43 to flip the chooser when the cheapest model also scored 1.0 on cost_efficiency — larger than the 0.15–0.30 cluster-level quality deltas the LLM judge produced in real data. The result was slot 4 picking the cheaper model on every routed turn regardless of evidence. At 0.1 a quality delta of ~0.143 is enough to invert the ranking, which the observed deltas do clear. The scoring formula is unchanged — only the default of the blend constant moved. Workspaces that depended on the prior cost-bias must restate cost_weight: 0.3 in their routing.yaml.
The default was lowered again from 0.1 → 0.05 on 2026-05-15 after the §A3-rev5 benchmark run. §A3-rev5 reproduced a separate failure mode: with the v2 HYBRID fingerprint wiring landed (Wave 11) and the §A3-rev3 defaults in place, slot 4 still picked haiku on all 17 routed Pass C turns — including regex-with-edge-cases where haiku rubric-fails on the hard “16 edge case tests” turn (q=0.19) and sonnet rubric-passes (q=1.00). Diagnosis: cost_efficiency normalizes per cluster to [0.0, 1.0], so at cost_weight=0.1 whichever model is cheapest gets a flat +0.10 floor on its score regardless of cluster geometry. On the §A3-rev5 regex cluster (haiku q=0.91 vs sonnet q=1.00, cost_haiku ≪ cost_sonnet) this floor swamped the 0.09 quality delta and slot 4 picked haiku at conf=0.011 — gated off, falling to slot 7. Direct simulation against the a3rev5-patterns.db snapshot under cw=0.05 showed 6 sonnet picks pass the min_confidence=0.05 gate where cw=0.10 produced 0; haiku-correct decisions on workloads with genuine quality dominance (multi-file-refactor q=0.79 vs 0.67; multi-turn-refactor q=1.00 vs 0.95) still pick haiku at high confidence. The scoring formula is unchanged — only the default of the blend constant moved. Workspaces that depended on the 0.1 cost bias restate cost_weight: 0.1 in their routing.yaml. (Per-prompt sub-cluster partitioning was considered as an alternative wedge but found unnecessary: the K-NN already pulls 9 of 10 same-workload neighbors per cluster on §A3-rev5 data.)
The min_confidence default was lowered from 0.3 → 0.05 in the 2026-05-14 wave after the §A3-rev2 benchmark run. The two knobs are coupled: confidence is (top_score - runner_up_score) / top_score, and score itself is (1 - cost_weight) * success + cost_weight * cost_efficiency. Under the legacy cost_weight=0.3, the cost-efficiency term alone — independent of any quality delta — produced ~0.35 confidence on tied-quality clusters where the two models had different costs, so min_confidence=0.3 acted as a noise gate without suppressing genuine signal. Under cost_weight=0.1 the same tied-quality clusters produce ~0.10 confidence, and the legacy 0.3 gate suppresses real cluster inversions: §A3-rev2 Pass C turn 2 on write-a-doc-from-notes aggregated sonnet=0.900 ahead of haiku=0.842 (the first cluster-level inversion in any A3 series) with confidence 0.064, and slot 4 emitted not_applicable. At 0.05 the gate scales down with the cost-weight reduction so real inversions can fire; cluster-empty / zero-score / fewer-than-K-cluster cases still gate off in aggregation.py. Workspaces that depended on the prior tighter gate restate min_confidence: 0.3 in their routing.yaml. The 2026-05-15 cost_weight 0.1 → 0.05 migration leaves min_confidence=0.05 unchanged: under the new cost_weight=0.05 the cost-floor effect on confidence drops further (~0.05 max contribution from cost_efficiency saturation alone), and the §A3-rev2 inversion-friendly ratio still clears the 0.05 gate.
When both a rule and a pattern recommendation are available, the rule wins (per §4.1). The pattern recommendation is not discarded — it’s recorded in the route.decided event as a deferred policy with its own evaluation.
In Phase 3, an opt-in feature surfaces high-confidence pattern disagreement to the user:
pattern_disagreement:
surface: true # default false
min_confidence: 0.85
min_sample_size: 20
When enabled and a pattern recommendation disagrees with the chosen rule above the thresholds, the TUI shows:
→ Routing to anthropic:claude-haiku-4-5 per rule "fast for commits"
Pattern store suggests: anthropic:claude-sonnet-4-6 (confidence 0.87, 23 similar tasks)
/route override to use Sonnet for this turn
/route ignore to dismiss
The user’s choice is itself an event (route.overridden for accept, pattern.override_dismissed for ignore), feeding back into pattern learning. See event-bus-and-trace-catalog.md §6.5b for payloads.
At load time, the router validates:
schema_version matches a supported version.use and default references a model in the adapter registry.tiers blocks define all three slots (fast, balanced, deep) or are absent. Partial maps are rejected.name is duplicated (synthetic names excepted).pattern.cost_weight is in [0.0, 1.0]; min_confidence in [0.0, 1.0]; min_sample_size ≥ 1.A failure in any of these causes the file to be rejected as a whole — last-known-good is used. /rules check prints validation errors.
By design, rules cannot:
These constraints are how the engine stays predictable. Users wanting richer logic write a skill, not a rule.
delegate() contractPhase note. The
delegate()tool ships in Phase 4. The routing chain’sDELEGATE_REQUESTpolicy slot (§4.1, position 5), however, exists from Phase 1 as a stub that always returnsnot_applicable. This asymmetry is deliberate: the routing pipeline’s shape is fixed across phases, and adding the slot’s behavior later is filling in a stub rather than refactoring the chain. The catalog’sdelegate.*events likewise ship in Phase 4 (perevent-bus-and-trace-catalog.md§6.8). Phase 1 implementations should:
- Include
DELEGATE_REQUESTin the policy chain enumeration so trace events have a consistent shape.- Not register the
delegatetool with any model, regardless ofcan_delegateconfiguration.- Skip §6.1 through §6.10 of this document for implementation purposes; they describe the Phase 4 contract.
The delegate tool is registered automatically when a session’s active model is on a tier marked as can_delegate: true (typically only balanced and deep). Its schema:
delegate(
tier: Literal["fast", "balanced", "deep"],
task: str, # focused instruction for the worker
context: ContextSpec, # see §6.3
output_schema: dict | None = None, # optional JSON schema for return
allowed_tools: list[str] | None = None, # default: same as planner's
max_tokens: int | None = None, # cap on worker output
) -> DelegateResult
Workers cannot delegate. The tool is not registered for sessions invoked as workers. This prevents fan-out and recursion in v1.
A worker is a fresh session — same architecture, same context assembler, same tool dispatcher — instantiated with curated context and run to completion. It has:
parent_session_id and parent_tool_use_id).allowed_tools (default: same set as the planner).To prevent workers from mutating state the planner is reasoning about, workers operate read-only against the following:
memory_add, memory_replace, memory_consolidate) are not registered for worker sessions, regardless of allowed_tools. Rationale: the planner has the broader context to judge what’s worth remembering. A worker shouldn’t change the planner’s durable view of the world from inside a sub-task.load_skill is available). Workers cannot create, modify, or delete skill files. Skill auto-generation (Phase 2.5) does not run from worker sessions.routing.yaml (no rule-management tool exists in v1, but if such a tool exists in a future phase, it is not registered for workers).What workers can do (via their normal tool access): read files, write files, run shell commands, call any other tool the planner had — these are task-shaped operations, not durable system state.
By default, worker sessions are not listed in /history or the dashboard’s session list. They are visible:
delegate.completed event, which carries worker_session_id and links to a full worker session view./history --include-workers, which surfaces them as nested entries under their parents.Rationale: at scale a single planner session can spawn many workers; flattening them into history clutters the user’s view of their own work. They remain reachable for analytics and debugging.
Worker sessions are persisted with the same retention as parent sessions and follow the same trace-event rules (with is_worker: true flagged on llm.call_started events per the event catalog).
ContextSpec is one of:
# Mode 1: minimal — task brief only.
{"mode": "minimal"}
# Mode 2: explicit — references the planner specifies.
{"mode": "explicit", "include": [...]}
There is no “auto” mode in v1. The planner must decide what context the worker needs. This forces the planner to think about handoff, which produces more reliable results than letting the system guess.
include items in mode 2 can be:
{"type": "file", "path": "src/auth.ts"} # worker re-reads via file tool
{"type": "file_range", "path": "src/auth.ts", "lines": [40,80]} # only those lines
{"type": "tool_result", "tool_use_id": "tu_01HZ100"} # past tool call result
{"type": "message", "message_id": "01HZ_..."} # specific past turn
{"type": "inline", "label": "decision", "text": "..."} # planner-authored note
Files referenced by path are not inlined at delegation time — they appear as references the worker re-reads through the file tool. This keeps planner→worker handoff cheap and ensures the worker reads current file content.
Tool results, message content, and inline notes are inlined (the planner has already curated them).
The planner’s system prompt teaches:
minimal — for self-contained mechanical tasks. “Format this JSON.” “Generate a UUID.” “Convert these dates from ISO to RFC 2822.” The worker doesn’t need session context.explicit — for tasks needing prior context. “Refactor the function defined in src/auth.ts:authenticate. Use the JWT approach we decided on (see message 01HZ_...).”If the planner is unsure which to use, default to explicit with the relevant references. A worker can fail and report insufficient_context (§6.6) — the planner can then retry with more references or take over.
You are a sub-agent invoked by another agent (the "planner") to handle
a specific sub-task. You are running on model {worker_model}.
Your task:
{task}
{context_summary}
Guidelines:
- Focus only on this task. Do not pursue tangents.
- If you need information not provided, use the available tools to fetch it.
- Do not ask the user for clarification. If clarification is needed, return
a structured response indicating what's missing.
- Be terse. The planner will see your final output and integrate it.
- {if output_schema:} Your final response must conform to this schema:
{output_schema}
- Return when complete. Do not produce a summary unless asked.
Available tools: {allowed_tools}
{context_summary} lists included files and references with brief descriptions.
class DelegateResult:
success: bool
output: str | dict # text by default; dict if output_schema specified
error: str | None
usage_summary: dict # tokens, cost, turn count, tool calls made
worker_session_id: str # for trace lookup
Failure modes:
| Failure | success | error | output |
|---|---|---|---|
| Worker raised an unhandled error | false | "worker_error: {message}" |
partial output if any |
Worker hit max_tokens |
false | "max_tokens_exceeded" |
truncated output |
| Worker requested missing context | false | "insufficient_context" |
InsufficientContextRequest (see below) |
Worker output didn’t match output_schema |
false | "output_schema_validation_failed" |
raw output |
No model available for tier |
false | "no_model_available_for_tier" |
empty |
| User cancelled the planner mid-delegation | false | "cancelled_by_user" |
partial if any |
The planner decides what to do: retry with tier: "deep", take over directly, or surface to the user.
insufficient_context schemaWhen a worker determines it cannot complete the task without more information, it returns a structured request rather than free text. This lets the planner programmatically retry with additional context rather than re-prompting itself.
class InsufficientContextRequest:
missing: list[MissingItem]
summary: str # one-sentence human-readable description
class MissingItem:
type: Literal["file", "file_range", "message", "tool_result", "decision", "other"]
ref: str # path, message_id, tool_use_id, or human-readable label
hint: str # what the worker would do with this; helps planner judge
# Example:
{
"missing": [
{"type": "file", "ref": "src/auth/jwt.ts",
"hint": "need to see the current JWT signing logic to refactor it"},
{"type": "decision", "ref": "token expiration policy",
"hint": "need to know the agreed-upon expiration window"}
],
"summary": "Need the current JWT implementation and the team's expiration policy."
}
The worker’s system prompt teaches it to return this shape via a special _request_context tool that it calls to terminate the worker session with this payload. The planner sees the structured request as the output field of the DelegateResult when error == "insufficient_context".
Planner behavior on insufficient_context:
missing. If references can be resolved (files exist, decisions are recorded in MEMORY.md, etc.), retry delegate() with those references added to context.include.Worker token usage is attributed to the worker session, but rolled up into the parent session’s totals for user-facing displays. Trace events include both session_id (the worker’s own) and parent_session_id. The dashboard shows planner cost vs. worker cost broken out:
Session sess_42 — total $0.34
├─ planner (anthropic:claude-opus-4-7): $0.21, 12 turns
└─ workers: $0.13, 4 delegations
├─ tu_01HZ_a → anthropic:claude-haiku-4-5: $0.02, 1 turn
├─ tu_01HZ_b → anthropic:claude-haiku-4-5: $0.04, 2 turns
├─ tu_01HZ_c → openai:gpt-5-mini: $0.03, 1 turn
└─ tu_01HZ_d → anthropic:claude-haiku-4-5: $0.04, 2 turns
can_delegate flag# part of the model registry config
models:
anthropic:claude-opus-4-7:
tier: deep
can_delegate: true
anthropic:claude-sonnet-4-6:
tier: balanced
can_delegate: true
anthropic:claude-haiku-4-5:
tier: fast
can_delegate: false
can_delegate: false means the delegate tool is not registered when this model is the active session model. Enforced at session start; updated when the active model changes.
When delegate(tier=...) is called, the routing engine resolves tier to a concrete model by consulting the tiers config. That resolved model is the worker’s candidate. It enters the policy chain at the DELEGATE_REQUEST slot.
Capability validation still applies. If the resolved tier model fails validation (e.g., the task involves images and the fast model is text-only), the engine upgrades along fast → balanced → deep and re-validates at each step. If the requested tier was already deep and deep fails, or if deep itself fails after upgrades, delegate returns no_model_available_for_tier — there is no tier above deep to escalate to.
The worker’s turn itself is still turn-locked (§3.2). The worker session can have multiple internal LLM calls and tool cycles, all on the same model.
The policy chain runs end-to-end for worker sessions, including policies that cannot apply to workers by construction (PER_MESSAGE_OVERRIDE — workers don’t have user messages with @ prefixes; MANUAL_STICKY — workers have no user /model command; CONFIGURED_RULES — can match against the worker’s task brief and would fire if the user wrote rules targeting it, though this is rare).
The alternative — skipping inapplicable policies in worker mode — produces a cleaner trace but adds a special case to the routing pipeline. A reviewer might prefer either; this spec keeps the chain uniform for predictability. The route.decided event traces for workers will show not_applicable verdicts on the user-facing policies, which is the trade-off.
tier is an abstraction over concrete models, mapped per-workspace via tiers config (§5.1).
| Tier | Intent |
|---|---|
fast |
Cheap, low-latency. Used for mechanical sub-tasks. |
balanced |
Capable enough for most coding work. Default for medium tasks. |
deep |
The most capable available. Used for planning and reasoning. |
Tier resolution falls back through workspace → global → registry default. A misconfigured tier (no model assigned) causes delegate(tier=X) to return no_model_available_for_tier.
route.decided eventEvery turn produces exactly one route.decided event, emitted at turn start after the chain runs. This is the source of truth for “why this model?”
class RouteDecidedEvent:
# Standard event envelope (see event-bus spec)
type: Literal["route.decided"]
timestamp: datetime
session_id: str
turn_id: str
# Routing-specific payload
chain: list[PolicyEvaluation] # one entry per policy in chain order
winner_index: int # index into chain
chosen_model: str
elapsed_ms: float
class PolicyEvaluation:
policy: str # PER_MESSAGE_OVERRIDE | MANUAL_STICKY | RULE
# | PATTERN | DELEGATE_REQUEST
# | WORKSPACE_DEFAULT | GLOBAL_DEFAULT
verdict: Verdict # NOT_APPLICABLE | DEFERRED | REJECTED | CHOSE
candidate_model: str | None # what this policy proposed (None if NOT_APPLICABLE)
reason: str # human-readable
# Mode-specific extras
rule_name: str | None # for RULE
confidence: float | None # for PATTERN
pattern_alternatives: list[ModelOption] | None # for PATTERN
# Validation outcome (if candidate_model was set)
validation_failure: str | None # "no_vision_support" | "exceeds_context_window"
# | "no_tool_support" | "no_system_prompt_support"
# | "no_structured_output_support"
# | "provider_unavailable" | "not_configured"
# | None if validation passed
class ModelOption:
model: str
score: float # the aggregate score from §5.5
sample_size: int # how many neighbors used this model
class Verdict(StrEnum):
NOT_APPLICABLE = "not_applicable" # policy didn't have anything to say (e.g. no override token)
DEFERRED = "deferred" # policy had a candidate but a higher-priority policy won
REJECTED = "rejected" # candidate failed validation, chain continued
CHOSE = "chose" # this policy won
Earlier drafts had separate events for routing.constraint_failure, routing.rule_skipped, routing.policy_invalid. Folding them into a single route.decided event means:
routing.policy_invalid (the rule file failed to load) remains a separate session-level event since it’s not turn-scoped.
These remain separate events because they describe distinct user actions or worker lifecycle:
| Event type | When |
|---|---|
route.overridden |
User chose /route override (turn re-dispatched on pattern’s choice). |
pattern.override_dismissed |
User chose /route ignore (turn proceeds with original choice). |
delegate.started |
A delegate tool call began. Includes worker_session_id. |
delegate.completed |
The worker session ended. |
delegate.failed |
The worker session failed; includes failure mode (§6.6). |
routing.policy_invalid |
The rule file failed to load; last-known-good in use. |
routing.provider_unavailable |
Provider state transitioned to Unavailable. |
routing.provider_recovered |
Provider state transitioned back to Healthy. |
All events conform to the schema defined in event-bus-and-trace-catalog.md.
The “why this model?” view is a single record render:
Turn 01HZ_xyz · session sess_42 · 2026-05-08T14:23:11Z
Chose: anthropic:claude-sonnet-4-6 (workspace default)
Chain:
[1] PER_MESSAGE_OVERRIDE not_applicable no @model token in message
[2] MANUAL_STICKY not_applicable no sticky model set
[3] CONFIGURED_RULES rejected rule "deep for architecture" matched →
anthropic:claude-opus-4-7 (provider_unavailable)
[4] PATTERN_RECOMMENDATION deferred suggested anthropic:claude-sonnet-4-6
(confidence 0.71, 14 samples) — outranked by rule
[5] WORKSPACE_DEFAULT chose anthropic:claude-sonnet-4-6
The TUI’s /model show command prints the same trace inline.
session.active_model = "anthropic:claude-sonnet-4-6" (user ran /model sonnet)
user: "Refactor this function."
Chain:
PER_MESSAGE_OVERRIDE not_applicable
MANUAL_STICKY chose → anthropic:claude-sonnet-4-6 (validates)
session.active_model = None (no /model run)
rules: [{name: "fast for commits", when: {message_matches: "^/commit"}, use: haiku}]
user: "/commit fix the auth bug"
Chain:
PER_MESSAGE_OVERRIDE not_applicable
MANUAL_STICKY not_applicable
CONFIGURED_RULES chose → anthropic:claude-haiku-4-5 (rule "fast for commits")
session.active_model = "anthropic:claude-sonnet-4-6"
user: "@haiku what's a quick name for this variable?"
Chain:
PER_MESSAGE_OVERRIDE chose → anthropic:claude-haiku-4-5 (override "@haiku")
session.active_model is unchanged after this turn.
session.active_model = None
rules: []
pattern store: returns sonnet, confidence 0.78, sample 12
Chain:
PER_MESSAGE_OVERRIDE not_applicable
MANUAL_STICKY not_applicable
CONFIGURED_RULES not_applicable (no rules match)
PATTERN_RECOMMENDATION chose → anthropic:claude-sonnet-4-6 (confidence 0.78, 12 samples)
rules: [{name: "deep for architecture", when: {message_matches: "architecture"}, use: opus}]
workspace_default: anthropic:claude-sonnet-4-6
provider state: (anthropic, claude-opus-4-7) — Unavailable
anthropic provider-wide — Healthy
(anthropic, claude-sonnet-4-6) — Healthy
user: "Walk me through the architecture of this codebase"
Chain:
PER_MESSAGE_OVERRIDE not_applicable
MANUAL_STICKY not_applicable
CONFIGURED_RULES rejected → opus (provider_unavailable, model-specific)
PATTERN_RECOMMENDATION not_applicable (insufficient samples)
WORKSPACE_DEFAULT chose → anthropic:claude-sonnet-4-6 (validates)
TUI banner:
anthropic:claude-opus-4-7 currently unavailable. Routing fell through to anthropic:claude-sonnet-4-6.
rules: [{name: "default override", when: {}, use: opus}]
workspace_default: anthropic:claude-sonnet-4-6
global_default: anthropic:claude-haiku-4-5
provider state: anthropic provider-wide — Unavailable (auth error triggered escalation)
no other providers configured
Chain:
PER_MESSAGE_OVERRIDE not_applicable
MANUAL_STICKY not_applicable
CONFIGURED_RULES rejected → opus (provider_unavailable, provider-wide)
PATTERN_RECOMMENDATION not_applicable
WORKSPACE_DEFAULT rejected → sonnet (provider_unavailable, provider-wide)
GLOBAL_DEFAULT rejected → haiku (provider_unavailable, provider-wide)
Hard failure. Turn does not start.
TUI:
No model available for this turn.
anthropic provider currently unavailable.
Tried: opus, sonnet, haiku — all on anthropic.
Run /model <id> to choose a model from a configured provider, or wait for recovery.
(The TUI message dynamically lists alternative configured providers; if none exist, it omits the suggestion.)
turn: estimated_input_tokens = 90000, has_images = true
rules: [{name: "long context", when: {estimated_input_tokens_gt: 80000}, use: haiku}]
(haiku has 200k context but no vision support)
workspace_default: opus
Chain:
CONFIGURED_RULES rejected → haiku (no_vision_support)
(rule matched but candidate failed validation)
WORKSPACE_DEFAULT chose → opus (supports images, 200k context)
session.active_model = None
rules: [{name: "fast for commits", when: {message_matches: "^/commit"}, use: haiku}]
pattern store: sonnet at confidence 0.87, sample 23
pattern_disagreement.surface = true
user: "/commit fix the auth bug"
Chain:
CONFIGURED_RULES chose → anthropic:claude-haiku-4-5 (rule "fast for commits")
PATTERN_RECOMMENDATION deferred → anthropic:claude-sonnet-4-6
(confidence 0.87, 23 samples — outranked by rule)
TUI:
→ Routing to anthropic:claude-haiku-4-5 per rule "fast for commits"
Pattern store suggests anthropic:claude-sonnet-4-6 (confidence 0.87, 23 tasks)
/route override to use Sonnet for this turn
/route ignore to dismiss
Turn start at T=0:
user: "Read README.md and summarize"
Chain: CONFIGURED_RULES → sonnet (rule "balanced for reads")
Lock: sonnet for entire turn.
T=1.2s: sonnet emits tool_use (read_file)
T=1.5s: tool dispatcher returns TOOL message with file content
T=1.6s: LLM call #2 (still sonnet, lock holds): emits tool_use (read_file for table of contents)
T=2.0s: TOOL message
T=2.1s: LLM call #3 (still sonnet): emits final summary, stop_reason=end_turn
Turn ends.
route.decided event has chain trace from T=0.
No re-routing happens between LLM calls #1, #2, #3.
/model swap is queuedTurn N in flight. User runs `/model opus` while sonnet is mid-tool-loop.
Server response (TUI banner): "Model swap pending: anthropic:claude-opus-4-7. Applies to next turn."
Turn N completes on sonnet (lock holds).
Turn N+1 starts:
Chain: MANUAL_STICKY → opus (newly set)
Active planner model: anthropic:claude-opus-4-7
Planner emits: delegate(tier="fast", task="rename `foo` to `bar` in src/", context={"mode": "minimal"})
Worker session creation triggers routing:
Chain (in worker context):
PER_MESSAGE_OVERRIDE not_applicable
MANUAL_STICKY not_applicable
CONFIGURED_RULES not_applicable (rules evaluated against worker's "user message" = task brief)
PATTERN_RECOMMENDATION not_applicable (insufficient context)
DELEGATE_REQUEST chose → anthropic:claude-haiku-4-5 (tier=fast resolves)
Worker runs to completion on haiku. Returns to planner via delegate tool result.
Planner's lock on opus is preserved through the delegation.
rules:
- name: "deep for architecture"
when: {message_matches: "architecture"}
use: opus
- name: "budget cap"
when: {cost_today_exceeds_usd: 5.00}
use: haiku
cost_today = $5.42
user: "Walk me through the architecture..."
Chain:
CONFIGURED_RULES chose → opus (rule "deep for architecture" matched first)
This is wrong if the user wants the budget cap to win. Reorder:
rules:
- name: "budget cap"
when: {cost_today_exceeds_usd: 5.00}
use: haiku
- name: "deep for architecture"
when: {message_matches: "architecture"}
use: opus
Now:
CONFIGURED_RULES chose → haiku (rule "budget cap" matched first)
/rules check can detect obvious shadowing (an earlier rule’s predicates strictly subsume a later rule’s), but rule ordering remains the user’s contract.
| Command | Effect |
|---|---|
/model <id> |
Set session sticky model (opt-out from rules). |
/model - |
Clear sticky; next turn uses default policy chain. |
/model show |
Print active model and the last turn’s full chain trace. |
/route override |
(When pattern-disagreement is surfaced) use the pattern’s choice. |
/route ignore |
(When pattern-disagreement is surfaced) dismiss the suggestion. |
/rules check |
Validate the routing.yaml file; print errors or “ok”. |
/rules show |
Print the active rule list (post-validation, with synthetic names). |
/rules reload |
Force re-read of routing.yaml (normally automatic). |
/cost |
Print this session’s cost broken down by model and role. |
A user message starting with @<alias> (e.g., @haiku, @opus, @sonnet) is parsed as a per-message override. The token is stripped before sending to the model. Aliases are resolved against the alias table; unknown aliases produce an inline error and the turn does not start.
Aliases live in the model registry config (alongside tier and can_delegate per §6.8). Each model entry can declare any number of aliases:
models:
anthropic:claude-haiku-4-5:
tier: fast
can_delegate: false
aliases: [haiku, fast]
anthropic:claude-sonnet-4-6:
tier: balanced
can_delegate: true
aliases: [sonnet, balanced]
anthropic:claude-opus-4-7:
tier: deep
can_delegate: true
aliases: [opus, deep]
Aliases must be unique across the registry; duplicates are rejected at validation.
The override syntax must be at the start of the message and followed by whitespace. Email me @haiku tomorrow is not an override.
Edge case: a user wanting to ask the agent about the literal string @haiku at message start can prefix with a backslash (\@haiku). The backslash is stripped; no override is applied.
/model swap is queued, not applied.route.decided records the rejection.provider_unavailable./route override and /route ignore.delegate(tier=...); verify the worker uses the resolved model.delegate tool registered.usage.cost_usd.route.decided completeness. Every turn emits exactly one route.decided event with chain length equal to the number of policies that actually ran.supports_tools=false. Verify rejection with no_tool_support and chain continues.anthropic provider is marked Unavailable on the next routing decision.delegate returns no_model_available_for_tier.fast is rejected by /rules check.memory_add; the tool is not registered; the call fails as “unknown tool” or equivalent.insufficient_context shape. Worker returns a structured InsufficientContextRequest; the planner’s delegate.failed event payload validates against the schema./model swaps. User runs /model A then /model B within one turn; verify only B is applied at the next turn boundary, and the banner reflects B.Worth investing in for two specific properties:
any_of/all_of/not over the predicate set always evaluate to a bool, never raise.Tracked here, deferred to later revisions:
delegate(tier="fast") could consult patterns to pick among fast-tier models. v1: configured fast model is used./rules check shadow detection. v1 prints rules; v2 may detect when one rule strictly shadows another and warn.| Date | Decision | Rationale |
|---|---|---|
| 2026-05-08 | Turn-locked model; re-routing only at turn boundaries | Cost predictability; behavioral predictability; reasoning continuity within a turn. |
| 2026-05-08 | Mid-turn /model swap is queued, not applied |
Less disruptive than cancelling the in-flight turn. |
| 2026-05-08 | First-match-wins for rules; user orders the list | Predictable. Specificity ordering is a debugging nightmare. |
| 2026-05-08 | User-set policy beats pattern recommendations by default | Trust. Silent override of user rules destroys confidence in the engine. |
| 2026-05-08 | Pattern disagreement surfaced as suggestion (Phase 3 opt-in) | Lets the system improve on user rules without overriding them. |
| 2026-05-08 | Capability validation per-policy, fall through on failure | Routing is resilient: a bad rule choice doesn’t kill the turn. |
| 2026-05-08 | Provider availability tracked at adapter; routing rejects Unavailable | Outages produce graceful fallthrough, not turn failures. |
| 2026-05-08 | No per-rule fallback lists; chain fallthrough is the answer | Per-rule fallback breaks linearity of “why this model?” debugging. |
| 2026-05-08 | Hard failure when no policy succeeds; surface to user | Never silently use a model the user didn’t authorize. |
| 2026-05-08 | Closed predicate set, no DSL | Predicates are contracts; users wanting more write skills, not rules. |
| 2026-05-08 | cost_today_exceeds_usd first-class predicate |
Budget enforcement is the most common reason users add rules. |
| 2026-05-08 | Delegation context: minimal and explicit only; no auto |
Forces planners to think about handoff; produces more reliable results than guessing. |
| 2026-05-08 | Workers cannot delegate | Prevents fan-out cost explosions and recursion. |
| 2026-05-08 | Tiers as abstraction over concrete models | Decouples planner reasoning from provider choices. |
| 2026-05-08 | Hot reload on every turn | Cheap; eliminates “I edited the file but nothing changed” frustration. |
| 2026-05-08 | Pattern recommendations require min sample size and min confidence | Cold start safety; prevents routing on noise. |
| 2026-05-08 | cost_weight configurable per workspace |
Cost/quality tradeoff differs by workflow; defaulting to a hidden tradeoff is wrong. |
| 2026-05-08 | One canonical route.decided event per turn |
Single record answers “why this model?” without joins; full chain trace is atomic. |
| 2026-05-08 | Availability tracked at (provider, model) with provider-wide promotion |
Single-model outages don’t blackout a provider; auth/DNS errors do. |
| 2026-05-08 | Capability validation extended to tools, system prompt, structured output | Required for Ollama and other limited models; “only require what we’ll use” prevents spurious rejections. |
| 2026-05-08 | Predicate skills_loaded_includes renamed to skills_matching_message_includes |
Routing runs before context assembly; “loaded” was a lie. |
| 2026-05-08 | Workspace tiers must define all three slots or be absent |
Partial tier maps cause runtime no_model_available_for_tier for non-obvious reasons; fail at config time. |
| 2026-05-08 | Workers are read-only against memory and skill state | Planner has the broader context; sub-tasks shouldn’t mutate durable state. |
| 2026-05-08 | Worker sessions hidden from /history by default |
A planner can spawn many workers; flat listing clutters the user’s view. |
| 2026-05-08 | insufficient_context returns a structured InsufficientContextRequest |
Planner can programmatically retry with targeted references rather than re-prompting itself. |
| 2026-05-08 | Cost-efficiency degenerate case (all costs equal) zeros the term | Decision falls cleanly to pure quality when there’s no cost differentiation. |
| 2026-05-14 | cost_weight default lowered from 0.3 → 0.1 |
§A3-rev showed 0.3 required a ~0.43 quality delta to flip the chooser, swamping the 0.15–0.30 cluster-level deltas the LLM judge actually produces. 0.1 needs ~0.143, which observed deltas clear. |
| 2026-05-15 | cost_weight default lowered from 0.1 → 0.05 |
§A3-rev5 showed cost_efficiency normalizes per cluster to [0.0, 1.0], so cw=0.1 added a flat +0.10 floor to whichever model was cheapest, swamping the 0.05–0.09 quality deltas observed on regex-with-edge-cases (haiku 0.91 / sonnet 1.00) and fix-a-bug-small. cw=0.05 halves the floor; direct simulation showed 6 sonnet picks pass the gate where cw=0.1 produced 0; haiku-correct decisions on workloads with q-delta ≥0.1 still pick haiku. |
| 2026-05-08 | Tier upgrade exhausts at deep; no escape above it |
Explicit failure mode rather than implicit infinite loop. |
| 2026-05-08 | Pattern override emits route.overridden, not pattern.override_accepted |
Aligns with event-bus catalog; preserves one-route.decided-per-turn invariant. |
| 2026-05-08 | Delegation slot in Phase 1; delegate() tool in Phase 4 |
Chain shape is fixed; fills stub later rather than refactoring the pipeline. |
canonical-message-format.md — Message, ToolDefinition, AdapterCapabilities.event-bus-and-trace-catalog.md — payload shape for route.decided and routing auxiliaries.streaming-protocol.md (planned) — turn lifecycle events; how delegation result streaming works (or doesn’t, in v1).