Writing Policy Rules

Policy rules live in config/policy.yaml. They are evaluated on every request to decide which model profile to use.

File structure

version: "1"
fallback_profile: "default"

rules:
  - name: my_rule
    priority: 50
    select_profile: capable
    description: Optional explanation.
    when:
      <condition_key>: <value>

experiments: []

Top-level fields

Field	Type	Required	Description
`version`	string	no	Schema version. Currently `"1"`.
`fallback_profile`	string	yes	Profile used when no rule matches.
`rules`	list	yes	Ordered list of rule objects.
`experiments`	list	no	A/B experiment definitions. Empty by default.

Rule fields

Field	Type	Required	Description
`name`	string	yes	Unique identifier used in logs and decision records. Duplicate names cause a startup error.
`priority`	integer	no	Evaluation order. Lower = evaluated first. Default: `100`.
`select_profile`	string	yes	The profile name to select when this rule matches.
`when`	dict	no	Key/value pairs that must all match. Empty = always matches.
`description`	string	no	Human-readable explanation.
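The two tables above amount to a set of load-time checks. Below is a minimal Python sketch of those checks, not SwitchBoard's actual loader. It assumes policy.yaml has already been parsed into a plain dict, and the names validate_policy, PolicyError, and DEFAULT_PRIORITY are hypothetical.

```python
from typing import Any

DEFAULT_PRIORITY = 100  # default from the "Rule fields" table


class PolicyError(ValueError):
    """Hypothetical error type for problems that should stop startup."""


def validate_policy(policy: dict[str, Any]) -> list[dict[str, Any]]:
    """Check required fields and return rules with defaults filled in (sketch)."""
    for field in ("fallback_profile", "rules"):
        if field not in policy:
            raise PolicyError(f"missing required top-level field: {field}")

    seen: set[str] = set()
    normalized = []
    for rule in policy["rules"]:
        for field in ("name", "select_profile"):
            if field not in rule:
                raise PolicyError(f"rule missing required field: {field}")
        if rule["name"] in seen:
            # Duplicate names are a startup error per the table.
            raise PolicyError(f"duplicate rule name: {rule['name']}")
        seen.add(rule["name"])
        # Explicit values in the rule override these defaults.
        normalized.append({"priority": DEFAULT_PRIORITY, "when": {}, **rule})
    return normalized
```

Filling in the defaults once at load time keeps the per-request evaluation free of missing-field checks.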

Rule evaluation order

- Rules are sorted by priority in ascending order (lowest first).
- All when conditions of a rule must match (logical AND).
- The first matching rule is used; the remaining rules are skipped.
- If the X-SwitchBoard-Profile header is present, all rules are skipped.
- If no rule matches, fallback_profile is used and the decision is recorded with rule_name = "fallback".
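The evaluation order above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SwitchBoard's code: select_profile, rule_matches, and condition_matches are hypothetical names, token-bound keys such as min_estimated_tokens are left out, the post-match checks are not modeled, and returning None as the rule name for a header override is an assumption because this page does not say what is recorded in that case.

```python
from typing import Any, Optional

DEFAULT_PRIORITY = 100  # default from the "Rule fields" table


def condition_matches(expected: Any, actual: Any) -> bool:
    """A list means "any of"; any other value must be equal."""
    if isinstance(expected, list):
        return actual in expected
    return actual == expected


def rule_matches(rule: dict[str, Any], context: dict[str, Any]) -> bool:
    """Every `when` entry must hold (logical AND); an empty `when` always matches."""
    return all(
        condition_matches(expected, context.get(key))
        for key, expected in (rule.get("when") or {}).items()
    )


def select_profile(
    rules: list[dict[str, Any]],
    context: dict[str, Any],
    fallback_profile: str,
    profile_header: Optional[str] = None,
) -> tuple[str, Optional[str]]:
    """Return (profile, rule_name) for one request (hypothetical helper)."""
    if profile_header is not None:
        # X-SwitchBoard-Profile present: skip every rule.
        # Reporting None as the rule name is an assumption.
        return profile_header, None
    # sorted() is stable, so equal priorities keep file order (assumed, not documented).
    for rule in sorted(rules, key=lambda r: r.get("priority", DEFAULT_PRIORITY)):
        if rule_matches(rule, context):
            return rule["select_profile"], rule["name"]  # first match wins
    return fallback_profile, "fallback"
```

Because the loop returns on the first match, a low-priority-number rule shadows every later rule it overlaps with, regardless of where it appears in the file.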

After a rule matches, these checks still apply, in order:
- A/B experiment: the matched profile may be redirected to a treatment profile.
- Adaptive routing: demoted profiles are bypassed in favor of an alternative.
- Eligibility: profiles that cannot meet capability requirements are skipped.
- Scoring: the remaining eligible profiles are ranked by quality, cost, and latency.

Condition keys

Task and content

Key	Type	Description
`task_type`	string or list	Inferred task: `"code"`, `"analysis"`, `"planning"`, `"summarization"`, `"chat"`
`complexity`	string or list	`"low"` (≤500 tokens, ≤3 msgs), `"medium"` (≤3k tokens, ≤8 msgs), `"high"` (>3k tokens, >8 msgs, or tools present)
`stream`	boolean	Whether the caller requested SSE streaming
`tools_present`	boolean	Whether the request includes a `tools` array
`requires_tools`	boolean	Same as `tools_present` — alias used by the eligibility check
`requires_long_context`	boolean	`true` when estimated tokens > 6,000
`requires_structured_output`	boolean	`true` when `response_format.type` is `json_object` or `json_schema`
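The following sketch shows how the keys in this table could be derived from a request body with messages, tools, stream, and response_format fields. It is an illustration built on assumptions: classify_complexity and derive_flags are hypothetical, token estimation and task_type inference are out of scope (estimated_tokens is passed in), "3k" is read as 3,000, and the low bucket is assumed to require both of its limits, the complement of the "any of" wording for high.

```python
from typing import Any


def classify_complexity(estimated_tokens: int, message_count: int, tools_present: bool) -> str:
    """Bucket a request using the thresholds in the table (hypothetical helper)."""
    # "high" if ANY of its conditions holds.
    if tools_present or estimated_tokens > 3000 or message_count > 8:
        return "high"
    # Assumption: "low" requires BOTH of its limits.
    if estimated_tokens <= 500 and message_count <= 3:
        return "low"
    return "medium"  # <= 3,000 tokens, <= 8 messages, no tools


def derive_flags(body: dict[str, Any], estimated_tokens: int) -> dict[str, Any]:
    """Derive the task/content condition keys from a request body (hypothetical helper)."""
    # Assumption: an empty tools array counts as "no tools".
    tools_present = bool(body.get("tools"))
    response_format = body.get("response_format") or {}
    return {
        "stream": bool(body.get("stream", False)),
        "tools_present": tools_present,
        "requires_tools": tools_present,  # alias of tools_present
        "requires_long_context": estimated_tokens > 6000,
        "requires_structured_output": response_format.get("type") in ("json_object", "json_schema"),
        "complexity": classify_complexity(
            estimated_tokens, len(body.get("messages", [])), tools_present
        ),
    }
```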

Token counts

Key	Semantics
`min_estimated_tokens`	`estimated_tokens >= value`
`max_estimated_tokens`	`estimated_tokens <= value`
`min_max_tokens`	requested output tokens `>= value`
`max_max_tokens`	requested output tokens `<= value`
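The four token keys reduce to inclusive comparisons. In this hypothetical sketch, the requested output tokens are assumed to arrive in the evaluation context as max_tokens, and a missing count is assumed never to satisfy a bound; TOKEN_BOUNDS and token_condition_matches are illustrative names.

```python
import operator
from typing import Any, Callable

# Condition key -> (context attribute, comparison), per the table above.
# Using "max_tokens" as the context attribute name is an assumption.
TOKEN_BOUNDS: dict[str, tuple[str, Callable[[int, int], bool]]] = {
    "min_estimated_tokens": ("estimated_tokens", operator.ge),
    "max_estimated_tokens": ("estimated_tokens", operator.le),
    "min_max_tokens": ("max_tokens", operator.ge),
    "max_max_tokens": ("max_tokens", operator.le),
}


def token_condition_matches(key: str, value: int, context: dict[str, Any]) -> bool:
    """Evaluate one token-bound condition (hypothetical helper)."""
    attribute, compare = TOKEN_BOUNDS[key]
    actual = context.get(attribute)
    if actual is None:
        return False  # assumption: an absent count never satisfies a bound
    return compare(actual, value)
```

Both bounds are inclusive, so a request estimated at exactly 4,096 tokens satisfies both large_context (min_estimated_tokens: 4096) and default_short (max_estimated_tokens: 4096) from the example rules; large_context wins because its priority, 50, is lower.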

Caller signals

Key	Source	Example
`model_hint`	`model` field in request body	`model_hint: ["capable", "gpt-4o"]`
`priority`	`X-SwitchBoard-Priority` header (distinct from the rule-level `priority` field)	`priority: "high"`
`tenant_id`	`X-SwitchBoard-Tenant-ID` header	`tenant_id: "acme"`
`cost_sensitivity`	`X-SwitchBoard-Cost-Sensitivity` header	`cost_sensitivity: "high"` or `cost_sensitivity: "low"`
`latency_sensitivity`	`X-SwitchBoard-Latency-Sensitivity` header	`latency_sensitivity: "high"` or `latency_sensitivity: "low"`
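A sketch of how these caller signals might be collected from a request. caller_signals and HEADER_SIGNALS are hypothetical; only the header names and the model field come from this page. Header names are compared lowercased because HTTP header names are case-insensitive.

```python
from typing import Any, Mapping

# Header names from the table above; the helper itself is illustrative.
HEADER_SIGNALS = {
    "priority": "X-SwitchBoard-Priority",
    "tenant_id": "X-SwitchBoard-Tenant-ID",
    "cost_sensitivity": "X-SwitchBoard-Cost-Sensitivity",
    "latency_sensitivity": "X-SwitchBoard-Latency-Sensitivity",
}


def caller_signals(headers: Mapping[str, str], body: Mapping[str, Any]) -> dict[str, Any]:
    """Build the caller-signal part of the evaluation context (hypothetical helper)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    signals: dict[str, Any] = {"model_hint": body.get("model")}
    for key, header in HEADER_SIGNALS.items():
        signals[key] = lowered.get(header.lower())  # None when the header is absent
    return signals
```

In this sketch an absent header yields None, so a rule such as tenant_id: "acme" simply does not match requests that omit X-SwitchBoard-Tenant-ID.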

List values ("any of")

When a condition value is a list, it matches if the context attribute equals any element:

when:
  task_type: ["code", "planning"]
  model_hint: ["gpt-4o", "capable", "claude-3-5-sonnet"]

A/B experiments

Experiments are defined alongside rules and intercept matching traffic after rule evaluation.

experiments:
  - name: capable_vs_fast_chat
    profile_a: fast          # control — majority of traffic
    profile_b: capable       # treatment — split_percent % of traffic
    split_percent: 10
    enabled: true
    applies_to_rules:
      - default_short_request   # only intercept this rule; empty = all rules

Field	Type	Required	Description
`name`	string	yes	Unique identifier recorded in decision logs.
`profile_a`	string	yes	Control profile. Must differ from `profile_b`.
`profile_b`	string	yes	Treatment profile.
`split_percent`	integer	yes	Percentage routed to `profile_b` (0–100).
`enabled`	boolean	no	Set to `false` to pause without removing the config. Default: `true`.
`applies_to_rules`	list	no	Restrict to specific rule names. Empty = all rules.

Assignment is deterministic: the same X-Request-ID always lands in the same bucket.
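Deterministic assignment can be implemented by hashing a stable key into one of 100 buckets. The sketch below is one way to do it and not necessarily SwitchBoard's scheme: the SHA-256 hash, the "name:request-id" key, and the helper names experiment_applies and assign_profile are assumptions. It also applies the enabled and applies_to_rules semantics from the table, but does not model any other conditions under which an experiment intercepts a matched profile.

```python
import hashlib
from typing import Any


def experiment_applies(experiment: dict[str, Any], rule_name: str) -> bool:
    """Honor `enabled` (default true) and `applies_to_rules` (empty = all rules)."""
    if not experiment.get("enabled", True):
        return False
    scoped = experiment.get("applies_to_rules") or []
    return not scoped or rule_name in scoped


def assign_profile(experiment: dict[str, Any], request_id: str) -> str:
    """Map a request ID to control or treatment, stably (hypothetical scheme)."""
    key = f"{experiment['name']}:{request_id}".encode()
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    if bucket < experiment["split_percent"]:
        return experiment["profile_b"]  # treatment: split_percent % of traffic
    return experiment["profile_a"]  # control
```

Including the experiment name in the hashed key keeps assignments independent across experiments; hashing the request ID alone would send the same requests to treatment in every experiment with the same split_percent.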

Example rules

Route code tasks to the capable model

- name: coding_task
  priority: 35
  select_profile: capable
  when:
    task_type: "code"

Route large-context requests to capable

- name: large_context
  priority: 50
  select_profile: capable
  when:
    min_estimated_tokens: 4096

Keep cost-sensitive short requests on the fast model

- name: cost_sensitive
  priority: 65
  select_profile: fast
  when:
    cost_sensitivity: "high"
    complexity: ["low", "medium"]

Route a specific tenant's low-priority requests to local

- name: tenant_batch_local
  priority: 25
  select_profile: local
  when:
    tenant_id: "internal-batch"
    priority: "low"

Catch-all fast path

- name: default_short
  priority: 100
  select_profile: fast
  when:
    max_estimated_tokens: 4096

Troubleshooting

Call GET /admin/decisions/recent to see which rule matched recent requests. Each decision record includes rule_name; a value of "fallback" means no rule matched and fallback_profile was used.
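When many requests are involved, it can help to tally rule_name across the recent decisions. This sketch assumes the endpoint returns a JSON array of decision records, requires no authentication, and is served at a base URL you supply; fetch_recent_decisions and summarize_rules are hypothetical helpers.

```python
import json
from collections import Counter
from typing import Any
from urllib.request import urlopen


def fetch_recent_decisions(base_url: str) -> list[dict[str, Any]]:
    """GET the recent decisions (assumes a JSON array response, no auth)."""
    with urlopen(f"{base_url}/admin/decisions/recent") as response:
        return json.load(response)


def summarize_rules(decisions: list[dict[str, Any]]) -> Counter:
    """Count decisions per rule_name; "fallback" means no rule matched."""
    return Counter(record.get("rule_name") for record in decisions)
```

For example, summarize_rules(fetch_recent_decisions("http://localhost:8000")).most_common() lists rules by frequency; the localhost URL is a placeholder. A large "fallback" count suggests requests that none of your when conditions cover.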

See troubleshooting.md for more.