Analytics & Observability

Every agent, every task, every token — measured in real time. NEXUS exposes the same composite quality signal it uses internally to route work, so you can see exactly why a model tier was chosen and how it performed.

Metric	Value	Context
Avg Composite Quality (Q)	58.26	across 4-universe validation pilot
MergeGate Pass Rate	100%	on live toggle-feature pilot (100/100)
Cost Reduction KPI	74%	vs. baseline, via cost_router (daily cron)
Sessions Calibrated	219	quality score baselines established

Composite Quality Score Q

Every dispatched task is scored on a 0–100 scale from five weighted signals. This is the same Q value MergeGate checks before allowing a merge, and the same value CORTEX's EWMA reputation tracker uses to update each agent role.

Signal	Weight	Description
Tests pass rate	30%	Ratio of automated tests passing after the agent's change.
Security score	25%	1 − normalized findings from the SecurityScanner gate (secrets, injection, XSS, deps).
Token efficiency	20%	Output value per token spent, after APEX compression (40–70% reduction).
Self-correction rate	15%	Bonus for agents that detect and fix their own mistakes mid-task.
Constitution adherence	10%	Compliance with NEXUS's safety and behavioral constitution checks.

Quality Score Formula

Q = 0.30 × tests_pass_rate
  + 0.25 × (1 − security_findings_norm)
  + 0.20 × token_efficiency
  + 0.15 × self_correction_rate
  + 0.10 × constitution_score

Q ≥ 75       → MergeGate PASS
60 ≤ Q < 75  → AutoFix loop triggered (bounded retries)
Q < 60       → BLOCK, human escalation
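
The formula and thresholds above can be sketched in a few lines of Python. This is an illustration, not the NEXUS implementation: the `Signals` container, the function names, and the gate labels are hypothetical. It also assumes every input signal is normalized to [0, 1] and that the weighted sum is multiplied by 100 to land on the 0–100 scale the thresholds use.

```python
from dataclasses import dataclass

# Weights copied from the formula above; they sum to 1.0.
WEIGHTS = {
    "tests": 0.30,
    "security": 0.25,
    "tokens": 0.20,
    "self_correction": 0.15,
    "constitution": 0.10,
}


@dataclass
class Signals:
    """Hypothetical container; every field is assumed to lie in [0, 1]."""
    tests_pass_rate: float
    security_findings_norm: float
    token_efficiency: float
    self_correction_rate: float
    constitution_score: float


def composite_q(s: Signals) -> float:
    """Weighted sum of the five signals, scaled to 0-100 (assumed scaling)."""
    raw = (
        WEIGHTS["tests"] * s.tests_pass_rate
        + WEIGHTS["security"] * (1.0 - s.security_findings_norm)
        + WEIGHTS["tokens"] * s.token_efficiency
        + WEIGHTS["self_correction"] * s.self_correction_rate
        + WEIGHTS["constitution"] * s.constitution_score
    )
    return 100.0 * raw


def gate_decision(q: float) -> str:
    """Map Q to an outcome using the 75 and 60 thresholds (labels are illustrative)."""
    if q >= 75.0:
        return "PASS"
    if q >= 60.0:
        return "AUTOFIX"
    return "BLOCK"


example = Signals(0.9, 0.2, 0.5, 0.4, 1.0)
q = composite_q(example)  # roughly 73, which falls in the AutoFix band
print(round(q, 2), gate_decision(q))
```

Reading the bands as non-overlapping (AutoFix between 60 and 75, BLOCK below 60) is an interpretation of the thresholds listed above.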

EWMA Reputation per Agent Role

Each agent role carries an exponentially-weighted moving average (α = 0.1) of its recent Q scores. CORTEX's UCB1 router reads this reputation directly when deciding which agent — and which model tier — gets the next task.

Agent Role	EWMA (α=0.1)	Trend	Tasks Run	Tier
Backend ComponentAgent	0.91	up	312	sonnet
Frontend ComponentAgent	0.88	up	287	sonnet
Security SpecialistAgent	0.94	stable	156	opus
DB Migration SubAgent	0.86	up	204	haiku
E2E Test AtomicAgent	0.79	stable	168	sonnet
DevOps MicroAgent	0.83	down	97	haiku
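
For readers unfamiliar with EWMA, the update behind a reputation column like the one above can be sketched as follows. The function names and the seeding of the first value are assumptions; only α = 0.1 comes from this page. The scores are assumed to be on the same 0–1 scale as the table.

```python
from typing import Iterable


def ewma_update(previous: float, score: float, alpha: float = 0.1) -> float:
    """Blend the newest score into the running average.

    With alpha = 0.1 the newest score contributes 10% and the
    accumulated history contributes 90%.
    """
    return alpha * score + (1.0 - alpha) * previous


def ewma_series(scores: Iterable[float], initial: float, alpha: float = 0.1) -> float:
    """Fold a sequence of scores into one reputation value.

    `initial` is a hypothetical seed; how a new role is seeded
    is not specified on this page.
    """
    reputation = initial
    for score in scores:
        reputation = ewma_update(reputation, score, alpha)
    return reputation


# One perfect task moves a role from 0.90 to about 0.91.
print(round(ewma_update(0.90, 1.0), 4))
```

A small α makes reputation slow to react: a single bad task barely moves the average, while a sustained run of bad tasks shows up as a downward trend.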

Cost & Token Tracking

The Budget Governor records per-task and per-session spend down to the token, broken out by model tier. APEX compression shaves 40–70% off context size before it ever hits the model.

Tier	Model	Cost / 1k tokens	Avg Tokens / Task	Share of Tasks
haiku	claude-3-5-haiku	$0.0008	~3,200	42%
sonnet	claude-sonnet-4	$0.0030	~9,800	47%
opus	claude-opus-4	$0.0150	~18,400	11%
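
As a worked example, the table above implies an approximate blended cost per task. The sketch below assumes a single flat rate per 1k tokens for each tier (the table does not separate input from output pricing) and treats the "~" token averages as exact. The dictionary layout and function names are illustrative.

```python
# Figures copied from the table above; the data structure is hypothetical.
TIERS = {
    "haiku": {"cost_per_1k": 0.0008, "avg_tokens": 3_200, "share": 0.42},
    "sonnet": {"cost_per_1k": 0.0030, "avg_tokens": 9_800, "share": 0.47},
    "opus": {"cost_per_1k": 0.0150, "avg_tokens": 18_400, "share": 0.11},
}


def cost_per_task(cost_per_1k: float, tokens: float) -> float:
    """Dollar cost of one task at a flat per-1k-token rate."""
    return cost_per_1k * tokens / 1000.0


def blended_cost_per_task(tiers: dict) -> float:
    """Average cost per task, weighting each tier by its share of tasks."""
    return sum(
        t["share"] * cost_per_task(t["cost_per_1k"], t["avg_tokens"])
        for t in tiers.values()
    )


for name, t in TIERS.items():
    print(name, round(cost_per_task(t["cost_per_1k"], t["avg_tokens"]), 5))
print("blended", round(blended_cost_per_task(TIERS), 4))
```

Under these assumptions an average opus task costs about $0.276 against roughly $0.0026 for haiku, and the blended figure comes to about $0.045 per task. Opus handles only 11% of tasks but accounts for most of that blended cost.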

UCB1 Adaptive Routing

Exploration vs. exploitation, visualized

CORTEX treats each model tier as an arm of a multi-armed bandit. The score below balances a tier's historical EWMA quality against an exploration bonus that shrinks as more decisions are routed through it — so under-used tiers still get sampled occasionally to keep the reputation data fresh.

score(tier) = Q̄(tier) + C × √(ln(N) / n(tier))

  Q̄(tier)  — EWMA quality (α=0.1) for this tier
  C        — exploration constant (default 1.4)
  N        — total routing decisions so far
  n(tier)  — decisions routed to this tier

route_tier(complexity):
  candidates = [haiku, sonnet, opus]
  chosen = argmax(score(tier) for tier in candidates
                   if meets_safety_floor(tier, complexity))
  return chosen
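
A runnable Python sketch of the scoring rule and routing step above follows. It is a minimal illustration under stated assumptions, not CORTEX's code: the `stats` layout is hypothetical, `meets_safety_floor` is passed in as a caller-supplied predicate because its logic is not described here, and giving an untried tier an infinite score is a common UCB1 convention rather than something this page specifies.

```python
import math
from typing import Callable, Dict, Tuple

# tier name -> (EWMA quality Q-bar, decisions routed to this tier)
TierStats = Dict[str, Tuple[float, int]]


def ucb1_score(q_bar: float, n_tier: int, n_total: int, c: float = 1.4) -> float:
    """Q-bar plus an exploration bonus that shrinks as n_tier grows."""
    if n_tier == 0:
        # Assumed convention: an untried tier is always sampled first.
        return math.inf
    return q_bar + c * math.sqrt(math.log(n_total) / n_tier)


def route_tier(
    stats: TierStats,
    complexity: int,
    meets_safety_floor: Callable[[str, int], bool],
    c: float = 1.4,
) -> str:
    """Return the eligible tier with the highest UCB1 score."""
    n_total = sum(n for _, n in stats.values())
    eligible = [t for t in stats if meets_safety_floor(t, complexity)]
    if not eligible:
        raise ValueError("no tier meets the safety floor")
    return max(eligible, key=lambda t: ucb1_score(*stats[t], n_total, c))


# Illustrative numbers only, not measurements from NEXUS.
stats = {"haiku": (0.86, 50), "sonnet": (0.88, 40), "opus": (0.94, 10)}
print(route_tier(stats, complexity=3, meets_safety_floor=lambda t, c: True))
```

In this example opus wins because it has the fewest routed decisions, so its exploration bonus is the largest. As its count grows the bonus decays and the choice is driven increasingly by Q̄.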

Tier	Exploit	Explore
haiku	78%	22%
sonnet	88%	12%
opus	95%	5%