Analytics & Observability

Every agent, every task, every token — measured in real time. NEXUS exposes the same composite quality signal it uses internally to route work, so you can see exactly why a model tier was chosen and how it performed.

Avg Composite Quality (Q): 58.26, across 4-universe validation pilot

MergeGate Pass Rate: 100%, on live toggle-feature pilot (100/100)

Cost Reduction KPI: 74% vs. baseline, via cost_router (daily cron)

Sessions Calibrated: 219, quality score baselines established

Composite Quality Score Q

Every dispatched task is scored on a 0–100 scale from five weighted signals. This is the same Q value MergeGate checks before allowing a merge, and the same value CORTEX's EWMA reputation tracker uses to update each agent role.

Tests pass rate (30%): Ratio of automated tests passing after the agent's change.

Security score (25%): 1 − normalized findings from the SecurityScanner gate (secrets, injection, XSS, deps).

Token efficiency (20%): Output value per token spent, after APEX compression (40–70% reduction).

Self-correction rate (15%): Bonus for agents that detect and fix their own mistakes mid-task.

Constitution adherence (10%): Compliance with NEXUS's safety and behavioral constitution checks.

Quality Score Formula

Q = 0.30 × tests_pass_rate
  + 0.25 × (1 − security_findings_norm)
  + 0.20 × token_efficiency
  + 0.15 × self_correction_rate
  + 0.10 × constitution_score

Q ≥ 75       → MergeGate PASS
60 ≤ Q < 75  → AutoFix loop triggered (bounded retries)
Q < 60       → BLOCK, human escalation
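The formula and thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not the NEXUS implementation: it assumes each of the five signals is normalized to the range 0–1 and that the weighted sum is multiplied by 100 to reach the 0–100 scale, and it reads the thresholds as three non-overlapping bands. The names `Signals`, `composite_q`, and `gate_decision` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Five input signals, each assumed to be normalized to 0..1."""
    tests_pass_rate: float
    security_findings_norm: float
    token_efficiency: float
    self_correction_rate: float
    constitution_score: float


def composite_q(s: Signals) -> float:
    """Weighted sum from the formula, scaled to 0..100 (scaling is an assumption)."""
    raw = (
        0.30 * s.tests_pass_rate
        + 0.25 * (1.0 - s.security_findings_norm)
        + 0.20 * s.token_efficiency
        + 0.15 * s.self_correction_rate
        + 0.10 * s.constitution_score
    )
    return 100.0 * raw


def gate_decision(q: float) -> str:
    """Map a Q value onto the three outcomes listed above."""
    if q >= 75:
        return "PASS"
    if q >= 60:
        return "AUTOFIX"  # bounded retries are not modeled here
    return "BLOCK"


# Illustrative inputs only, not measured data.
example = Signals(
    tests_pass_rate=0.9,
    security_findings_norm=0.2,
    token_efficiency=0.5,
    self_correction_rate=0.4,
    constitution_score=1.0,
)
q = composite_q(example)
print(round(q, 2), gate_decision(q))
```

With these illustrative inputs the weighted sum is 0.27 + 0.20 + 0.10 + 0.06 + 0.10 = 0.73, so Q is about 73 and the task would land in the AutoFix band.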

EWMA Reputation per Agent Role

Each agent role carries an exponentially-weighted moving average (α = 0.1) of its recent Q scores. CORTEX's UCB1 router reads this reputation directly when deciding which agent — and which model tier — gets the next task.

Agent Role                  Quality Score EWMA (α=0.1)   Tier
Backend ComponentAgent      0.91                         sonnet
Frontend ComponentAgent     0.88                         sonnet
Security SpecialistAgent    0.94                         opus
DB Migration SubAgent       0.86                         haiku
E2E Test AtomicAgent        0.79                         sonnet
DevOps MicroAgent           0.83                         haiku
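For readers unfamiliar with EWMA, the update rule is short enough to show directly. The sketch below uses the standard exponentially-weighted moving average recurrence with α = 0.1; the function name `ewma_update` is hypothetical, and it assumes Q has been rescaled to 0–1 to match the reputation values listed above.

```python
ALPHA = 0.1  # smoothing factor stated in the text


def ewma_update(previous: float, new_q: float, alpha: float = ALPHA) -> float:
    """Standard EWMA step: blend the newest score into the running average."""
    return alpha * new_q + (1.0 - alpha) * previous


# Illustrative run: a role starting at 0.91 receives three scores of 0.80.
reputation = 0.91
for q in (0.80, 0.80, 0.80):
    reputation = ewma_update(reputation, q)

print(round(reputation, 4))
```

With α = 0.1 each new score moves the average only a tenth of the way toward itself, so after three scores of 0.80 the reputation has dropped from 0.91 to roughly 0.88. The influence of any single score halves after about 6.6 further updates (ln 0.5 / ln 0.9).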

Cost & Token Tracking

The Budget Governor records per-task and per-session spend down to the token, broken out by model tier. APEX compression shaves 40–70% off context size before it ever hits the model.

Tier     Model              Cost / 1k tokens   Share of Tasks
haiku    claude-3-5-haiku   $0.0008            42%
sonnet   claude-sonnet-4    $0.0030            47%
opus     claude-opus-4      $0.0150            11%
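To make the per-tier accounting concrete, here is a minimal sketch of a spend ledger built on the rates in the table above. It is not the Budget Governor's actual interface: the `SpendLedger` class and `blended_rate` helper are hypothetical, the token counts are illustrative, and the blended figure additionally assumes every task consumes the same number of tokens regardless of tier.

```python
from collections import defaultdict

# Cost per 1k tokens, taken from the table above.
COST_PER_1K = {"haiku": 0.0008, "sonnet": 0.0030, "opus": 0.0150}


class SpendLedger:
    """Accumulates token counts per tier and converts them to dollars."""

    def __init__(self) -> None:
        self._tokens = defaultdict(int)

    def record(self, tier: str, tokens: int) -> None:
        if tier not in COST_PER_1K:
            raise ValueError(f"unknown tier: {tier}")
        self._tokens[tier] += tokens

    def cost_by_tier(self) -> dict:
        return {t: n / 1000 * COST_PER_1K[t] for t, n in self._tokens.items()}

    def total_cost(self) -> float:
        return sum(self.cost_by_tier().values())


def blended_rate(task_share: dict) -> float:
    """Average cost per 1k tokens, assuming equal token use per task."""
    return sum(share * COST_PER_1K[tier] for tier, share in task_share.items())


ledger = SpendLedger()
ledger.record("haiku", 10_000)   # illustrative token counts
ledger.record("sonnet", 2_000)
ledger.record("opus", 1_000)
print(round(ledger.total_cost(), 4))

print(round(blended_rate({"haiku": 0.42, "sonnet": 0.47, "opus": 0.11}), 6))
```

Under the equal-tokens assumption, the task shares in the table give a blended rate of about $0.0034 per 1k tokens, compared with $0.0150 if every task went to opus.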

UCB1 Adaptive Routing

Exploration vs. exploitation, visualized

CORTEX treats each model tier as an arm of a multi-armed bandit. The score below balances a tier's historical EWMA quality against an exploration bonus that shrinks as more decisions are routed through it — so under-used tiers still get sampled occasionally to keep the reputation data fresh.

score(tier) = Q̄(tier) + C × √(ln(N) / n(tier))

  Q̄(tier)  — EWMA quality (α=0.1) for this tier
  C        — exploration constant (default 1.4)
  N        — total routing decisions so far
  n(tier)  — decisions routed to this tier

route_tier(complexity):
  candidates = [haiku, sonnet, opus]
  chosen = argmax(score(tier) for tier in candidates
                   if meets_safety_floor(tier, complexity))
  return chosen
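The scoring rule and routing pseudocode above translate to runnable Python roughly as follows. This is a sketch under stated assumptions rather than CORTEX's code: the safety floor is passed in as a callable because the source does not define it, the `example_floor` rule and all statistics are hypothetical, and a tier with no routed decisions is given an infinite score so it is tried first (the usual UCB1 convention, which also avoids division by zero).

```python
import math

TIERS = ("haiku", "sonnet", "opus")


def ucb1_score(q_bar: float, n_tier: int, n_total: int, c: float = 1.4) -> float:
    """EWMA quality plus an exploration bonus that shrinks as n_tier grows."""
    if n_tier == 0:
        return math.inf  # untried tier: explore it first
    return q_bar + c * math.sqrt(math.log(n_total) / n_tier)


def route_tier(complexity, stats, meets_safety_floor, c=1.4):
    """stats maps tier -> (ewma_quality, decisions_routed)."""
    n_total = sum(n for _, n in stats.values())
    candidates = [t for t in TIERS if meets_safety_floor(t, complexity)]
    if not candidates:
        raise ValueError("no tier meets the safety floor")
    return max(candidates, key=lambda t: ucb1_score(*stats[t], n_total, c))


def example_floor(tier: str, complexity: float) -> bool:
    """Hypothetical rule: haiku is not allowed on high-complexity tasks."""
    return not (tier == "haiku" and complexity > 0.7)


# Hypothetical statistics: (EWMA quality on a 0..1 scale, decisions routed).
stats = {"haiku": (0.86, 420), "sonnet": (0.88, 470), "opus": (0.94, 110)}
print(route_tier(0.5, stats, example_floor))
```

In this example opus has both the highest EWMA and the fewest routed decisions, so its exploration bonus is the largest and it is selected. As a tier accumulates decisions its bonus decays toward zero and the choice is driven increasingly by Q̄ alone.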

Tier     Exploit   Explore
haiku    78%       22%
sonnet   88%       12%
opus     95%       5%