Analytics & Observability
Every agent, every task, every token — measured in real time. NEXUS exposes the same composite quality signal it uses internally to route work, so you can see exactly why a model tier was chosen and how it performed.
| Metric | Value | Context |
|---|---|---|
| Avg Composite Quality (Q) | 58.26 | across 4-universe validation pilot |
| MergeGate Pass Rate | 100% | on live toggle-feature pilot (100/100) |
| Cost Reduction KPI | 74% | vs. baseline, via cost_router (daily cron) |
| Sessions Calibrated | 219 | quality score baselines established |
Composite Quality Score Q
Every dispatched task is scored on a 0–100 scale from five weighted signals. This is the same Q value MergeGate checks before allowing a merge, and the same value CORTEX's EWMA reputation tracker uses to update each agent role.
| Signal | Weight | What it measures |
|---|---|---|
| Tests pass rate | 30% | Ratio of automated tests passing after the agent's change. |
| Security score | 25% | 1 − normalized findings from the SecurityScanner gate (secrets, injection, XSS, deps). |
| Token efficiency | 20% | Output value per token spent, after APEX compression (40–70% reduction). |
| Self-correction rate | 15% | Bonus for agents that detect and fix their own mistakes mid-task. |
| Constitution adherence | 10% | Compliance with NEXUS's safety and behavioral constitution checks. |
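The five weights listed above can be combined in a few lines. The following is a minimal sketch, not NEXUS's actual implementation: it assumes every signal arrives as a fraction in [0, 1] and that the weighted sum is multiplied by 100 to land on the stated 0–100 scale. The function names `composite_q` and `gate_decision` are hypothetical, and the gate bands use the MergeGate thresholds this page gives (75 and 60), reading "Q < 75" as the 60–75 band.

```python
# Weights as listed on this page; they sum to 1.0.
WEIGHTS = {
    "tests_pass_rate": 0.30,
    "security": 0.25,            # 1 - normalized security findings
    "token_efficiency": 0.20,
    "self_correction_rate": 0.15,
    "constitution_score": 0.10,
}


def composite_q(tests_pass_rate, security_findings_norm, token_efficiency,
                self_correction_rate, constitution_score):
    """Return Q on a 0-100 scale. Assumes every input is a fraction in [0, 1]."""
    signals = {
        "tests_pass_rate": tests_pass_rate,
        "security": 1.0 - security_findings_norm,
        "token_efficiency": token_efficiency,
        "self_correction_rate": self_correction_rate,
        "constitution_score": constitution_score,
    }
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} out of range: {value}")
    # Scaling by 100 is an assumption made to match the stated 0-100 scale.
    return 100.0 * sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def gate_decision(q):
    """Map Q to a gate outcome (interpretation: AutoFix covers 60 <= Q < 75)."""
    if q >= 75:
        return "PASS"
    if q >= 60:
        return "AUTOFIX"
    return "BLOCK"


q = composite_q(0.9, 0.1, 0.6, 0.5, 1.0)
print(round(q, 2), gate_decision(q))
```

With these illustrative inputs the weighted terms are 0.27, 0.225, 0.12, 0.075, and 0.10, which sum to 0.79, so Q comes out at about 79 and clears the PASS threshold.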
Quality Score Formula
Q = 0.30 × tests_pass_rate
+ 0.25 × (1 − security_findings_norm)
+ 0.20 × token_efficiency
+ 0.15 × self_correction_rate
+ 0.10 × constitution_score
Q ≥ 75 → MergeGate PASS
60 ≤ Q < 75 → AutoFix loop triggered (bounded retries)
Q < 60 → BLOCK, human escalation
EWMA Reputation per Agent Role
Each agent role carries an exponentially-weighted moving average (α = 0.1) of its recent Q scores. CORTEX's UCB1 router reads this reputation directly when deciding which agent — and which model tier — gets the next task.
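To make the smoothing concrete, here is a small sketch of a standard EWMA update with α = 0.1. It is illustrative only: the helper name `ewma_update` is hypothetical, and seeding the average with the first observed score is an assumption, since the page does not say how a new role's reputation is initialized.

```python
ALPHA = 0.1  # smoothing factor stated above


def ewma_update(previous, q, alpha=ALPHA):
    """Blend the newest Q score into the running reputation."""
    if previous is None:
        # Assumption: the first score seeds the average.
        return q
    return alpha * q + (1.0 - alpha) * previous


reputation = None
for q in [0.80, 0.90, 1.00]:
    reputation = ewma_update(reputation, q)
    print(round(reputation, 3))
```

The three updates give 0.8, then 0.81, then 0.829. Each new score contributes only 10% of the updated value, so a single unusually good or bad task moves a role's reputation slowly, while older scores fade by a factor of 0.9 per task.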
| Agent Role | Quality Score (EWMA, α=0.1) | Trend | Tasks Run | Tier |
|---|---|---|---|---|
| Backend ComponentAgent | 0.91 | up | 312 | sonnet |
| Frontend ComponentAgent | 0.88 | up | 287 | sonnet |
| Security SpecialistAgent | 0.94 | stable | 156 | opus |
| DB Migration SubAgent | 0.86 | up | 204 | haiku |
| E2E Test AtomicAgent | 0.79 | stable | 168 | sonnet |
| DevOps MicroAgent | 0.83 | down | 97 | haiku |
Cost & Token Tracking
The Budget Governor records per-task and per-session spend down to the token, broken out by model tier. APEX compression shaves 40–70% off context size before it ever hits the model.
| Tier | Model | Cost / 1k tokens | Avg Tokens / Task | Share of Tasks |
|---|---|---|---|---|
| haiku | claude-3-5-haiku | $0.0008 | ~3,200 | 42% |
| sonnet | claude-sonnet-4 | $0.0030 | ~9,800 | 47% |
| opus | claude-opus-4 | $0.0150 | ~18,400 | 11% |
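The table's figures can be turned into a rough per-task cost estimate. The sketch below is a back-of-the-envelope calculation, not Budget Governor output: it assumes a single flat price per 1k tokens for each tier (no input/output split), treats the approximate average token counts as exact, and uses hypothetical helper names.

```python
TIERS = {
    # tier: (USD per 1k tokens, avg tokens per task, share of tasks)
    "haiku": (0.0008, 3200, 0.42),
    "sonnet": (0.0030, 9800, 0.47),
    "opus": (0.0150, 18400, 0.11),
}


def cost_per_task(tier):
    """Average USD spent on one task routed to this tier."""
    price_per_1k, avg_tokens, _share = TIERS[tier]
    return price_per_1k * avg_tokens / 1000


def blended_cost_per_task():
    """Average USD per task across all tiers, weighted by share of tasks."""
    return sum(cost_per_task(tier) * TIERS[tier][2] for tier in TIERS)


for tier in TIERS:
    print(tier, f"{cost_per_task(tier):.4f}")
print("blended", f"{blended_cost_per_task():.4f}")
```

Under these assumptions an average task costs roughly $0.0026 on haiku, $0.0294 on sonnet, and $0.276 on opus, for a blended average of about $0.045 per task. The opus tier handles 11% of tasks but accounts for roughly two thirds of that blended figure, which is why routing decisions at the top tier dominate spend.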
UCB1 Adaptive Routing
Exploration vs. exploitation, visualized
CORTEX treats each model tier as an arm of a multi-armed bandit. The score below balances a tier's historical EWMA quality against an exploration bonus that shrinks as more decisions are routed through it — so under-used tiers still get sampled occasionally to keep the reputation data fresh.
score(tier) = Q̄(tier) + C × √(ln(N) / n(tier))
Q̄(tier) — EWMA quality (α=0.1) for this tier
C — exploration constant (default 1.4)
N — total routing decisions so far
n(tier) — decisions routed to this tier
route_tier(complexity):
    candidates = [haiku, sonnet, opus]
    eligible = [tier for tier in candidates if meets_safety_floor(tier, complexity)]
    chosen = argmax(score(tier) for tier in eligible)
    return chosen
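The scoring formula and routing step above can be sketched as runnable Python. This is an illustration under stated assumptions rather than CORTEX's actual router: Q̄ is taken to be on a 0–1 scale (as in the reputation table), an untried tier gets an infinite score so it is sampled first (a common UCB1 convention), and `meets_safety_floor` is a hypothetical rule in which `complexity` is an integer 0–2 naming the lowest tier allowed.

```python
import math

C = 1.4  # exploration constant (default given above)
TIER_RANK = {"haiku": 0, "sonnet": 1, "opus": 2}


def ucb1_score(q_bar, n_tier, n_total, c=C):
    """EWMA quality plus an exploration bonus that shrinks as n_tier grows."""
    if n_tier == 0:
        return math.inf  # assumption: untried tiers are sampled first
    return q_bar + c * math.sqrt(math.log(n_total) / n_tier)


def meets_safety_floor(tier, complexity):
    # Hypothetical rule: complexity 0, 1, or 2 is the minimum tier rank allowed.
    return TIER_RANK[tier] >= complexity


def route_tier(complexity, stats):
    """stats maps tier -> (ewma_quality, decisions_routed_to_tier)."""
    n_total = sum(n for _q, n in stats.values())
    eligible = [t for t in TIER_RANK if meets_safety_floor(t, complexity)]
    return max(eligible,
               key=lambda t: ucb1_score(stats[t][0], stats[t][1], n_total))


stats = {"haiku": (0.85, 400), "sonnet": (0.88, 450), "opus": (0.94, 20)}
print(route_tier(0, stats))
```

In this made-up example opus has been chosen only 20 times out of 870, so its exploration bonus (about 0.81) dwarfs the bonuses for haiku and sonnet (about 0.18 and 0.17) and it wins the next decision even for a low-complexity task. As its count grows, the bonus shrinks and the EWMA quality term decides the outcome.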