Research

The research behind NEXUS OS

Technical reports on adaptive routing, agent reputation, automated quality gating, and the architecture decisions that keep NEXUS OS fast, cheap, and safe in production.

6 Published Reports

219 Calibration Sessions

74% Cost Reduction (CORTEX)

100% MergeGate Pass Rate

Routing · May 29, 2026

Adaptive Model Routing via UCB1 Multi-Armed Bandits in Multi-Agent Code Generation

NEXUS Research Team

We frame model-tier selection (haiku / sonnet / opus) for code-generation subtasks as a multi-armed bandit problem and apply UCB1 to balance exploration of underused tiers against exploitation of historically high-quality, low-cost tiers. Across 219 calibration sessions, UCB1 routing reduced API spend by 74% relative to an opus-only baseline while keeping MergeGate composite scores within 2.1 points of the opus-only ceiling. We detail the reward signal construction (composite quality score minus normalized cost), cold-start handling for new agent specializations, and the per-task complexity features fed into the arm-selection policy.

CORTEX · UCB1 · Cost Optimization · Bandits
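The UCB1 selection rule can be sketched in a few lines of Python. This is a minimal illustration, not the CORTEX implementation: the class and function names, the exploration constant, the cost weight in the reward, and keeping one router per task-complexity bucket are all assumptions. The reward follows the shape the abstract describes (composite quality score minus normalized cost), and each untried tier is played once before the UCB formula applies, which is the textbook UCB1 cold start.

```python
import math
from dataclasses import dataclass, field

TIERS = ("haiku", "sonnet", "opus")


@dataclass
class UCB1Router:
    """Hypothetical UCB1 router over model tiers (assumed: one per complexity bucket)."""
    tiers: tuple = TIERS
    c: float = math.sqrt(2)  # c * sqrt(ln n / n_i) == sqrt(2 ln n / n_i), standard UCB1
    counts: dict = field(default_factory=dict)  # tier -> times selected
    totals: dict = field(default_factory=dict)  # tier -> summed reward

    def __post_init__(self):
        for t in self.tiers:
            self.counts.setdefault(t, 0)
            self.totals.setdefault(t, 0.0)

    def select(self) -> str:
        # Cold start: try every tier once before trusting the confidence bounds.
        for t in self.tiers:
            if self.counts[t] == 0:
                return t
        n = sum(self.counts.values())

        def upper_bound(t: str) -> float:
            mean = self.totals[t] / self.counts[t]                    # exploitation term
            bonus = self.c * math.sqrt(math.log(n) / self.counts[t])  # exploration term
            return mean + bonus

        return max(self.tiers, key=upper_bound)

    def update(self, tier: str, reward_value: float) -> None:
        self.counts[tier] += 1
        self.totals[tier] += reward_value


def reward(composite_score: float, cost_usd: float, max_cost_usd: float,
           cost_weight: float = 0.5) -> float:
    """Assumed reward: quality rescaled to [0, 1] minus weighted, capped normalized cost."""
    return composite_score / 100.0 - cost_weight * min(cost_usd / max_cost_usd, 1.0)
```

Textbook UCB1 assumes rewards in [0, 1]. Because this reward can dip below zero when cost dominates, a production version would likely rescale it before feeding it to the bandit.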
Agent Architecture · May 13, 2026

EWMA-Based Reputation Tracking for Heterogeneous Agent Fleets

NEXUS Research Team

Agent fleets in production accumulate hundreds of completed tasks per specialization per week, but raw historical averages are slow to react to regressions (e.g., a skill update that degrades quality) and overly sensitive to single outlier failures. We introduce an exponentially weighted moving average (EWMA) reputation score per (agent type, skill) pair, updated after every MergeGate evaluation, and show how it feeds back into CORTEX's UCB1 reward estimates. We compare decay constants from alpha = 0.05 to 0.30 and find that alpha = 0.12 minimizes false-positive routing penalties while still detecting a synthetic quality regression within 6 tasks on average.

EWMA · Reputation · Quality · CORTEX
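The EWMA update and a synthetic-regression check can be sketched as follows. This is a minimal sketch under assumptions: the `ReputationTracker` class, the 0.75 prior for unseen pairs, and the `tasks_until_flag` helper are hypothetical. Only the per-(agent type, skill) keying, the per-evaluation update, and the alpha range come from the report.

```python
from collections import defaultdict


class ReputationTracker:
    """Hypothetical EWMA reputation store keyed by (agent_type, skill)."""

    def __init__(self, alpha: float = 0.12, prior: float = 0.75):
        self.alpha = alpha  # decay constant; 0.12 is the value the report settles on
        self.prior = prior  # assumed starting reputation for a never-seen pair
        self.scores = defaultdict(lambda: self.prior)

    def update(self, agent_type: str, skill: str, composite_score: float) -> float:
        """Fold one MergeGate composite score (0-100) into the pair's EWMA (0-1)."""
        key = (agent_type, skill)
        x = composite_score / 100.0
        self.scores[key] = (1 - self.alpha) * self.scores[key] + self.alpha * x
        return self.scores[key]


def tasks_until_flag(tracker, agent_type, skill, new_score, threshold, max_tasks=100):
    """Feed a constant degraded score; count updates until the EWMA drops below threshold."""
    for k in range(1, max_tasks + 1):
        if tracker.update(agent_type, skill, new_score) < threshold:
            return k
    return None  # regression never detected within max_tasks
```

In this toy setup (reputation starting at 0.90, a sudden drop to a constant composite score of 60, flag threshold 0.80), alpha = 0.12 flags the regression on the 4th task and alpha = 0.30 on the 2nd. These numbers come from the toy setup, not from the report; the faster reaction of larger alphas is also what makes them more prone to penalizing one-off failures.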
Quality Gates · April 21, 2026

A Five-Axis Composite Quality Score for Automated Merge Decisions

NEXUS Research Team

MergeGate scores every AI-generated delivery on a weighted composite of five axes: test pass rate (30%), security posture (25%), efficiency (20%), self-correction count (15%, inverted), and constitutional-safety adherence (10%). We describe the scoring function for each axis, including the static-analysis ruleset used for the security axis (OWASP Top 10 plus injection-class CWEs), and the AutoFix loop triggered when the composite score falls below the 75/100 merge threshold. On a held-out validation set of 22 production pull requests across 8 codebases, the composite score correlated with human reviewers' accept/reject decisions at r = 0.81, and AutoFix resolved the single flagged command-injection finding on its first pass.

MergeGate · Quality Score · AutoFix · Security
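The weighted composite and the 75/100 threshold reduce to a few lines of arithmetic. In this sketch, the five weights and the threshold are taken from the report; the per-axis scoring is not. The 20-points-per-correction penalty on the inverted self-correction axis, the assumption that the other axes arrive pre-scored on 0-100, and the `MERGE` / `AUTOFIX` labels are all hypothetical. Weights are kept as integer percentages so the arithmetic stays exact.

```python
# Axis weights from the report, as integer percentages (sum = 100).
WEIGHTS = {
    "tests": 30,
    "security": 25,
    "efficiency": 20,
    "self_correction": 15,
    "constitutional": 10,
}
MERGE_THRESHOLD = 75


def self_correction_axis(corrections: int, penalty: int = 20) -> int:
    """Inverted axis: each self-correction costs `penalty` points (assumed value), floored at 0."""
    return max(0, 100 - penalty * corrections)


def composite_score(tests, security, efficiency, corrections, constitutional) -> float:
    """Weighted composite on 0-100; non-inverted axes are assumed pre-scored on 0-100."""
    axes = {
        "tests": tests,
        "security": security,
        "efficiency": efficiency,
        "self_correction": self_correction_axis(corrections),
        "constitutional": constitutional,
    }
    return sum(WEIGHTS[name] * value for name, value in axes.items()) / 100


def merge_decision(score: float) -> str:
    # Below the 75/100 threshold the delivery goes to the AutoFix loop instead of merging.
    return "MERGE" if score >= MERGE_THRESHOLD else "AUTOFIX"
```

For example, axes of 90 / 80 / 70, two self-corrections (scored 60), and 100 constitutional adherence give (2700 + 2000 + 1400 + 900 + 1000) / 100 = 80, which clears the threshold.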
Architecture · March 17, 2026

CORTEX: A Production Architecture for Adaptive Multi-Agent Orchestration

NEXUS Research Team

We present CORTEX, the routing and orchestration layer underlying NEXUS OS. CORTEX combines UCB1-based model-tier selection, EWMA agent reputation tracking, a security gate that vetoes routing decisions for tasks matching high-risk patterns (auth, payments, infra-as-code), and a spawn queue that bounds concurrent agent instantiation under budget constraints. We report on a phased rollout (Phases 1-4) validated against a live toggle-feature delivery, which achieved a MergeGate composite score of 100/100 in two independent end-to-end test runs (11/11 and 22/22 tasks) spanning frontend, database, and E2E smoke-test ComponentAgent types.

CORTEX · Architecture · Orchestration · Production
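A spawn queue that bounds concurrency and spend can be sketched with `asyncio`. This is an assumption-laden illustration, not CORTEX's queue: the `SpawnQueue` name, the per-agent cost estimate, and the refuse-when-over-budget policy are hypothetical. Concurrency is capped with `asyncio.Semaphore`, and budget is reserved before an agent waits for a slot.

```python
import asyncio


class SpawnQueue:
    """Hypothetical spawn queue: caps live agents and total estimated spend.

    Construct it inside a running event loop (e.g. in an async main) so the
    semaphore binds to that loop on older Python versions.
    """

    def __init__(self, max_concurrent: int, budget_usd: float):
        self._slots = asyncio.Semaphore(max_concurrent)  # bounds concurrently running agents
        self._remaining = budget_usd

    @property
    def remaining_usd(self) -> float:
        return self._remaining

    async def run(self, agent_fn, est_cost_usd: float):
        # Reserve the estimated cost up front and refuse the spawn rather than overspend.
        # There is no await between the check and the decrement, so the reservation is
        # atomic within a single event loop. (Reservations are not refunded on failure.)
        if est_cost_usd > self._remaining:
            raise RuntimeError("spawn refused: budget exhausted")
        self._remaining -= est_cost_usd
        # Waits here while max_concurrent agents are already running.
        async with self._slots:
            return await agent_fn()


async def main():
    queue = SpawnQueue(max_concurrent=2, budget_usd=1.0)

    async def fake_agent():
        await asyncio.sleep(0.01)  # stand-in for real agent work
        return "done"

    print(await asyncio.gather(*(queue.run(fake_agent, 0.25) for _ in range(4))))


if __name__ == "__main__":
    asyncio.run(main())
```

Reserving budget before queueing means a burst of requests cannot collectively exceed the cap, at the cost of holding budget for agents that are still waiting for a slot.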
Efficiency · February 8, 2026

APEX: Context Compression for Long-Horizon Agent Sessions

NEXUS Research Team

Long-running agent sessions accumulate tool outputs, file reads, and intermediate reasoning that quickly exceed useful context windows. APEX is a compression layer that summarizes superseded tool results, deduplicates repeated file reads (the 'read-dedup' optimization), and prunes resolved sub-task traces while preserving citations back to original sources. Across a sample of production sessions, APEX reduced effective token usage by 40-70% with no measurable drop in MergeGate composite scores, and the read-dedup optimization alone accounted for roughly a third of total savings in file-heavy refactoring tasks.

APEX · Compression · Budget Governor · Efficiency
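The read-dedup idea can be sketched as a single pass over a session's tool results. This is an illustration under assumptions, not APEX's code: the `ToolResult` record, the `read_file` tool name, and the stub text are hypothetical. Keying on path plus content hash means only reads of an unchanged file are collapsed, and each stub points back to the original read, in the spirit of the report's citations to original sources.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ToolResult:
    index: int             # position of the result in the session transcript
    tool: str              # tool name; "read_file" is a hypothetical example
    path: Optional[str]    # file path for file reads, else None
    content: str


def dedup_reads(results: List[ToolResult]) -> List[ToolResult]:
    """Replace repeat reads of an unchanged file with a pointer to the first full read.

    A re-read after the file was edited is kept in full, because its content hash differs.
    """
    first_seen = {}  # (path, sha256 of content) -> index of the first full read
    out = []
    for r in results:
        if r.tool == "read_file" and r.path is not None:
            key = (r.path, hashlib.sha256(r.content.encode("utf-8")).hexdigest())
            if key in first_seen:
                stub = f"[unchanged: see read of {r.path} at #{first_seen[key]}]"
                out.append(replace(r, content=stub))
                continue
            first_seen[key] = r.index
        out.append(r)
    return out
```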
Safety · January 20, 2026

Constitutional Safety Gates for Autonomous Code-Modifying Agents

NEXUS Research Team

As agent autonomy increases, so does the blast radius of a single bad decision, particularly for tasks touching authentication, payments, or infrastructure-as-code. We describe the constitutional-safety axis of MergeGate's composite score, the static rule set used to flag high-risk diffs before they reach the scoring pipeline, and the human-in-the-loop escalation path for tasks that the security gate marks REVIEW_REQUIRED instead of auto-merging. In an internal validation pilot covering 9 isolated test universes, the security gate correctly flagged 7 of 7 seeded vulnerability classes (including hardcoded secrets and command injection) before merge.

Safety · Constitutional AI · Security Gate · MergeGate
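A pre-merge static rule set of this kind can be approximated with path and content patterns. The sketch below is hypothetical throughout: the regexes, the `PASS` verdict label, and the `check_diff` signature are assumptions. Only the `REVIEW_REQUIRED` outcome, the auth / payments / infrastructure-as-code risk areas, and the two named vulnerability classes come from the report. Regexes like these catch obvious cases and miss many others, so treat them as a stand-in for a real static analyzer.

```python
import re
from dataclasses import dataclass, field

# Hypothetical path rules for the high-risk areas named in the report.
HIGH_RISK_PATHS = [
    re.compile(r"(^|/)(auth|authentication)/"),
    re.compile(r"(^|/)payments?/"),
    re.compile(r"\.tf$|(^|/)terraform/|(^|/)k8s/"),  # infrastructure-as-code
]

# Hypothetical content rules for two of the seeded vulnerability classes.
CONTENT_RULES = {
    "hardcoded-secret": re.compile(
        r"(?i)(api[_-]?key|secret|password)\s*=\s*['\"][^'\"]{8,}['\"]"
    ),
    "command-injection": re.compile(
        r"os\.system\(|subprocess\.\w+\([^)]*shell\s*=\s*True"
    ),
}


@dataclass
class GateResult:
    verdict: str  # "REVIEW_REQUIRED" or "PASS" (the latter label is assumed)
    findings: list = field(default_factory=list)  # (rule name, path) pairs


def check_diff(changed_files: dict) -> GateResult:
    """changed_files maps each changed path to its added lines, joined into one string."""
    findings = []
    for path, added in changed_files.items():
        if any(p.search(path) for p in HIGH_RISK_PATHS):
            findings.append(("high-risk-path", path))
        for name, rule in CONTENT_RULES.items():
            if rule.search(added):
                findings.append((name, path))
    # Any finding escalates to a human instead of letting the diff auto-merge.
    return GateResult("REVIEW_REQUIRED" if findings else "PASS", findings)
```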

Want the implementation details?

Most of what's described in these reports ships as open-source components. Read the architecture docs or browse the source.