Adaptive Model Routing via UCB1 Multi-Armed Bandits in Multi-Agent Code Generation
NEXUS Research Team
We frame model-tier selection (haiku / sonnet / opus) for code-generation subtasks as a multi-armed bandit problem and apply UCB1 to balance exploration of underused tiers against exploitation of historically high-quality, low-cost tiers. Across 219 calibration sessions, UCB1 routing reduced API spend by 74% relative to an opus-only baseline while keeping MergeGate composite scores within 2.1 points of the opus-only ceiling. We detail the reward signal construction (composite quality score minus normalized cost), cold-start handling for new agent specializations, and the per-task complexity features fed into the arm-selection policy.
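The routing loop described above can be illustrated with a minimal UCB1 sketch. It is not the NEXUS implementation, and several details are assumptions:

- The names `UCB1Router` and `routing_reward` are hypothetical.
- The MergeGate composite is assumed to lie on a 0 to 100 scale.
- Cost is assumed to be normalized by a per-task maximum, such as the opus price for that task.
- The per-task complexity features are omitted, so the policy here is context-free.
- Cold start relies only on UCB1's standard rule of trying each untried arm once. The paper's handling of new agent specializations may differ.

UCB1's guarantees assume rewards in [0, 1], so the sketch affinely rescales quality minus cost into that range.

```python
import math


def routing_reward(quality, cost, max_cost, cost_weight=1.0):
    """Hypothetical reward: normalized quality minus weighted normalized cost.

    Assumes `quality` is a composite score on a 0-100 scale and `cost` is
    normalized by `max_cost` (e.g. the opus price for the same task).
    """
    q = min(max(quality / 100.0, 0.0), 1.0)  # clamp quality to [0, 1]
    c = min(max(cost / max_cost, 0.0), 1.0)  # clamp normalized cost to [0, 1]
    raw = q - cost_weight * c                # lies in [-cost_weight, 1]
    # Affine rescale into [0, 1]; this preserves the ordering of rewards.
    return (raw + cost_weight) / (1.0 + cost_weight)


class UCB1Router:
    """Hypothetical context-free UCB1 selector over model tiers."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}   # pulls per tier
        self.sums = {a: 0.0 for a in self.arms}   # cumulative reward per tier
        self.total = 0                            # total pulls so far

    def select(self):
        # Cold start: try every untried tier once, in declaration order.
        for a in self.arms:
            if self.counts[a] == 0:
                return a

        # UCB1 index: empirical mean plus sqrt(2 ln N / n_a) exploration bonus.
        def index(a):
            mean = self.sums[a] / self.counts[a]
            bonus = math.sqrt(2.0 * math.log(self.total) / self.counts[a])
            return mean + bonus

        return max(self.arms, key=index)

    def update(self, arm, reward):
        # Record the observed reward for the tier that was used.
        self.counts[arm] += 1
        self.sums[arm] += reward
        self.total += 1


if __name__ == "__main__":
    router = UCB1Router(["haiku", "sonnet", "opus"])
    tier = router.select()
    # In practice quality and cost come from running the subtask on `tier`.
    router.update(tier, routing_reward(quality=88.0, cost=0.02, max_cost=0.30))
    print(tier, router.counts)
```

Rescaling after subtracting cost, rather than clipping negative rewards to zero, keeps cheap but mediocre tiers distinguishable from expensive ones. Changing `cost_weight` shifts how strongly the bandit favors cheaper tiers.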