Bandit-Based, Budget-Aware LLM Routing with Preference-Informed LinUCB (PILOT)
Added September 1, 2025
The authors frame adaptive LLM routing under budget constraints as a contextual bandit problem. A shared query–LLM embedding space, initialized from offline human preference data and refined online from bandit feedback, models query–model affinity; routing is implemented with PILOT, a preference-informed LinUCB variant. An online cost policy, modeled as a multi-choice knapsack problem, enforces budgets, enabling practical, efficient routing without exhaustive supervised labels or multi-model inference.
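To make the method concrete, here is a minimal, illustrative sketch of a preference-informed LinUCB-style router over a shared query–LLM embedding space. It is an assumption-laden reading of the paper, not the authors' code: the feature map `phi` (an elementwise product of query and model embeddings), the class name, and the `alpha` exploration parameter are all hypothetical choices.

```python
import numpy as np

class PILOTStyleRouter:
    """Sketch of a LinUCB-style router over a shared query-LLM embedding
    space (illustrative; not the paper's implementation)."""

    def __init__(self, llm_embeddings, dim, alpha=1.0):
        self.llm_embeddings = llm_embeddings   # {model_name: np.ndarray(dim)}
        self.alpha = alpha                     # exploration strength
        # Per-arm ridge-regression state: A = X^T X + I, b = X^T r.
        # Warm-starting A and b from offline preference data would play
        # the role of the paper's human-preference initialization.
        self.A = {m: np.eye(dim) for m in llm_embeddings}
        self.b = {m: np.zeros(dim) for m in llm_embeddings}

    def phi(self, query_vec, model):
        # Query-LLM affinity features: one simple way to couple the two
        # sides of a shared embedding space.
        return query_vec * self.llm_embeddings[model]

    def select(self, query_vec):
        # Pick the arm with the highest upper confidence bound:
        # estimated reward plus an exploration bonus.
        scores = {}
        for m in self.A:
            x = self.phi(query_vec, m)
            A_inv = np.linalg.inv(self.A[m])
            theta = A_inv @ self.b[m]
            scores[m] = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return max(scores, key=scores.get)

    def update(self, query_vec, model, reward):
        # Bandit feedback: only the routed model's outcome is observed.
        x = self.phi(query_vec, model)
        self.A[model] += np.outer(x, x)
        self.b[model] += reward * x
```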
Key Points
- Reframes LLM routing as a contextual bandit problem to learn adaptively from bandit feedback rather than relying on exhaustive supervised labels.
- Builds a shared embedding space aligning queries and LLMs to model their affinity, initialized from offline human preference data and refined online.
- Introduces PILOT, a preference-informed extension of LinUCB, to operationalize adaptive routing decisions.
- Adds a budget-aware online cost policy, modeled as a multi-choice knapsack problem, to respect diverse user and system cost constraints (a selection-rule sketch follows this list).
- Aims to reduce unnecessary multi-model inference while adapting to evolving query distributions in practical deployments.
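To make the knapsack framing above concrete, here is a hedged sketch of a budget-aware selection rule layered on top of the bandit's scores. The pacing heuristic (remaining budget spread over remaining queries) and all names are assumptions for illustration, not necessarily the paper's policy.

```python
def budget_aware_choice(ucb_scores, est_costs, budget_left, queries_left):
    """Online, budget-aware model selection in the spirit of a
    multi-choice knapsack: each query picks exactly one model without
    exceeding the remaining budget (illustrative heuristic)."""
    # Pace spending: average budget available for this query.
    per_query_budget = budget_left / max(queries_left, 1)
    # Among affordable models, take the best UCB score ...
    affordable = [m for m in ucb_scores if est_costs[m] <= per_query_budget]
    if affordable:
        return max(affordable, key=ucb_scores.get)
    # ... otherwise fall back to the cheapest model.
    return min(est_costs, key=est_costs.get)
```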
Sentiment
Mixed to mildly skeptical: readers accept the economic logic of budget-aware routing and bandit learning, but question whether cost is the problem most worth solving, as well as the work's measurement rigor, output consistency, novelty, and safety/privacy implications.
In Agreement
- Large cost differentials (often 10–100x) between models make adaptive routing economically compelling even with some routing errors.
- Price-per-token alone is insufficient; routing must account for tokens-per-interaction, especially for reasoning/thinking modes that can multiply token usage, aligning with the paper's budget-aware design (a back-of-envelope example follows this list).
- Routers can help select cheaper models most of the time and reserve expensive models for harder cases, matching the paper’s exploration–exploitation framing.
- Workload-specific tuning (and even retrieval-augmented routing) is important to realize real-world gains, consistent with the paper’s online learning approach.
- Prompt rewriting plus routing might further enable cheaper model selection under cost constraints, supporting the paper’s practical, budget-aware angle.
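A quick back-of-envelope calculation, with illustrative numbers that come from neither the paper nor the discussion, shows why both of the cost points above hold: per-interaction token counts can swamp per-token prices, and even an imperfect router captures most of a large price gap.

```python
# Illustrative prices ($ per million output tokens) and token counts.
cheap_price, big_price = 0.50, 15.00          # ~30x per-token gap
plain_tokens, reasoning_tokens = 800, 6400    # reasoning mode ~8x tokens

# Tokens-per-interaction matters: a cheap model in a long reasoning
# mode approaches the cost of the pricier model answering tersely.
cheap_reasoning_cost = cheap_price * reasoning_tokens / 1e6   # ~$0.0032
big_plain_cost = big_price * plain_tokens / 1e6               # ~$0.0120

# A router sending 80% of queries to the cheap model still cuts
# spend by roughly 4x versus always using the big model.
cheap_plain_cost = cheap_price * plain_tokens / 1e6           # ~$0.0004
routed = 0.8 * cheap_plain_cost + 0.2 * big_plain_cost        # ~$0.0027
print(routed / big_plain_cost)                                # ~0.23
```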
Opposed
- In many organizations, LLM API cost isn’t the top priority (other modalities cost more), so sophisticated routing may solve a lower-priority problem.
- Consistency can matter more than marginal performance/cost gains; adaptive routing may introduce variability undesirable for non-trivial applications.
- Human preference priors may be unnecessary; an LLM could self-assess question difficulty, though others dispute that models can reliably anticipate complexity.
- The contribution may not be at the research frontier: router papers abound, arXiv isn't peer-reviewed, and benchmark results (e.g., RouterBench) can be stale or unrepresentative.
- Reliability and safety concerns (persuasive hallucinations) suggest that optimizing for cost/performance alone is risky without strong correctness guarantees.
- Some argue cheaper, newer base models already outperform expensive legacy models, reducing the need for complex routers.