Bandit-Based, Budget-Aware LLM Routing with Preference-Informed LinUCB (PILOT)

Added Sep 1, 2025

The authors propose adaptive LLM routing under budget constraints by framing it as a contextual bandit problem. They learn a shared query–LLM embedding space from human preferences and refine it online, implementing routing with PILOT, a preference-informed LinUCB variant. An online knapsack-based cost policy enforces budgets, enabling practical, efficient routing without exhaustive labels or multi-model inference.

Key Points

  • Reframes LLM routing as a contextual bandit problem to learn adaptively from bandit feedback rather than relying on exhaustive supervised labels.
  • Builds a shared embedding space aligning queries and LLMs to model their affinity, initialized from offline human preference data and refined online.
  • Introduces PILOT, a preference-informed extension of LinUCB, to operationalize adaptive routing decisions (a minimal sketch follows this list).
  • Adds a budget-aware online cost policy, modeled as a multi-choice knapsack problem, to respect diverse user and system cost constraints (see the second sketch below).
  • Aims to reduce unnecessary multi-model inference while adapting to evolving query distributions in practical deployments.
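
To make the routing mechanics concrete, here is a minimal sketch of what a preference-informed LinUCB router could look like. The class name PILOTRouter, the warm-start scheme, and all hyperparameters are illustrative assumptions, not the authors' implementation; the paper's actual PILOT algorithm may incorporate the preference prior differently.

```python
import numpy as np

class PILOTRouter:
    """Illustrative preference-informed LinUCB router (hypothetical API).

    Arms are candidate LLMs; the context x is a query embedding. Per-arm
    parameters are warm-started from offline preference-learned LLM
    embeddings and refined online from bandit feedback (a reward observed
    only for the arm that was actually routed to).
    """

    def __init__(self, llm_embeddings, alpha=1.0, lam=1.0):
        n_arms, d = llm_embeddings.shape
        self.alpha = alpha  # exploration strength
        self.A = [lam * np.eye(d) for _ in range(n_arms)]  # per-arm Gram matrices
        # Warm start (assumption): bias b so that theta = A^{-1} b equals the
        # preference-learned arm embedding before any online feedback arrives.
        self.b = [lam * e.astype(float).copy() for e in llm_embeddings]

    def select(self, x):
        """Route: pick the arm with the highest upper confidence bound."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Standard LinUCB rank-one update for the chosen arm only."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

A deployment loop would call select on each incoming query embedding, serve the chosen model, and feed a scalar quality reward back through update.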
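The budget layer is modeled in the paper as an online multi-choice knapsack. The sketch below is a common threshold-style heuristic for that setting, not the authors' exact policy: it demands more score per dollar as spending runs ahead of the request stream. The function name, the tightening rule, and the fallback behavior are all assumptions.

```python
def budgeted_choice(scores, costs, spent, budget, stream_frac, tighten=4.0):
    """Threshold-style budget filter (illustrative, not the paper's policy).

    scores      per-arm quality estimates (e.g., LinUCB scores)
    costs       per-arm expected dollar cost for this query
    spent       budget consumed so far; budget is the total budget
    stream_frac fraction of the request stream already served (0..1)
    """
    remaining = budget - spent
    pace = spent / budget if budget > 0 else 1.0
    # Overspending relative to stream position raises the bar on value-per-dollar.
    threshold = 1.0 + tighten * max(0.0, pace - stream_frac)
    eligible = [i for i, (s, c) in enumerate(zip(scores, costs))
                if c <= remaining and s / max(c, 1e-9) >= threshold]
    if eligible:
        return max(eligible, key=lambda i: scores[i])
    # Fall back to the cheapest affordable arm, or refuse if nothing fits.
    affordable = [i for i, c in enumerate(costs) if c <= remaining]
    return min(affordable, key=lambda i: costs[i]) if affordable else None
```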

Sentiment

Mixed to mildly skeptical: readers accept the economic logic of budget-aware routing and bandit learning, but question whether API cost is really the binding constraint in practice, as well as the work's measurement rigor, output consistency, novelty, and safety/privacy implications.

In Agreement

  • Large cost differentials (often 10–100x) between models make adaptive routing economically compelling even with some routing errors.
  • Price-per-token alone is insufficient; routing must account for tokens-per-interaction, especially for reasoning/thinking modes that can multiply token usage, aligning with the paper’s budget-aware design (a worked cost example follows this list).
  • Routers can help select cheaper models most of the time and reserve expensive models for harder cases, matching the paper’s exploration–exploitation framing.
  • Workload-specific tuning (and even retrieval-augmented routing) is important to realize real-world gains, consistent with the paper’s online learning approach.
  • Prompt rewriting plus routing might further enable cheaper model selection under cost constraints, supporting the paper’s practical, budget-aware angle.
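
To make the tokens-per-interaction point concrete, here is a tiny worked example; the prices and token counts are made up for illustration only.

```python
def interaction_cost(in_tokens, out_tokens, in_price, out_price):
    """Dollar cost of one interaction; prices are per 1M tokens."""
    return (in_tokens * in_price + out_tokens * out_price) / 1e6

# Hypothetical $0.50 / $1.50 per-1M-token pricing: the same model is ~6x
# more expensive per query when a reasoning mode inflates output tokens.
print(interaction_cost(1000, 500, 0.50, 1.50))   # 0.00125  (concise answer)
print(interaction_cost(1000, 5000, 0.50, 1.50))  # 0.008    (long reasoning trace)
```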

Opposed

  • In many organizations, LLM API cost isn’t the top priority (other modalities cost more), so sophisticated routing may solve a lower-priority problem.
  • Consistency can matter more than marginal performance/cost gains; adaptive routing may introduce variability undesirable for non-trivial applications.
  • Human preference priors may be unnecessary; an LLM could self-assess question difficulty—though others dispute that models can reliably anticipate complexity.
  • The contribution may not be frontier; router papers abound, arXiv isn’t peer-reviewed, and benchmark results (e.g., RouterBench) can be stale or unrepresentative.
  • Reliability and safety concerns (persuasive hallucinations) suggest that optimizing for cost/performance alone is risky without strong correctness guarantees.
  • Some argue cheaper, newer base models already outperform expensive legacy models, reducing the need for complex routers.