Bandit-Based, Budget-Aware LLM Routing with Preference-Informed LinUCB (PILOT)
Added September 1, 2025
The authors frame adaptive LLM routing under budget constraints as a contextual bandit problem. A shared query–LLM embedding space, initialized from offline human preference data and refined online from bandit feedback, models query–model affinity; routing is implemented with PILOT, a preference-informed LinUCB variant. An online cost policy, modeled as a multi-choice knapsack problem, enforces budgets, enabling practical, efficient routing without exhaustive supervised labels or multi-model inference.
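To make the method concrete, here is a minimal, illustrative sketch of a preference-informed LinUCB-style router over a shared query–LLM embedding space. It is an assumption-laden reading of the paper, not the authors' code: the feature map `phi` (an elementwise product of query and model embeddings), the class name, and the `alpha` exploration parameter are all hypothetical choices.

```python
import numpy as np

class PILOTStyleRouter:
    """Sketch of a LinUCB-style router over a shared query-LLM embedding
    space (illustrative; not the paper's implementation)."""

    def __init__(self, llm_embeddings, dim, alpha=1.0):
        self.llm_embeddings = llm_embeddings   # {model_name: np.ndarray(dim)}
        self.alpha = alpha                     # exploration strength
        # Per-arm ridge-regression state: A = X^T X + I, b = X^T r.
        # Warm-starting A and b from offline preference data would play
        # the role of the paper's human-preference initialization.
        self.A = {m: np.eye(dim) for m in llm_embeddings}
        self.b = {m: np.zeros(dim) for m in llm_embeddings}

    def phi(self, query_vec, model):
        # Query-LLM affinity features: one simple way to couple the two
        # sides of a shared embedding space.
        return query_vec * self.llm_embeddings[model]

    def select(self, query_vec):
        # Pick the arm with the highest upper confidence bound:
        # estimated reward plus an exploration bonus.
        scores = {}
        for m in self.A:
            x = self.phi(query_vec, m)
            A_inv = np.linalg.inv(self.A[m])
            theta = A_inv @ self.b[m]
            scores[m] = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return max(scores, key=scores.get)

    def update(self, query_vec, model, reward):
        # Bandit feedback: only the routed model's outcome is observed.
        x = self.phi(query_vec, model)
        self.A[model] += np.outer(x, x)
        self.b[model] += reward * x
```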
Key Points
- Reframes LLM routing as a contextual bandit problem to learn adaptively from bandit feedback rather than relying on exhaustive supervised labels.
- Builds a shared embedding space aligning queries and LLMs to model their affinity, initialized from offline human preference data and refined online.
- Introduces PILOT, a preference-informed extension of LinUCB, to operationalize adaptive routing decisions.
- Adds a budget-aware online cost policy, modeled as a multi-choice knapsack problem, to respect diverse user and system cost constraints (a selection-rule sketch follows this list).
- Aims to reduce unnecessary multi-model inference while adapting to evolving query distributions in practical deployments.
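To make the knapsack framing above concrete, here is a hedged sketch of a budget-aware selection rule layered on top of the bandit's scores. The pacing heuristic (remaining budget spread over remaining queries) and all names are assumptions for illustration, not necessarily the paper's policy.

```python
def budget_aware_choice(ucb_scores, est_costs, budget_left, queries_left):
    """Online, budget-aware model selection in the spirit of a
    multi-choice knapsack: each query picks exactly one model without
    exceeding the remaining budget (illustrative heuristic)."""
    # Pace spending: average budget available for this query.
    per_query_budget = budget_left / max(queries_left, 1)
    # Among affordable models, take the best UCB score ...
    affordable = [m for m in ucb_scores if est_costs[m] <= per_query_budget]
    if affordable:
        return max(affordable, key=ucb_scores.get)
    # ... otherwise fall back to the cheapest model.
    return min(est_costs, key=est_costs.get)
```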
Sentiment
Mixed to mildly skeptical: readers accept the economic logic of budget-aware routing and bandit learning, but question whether cost is the problem most worth solving, as well as the work's measurement rigor, output consistency, novelty, and safety/privacy implications.
In Agreement
- Large cost differentials (often 10–100x) between models make adaptive routing economically compelling even with some routing errors.
- Price-per-token alone is insufficient; routing must account for tokens-per-interaction, especially for reasoning/thinking modes that can multiply token usage, aligning with the paper's budget-aware design (a back-of-envelope example follows this list).
- Routers can help select cheaper models most of the time and reserve expensive models for harder cases, matching the paper’s exploration–exploitation framing.
- Workload-specific tuning (and even retrieval-augmented routing) is important to realize real-world gains, consistent with the paper’s online learning approach.
- Prompt rewriting plus routing might further enable cheaper model selection under cost constraints, supporting the paper’s practical, budget-aware angle.
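A quick back-of-envelope calculation, with illustrative numbers that come from neither the paper nor the discussion, shows why both of the cost points above hold: per-interaction token counts can swamp per-token prices, and even an imperfect router captures most of a large price gap.

```python
# Illustrative prices ($ per million output tokens) and token counts.
cheap_price, big_price = 0.50, 15.00          # ~30x per-token gap
plain_tokens, reasoning_tokens = 800, 6400    # reasoning mode ~8x tokens

# Tokens-per-interaction matters: a cheap model in a long reasoning
# mode approaches the cost of the pricier model answering tersely.
cheap_reasoning_cost = cheap_price * reasoning_tokens / 1e6   # ~$0.0032
big_plain_cost = big_price * plain_tokens / 1e6               # ~$0.0120

# A router sending 80% of queries to the cheap model still cuts
# spend by roughly 4x versus always using the big model.
cheap_plain_cost = cheap_price * plain_tokens / 1e6           # ~$0.0004
routed = 0.8 * cheap_plain_cost + 0.2 * big_plain_cost        # ~$0.0027
print(routed / big_plain_cost)                                # ~0.23
```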
Opposed
- In many organizations, LLM API cost isn’t the top priority (other modalities cost more), so sophisticated routing may solve a lower-priority problem.
- Consistency can matter more than marginal performance/cost gains; adaptive routing may introduce variability undesirable for non-trivial applications.
- Human preference priors may be unnecessary; an LLM could self-assess question difficulty, though others dispute that models can reliably anticipate complexity.
- The contribution may not be at the research frontier: router papers abound, arXiv isn't peer-reviewed, and benchmark results (e.g., RouterBench) can be stale or unrepresentative.
- Reliability and safety concerns (persuasive hallucinations) suggest that optimizing for cost/performance alone is risky without strong correctness guarantees.
- Some argue cheaper, newer base models already outperform expensive legacy models, reducing the need for complex routers.