Bandit-Based, Budget-Aware LLM Routing with Preference-Informed LinUCB (PILOT)

Added Sep 1, 2025
Article: Neutral · Community: Mixed

The authors propose adaptive LLM routing under budget constraints by framing it as a contextual bandit problem. They learn a shared query–LLM embedding space from human preferences and refine it online, implementing routing with PILOT, a preference-informed LinUCB variant. An online knapsack-based cost policy enforces budgets, enabling practical, efficient routing without exhaustive labels or multi-model inference.

Key Points

  • Reframes LLM routing as a contextual bandit problem to learn adaptively from bandit feedback rather than relying on exhaustive supervised labels.
  • Builds a shared embedding space aligning queries and LLMs to model their affinity, initialized from offline human preference data and refined online.
  • Introduces PILOT, a preference-informed extension of LinUCB, to operationalize adaptive routing decisions.
  • Adds a budget-aware online cost policy modeled as a multi-choice knapsack problem to respect diverse user and system cost constraints.
  • Aims to reduce unnecessary multi-model inference while adapting to evolving query distributions in practical deployments.
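The mechanics behind these points can be illustrated with a minimal LinUCB-style router. This is a hedged sketch, not the paper's PILOT algorithm: the `LinUCBRouter` class, its parameters, and the simple per-call budget filter are all illustrative assumptions, standing in for PILOT's preference-initialized embeddings and its multi-choice knapsack policy.

```python
import numpy as np

class LinUCBRouter:
    """Illustrative LinUCB-style router over a set of LLM 'arms'.
    A simplified sketch, not the paper's PILOT implementation."""

    def __init__(self, arm_costs, dim, alpha=1.0):
        self.costs = arm_costs                       # assumed cost per call for each LLM
        self.alpha = alpha                           # exploration strength
        self.A = [np.eye(dim) for _ in arm_costs]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in arm_costs]  # per-arm reward vectors

    def select(self, x, budget):
        """Pick the affordable arm with the highest upper confidence bound.
        The hard budget check is a crude stand-in for a knapsack policy."""
        best, best_ucb = None, -np.inf
        for a, cost in enumerate(self.costs):
            if cost > budget:
                continue                             # skip arms we cannot afford
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]                # ridge-regression reward estimate
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            if ucb > best_ucb:
                best, best_ucb = a, ucb
        return best

    def update(self, a, x, reward):
        """Bandit feedback: reward is observed only for the routed arm."""
        self.A[a] += np.outer(x, x)
        self.b[a] += reward * x
```

In the paper's setting, the context `x` would come from the preference-trained query embedding, and the per-arm parameters would be warm-started from offline human preference data rather than initialized to zero as here.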

Sentiment

The community is moderately skeptical. While acknowledging cost optimization is a real concern, many commenters feel the problem is over-engineered — simpler approaches or practical production experience undermine the paper's value proposition. The discussion frequently veers into broader debates about AI progress and AGI, suggesting the specific paper is less interesting to HN than the meta-questions it raises.

In Agreement

  • The massive cost differences between models make routing economically compelling even with imperfect accuracy
  • The contextual bandit framing is a reasonable improvement over supervised learning approaches that require exhaustive labels
  • Budget-aware model selection addresses a practical engineering need for teams managing multiple LLM providers
  • Prompt rewriting combined with cost-based routing could yield additional optimization opportunities

Opposed

  • Routing may be unnecessary when cheap models like Gemini Flash already achieve competitive performance — just pick one good cheap model
  • Price-per-token is an insufficient proxy for cost since thinking models can consume orders of magnitude more tokens per interaction
  • Routing between heterogeneous models introduces behavioral inconsistency that may matter more than cost savings for real applications
  • Human preference data should not be needed when LLMs themselves can self-assess query complexity
  • Routers test well on benchmarks but significantly underperform in production without workload-specific tuning
  • The benchmarks used (RouterBench from March 2024) may be outdated given the pace of model releases
  • This type of optimization research suggests LLM performance has plateaued, making the contribution less impactful