Policy-Based LLM Routing with Nvidia's Open Source Blueprint

People

David

Idea

Testing Nvidia's v1 LLM Router blueprint as a third approach to intelligent query routing - this time using policy-based task classification instead of semantic similarity or neural network intent matching.

Details

  • Nvidia's v1 LLM Router blueprint (main branch) takes a three-step approach: apply a policy like task or intent classification, use a trained router for that policy, then proxy the request to the right LLM (the first sketch after this list outlines the pattern)
  • This is our third routing prototype in recent weeks - following experiments with LiteLLM's semantic-router and UIUC's LLMRouter framework
  • The Nvidia approach is more structured than the other two - it separates the "what kind of task is this" decision from the "which model handles it" decision
  • We got it working with Nvidia's toolset, routing requests to local models based on task complexity
  • Our longer-term use case is agricultural edge devices in the field deciding where to send LLM queries - locally, on-site, in-island, or to the cloud
  • Routing decisions need to account for factors like query complexity, available compute, connectivity, and latency requirements (and in the field, environmental and energy considerations as well); the second sketch after this list illustrates that kind of tier decision
  • We're prototyping on desktop and workstation hardware in the office right now
  • Build instructions are being written so deployment also works on lower-powered devices like Raspberry Pi and Nvidia Jetson boards (for edge routing)
  • Having three different routing approaches prototyped gives us a better picture of the tradeoffs between simplicity, accuracy, and configurability
  • Code for this prototype is up at github.com/pickettd/nvidia-llm-router
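
To make the three-step flow concrete, here is a minimal Python sketch of the pattern, not the blueprint's actual code: where the blueprint uses a trained router model for the active policy, this stub classifies by keyword, and the task labels, policy table, model names, and endpoint URL are all placeholders for whatever is being served locally.

```python
# Sketch of policy-based routing: (1) classify the task, (2) look up the
# model for that task under the active policy, (3) proxy the request to an
# OpenAI-compatible endpoint. The blueprint uses a trained router for step 1;
# a keyword stub stands in here. Names and URLs are illustrative only.
import requests

# A "policy" maps task labels to backend models (placeholder names).
TASK_POLICY = {
    "code generation": "local-large-model",
    "summarization": "local-small-model",
    "open qa": "local-small-model",
}
DEFAULT_MODEL = "local-small-model"


def classify_task(prompt: str) -> str:
    """Stand-in for the blueprint's trained router model."""
    lowered = prompt.lower()
    if "def " in prompt or "code" in lowered or "function" in lowered:
        return "code generation"
    if "summarize" in lowered or "tl;dr" in lowered:
        return "summarization"
    return "open qa"


def route(prompt: str, base_url: str = "http://localhost:8000/v1") -> dict:
    task = classify_task(prompt)
    model = TASK_POLICY.get(task, DEFAULT_MODEL)
    # Proxy to the chosen model through an OpenAI-compatible chat endpoint.
    resp = requests.post(
        f"{base_url}/chat/completions",
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    print(route("Summarize this field report in two sentences: ..."))
```

Even in stub form, the separation shows: swapping the policy table changes which model handles each task without touching the classifier or the proxy step.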
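
For the longer-term edge use case, here is a hypothetical sketch of the tier decision itself. Every field name and threshold is invented for illustration; the real inputs (compute load, connectivity, energy state) would come from the device and the task classifier.

```python
# Hypothetical tier decision for an agricultural edge device. Tier names
# mirror the list above; all thresholds here are invented, not measured.
from dataclasses import dataclass


@dataclass
class Conditions:
    query_complexity: float  # 0.0 (trivial) to 1.0 (hard), from the classifier
    local_compute: float     # fraction of the device's compute currently free
    uplink_ok: bool          # is there connectivity beyond the device?
    latency_budget_ms: int   # how long the caller can wait
    battery_fraction: float  # 0.0 to 1.0; local inference drains it


def pick_tier(c: Conditions) -> str:
    # No uplink: the device must answer locally, whatever the query is.
    if not c.uplink_ok:
        return "local"
    # Easy queries stay local when compute and battery allow.
    if c.query_complexity < 0.3 and c.local_compute > 0.5 and c.battery_fraction > 0.2:
        return "local"
    # Tight latency budgets rule out the long hop to the cloud.
    if c.latency_budget_ms < 500:
        return "on-site"
    # Hard queries go to the biggest model reachable; otherwise stay in-island.
    return "cloud" if c.query_complexity > 0.7 else "in-island"
```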

Read more