Policy-Based LLM Routing with Nvidia's Open Source Blueprint
People
David
Idea
Testing Nvidia's v1 LLM Router blueprint as a third approach to intelligent query routing, this time using policy-based task classification instead of semantic similarity or neural-network intent matching.
Details
- Nvidia's v1 LLM Router blueprint (main branch) takes a three-step approach: apply a policy such as task or intent classification, run that policy's trained router to pick a model, then proxy the request to the chosen LLM (a minimal sketch of the pattern follows this list)
- This is our third routing prototype in recent weeks, following experiments with LiteLLM's semantic-router and UIUC's LLMRouter framework
- The Nvidia approach is more structured than the other two: it separates the "what kind of task is this" decision from the "which model handles it" decision
- We got it working to route requests to local models based on task complexity using Nvidia's toolset (an example client call is shown after the list)
- Our longer-term use case is agricultural edge devices in the field deciding where to send LLM queries: locally, on-site, in-island, or to the cloud
- Routing decisions need to account for factors like query complexity, available compute, connectivity, and latency requirements (and in the field, environmental and energy considerations as well); a hypothetical tier-selection sketch also appears below
- We're prototyping on desktop and workstation hardware in the office right now
- Build instructions are being written to also support deployment on lower-powered devices like Raspberry Pis and Nvidia Jetsons (for edge routing)
- Having three different routing approaches prototyped gives us a better picture of the tradeoffs between simplicity, accuracy, and configurability
- Code for this prototype is up at github.com/pickettd/nvidia-llm-router
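
The routing pattern itself is compact enough to sketch. The snippet below is our reading of the blueprint's three-step flow rather than Nvidia's code: the Policy class, the keyword "classifier," and the model names are hypothetical stand-ins (in the blueprint, step one is handled by a trained router model, not a keyword check).

```python
# Minimal sketch of the three-step pattern: classify under a policy,
# map the predicted label to a model, then proxy the request there.
# All names are illustrative stand-ins, not Nvidia's API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Policy:
    name: str
    classify: Callable[[str], str]  # step 1: the policy's classifier
    model_map: dict[str, str]       # step 2: label -> model endpoint

def route(policy: Policy, prompt: str) -> str:
    label = policy.classify(prompt)  # "what kind of task is this?"
    return policy.model_map[label]   # "which model handles it?"

# Toy keyword check standing in for a trained router model.
def toy_task_classifier(prompt: str) -> str:
    return "code" if "def " in prompt or "import " in prompt else "chat"

task_policy = Policy(
    name="task_router",
    classify=toy_task_classifier,
    model_map={"code": "local/coder-7b", "chat": "local/chat-8b"},
)

print(route(task_policy, "def fib(n): ..."))  # -> local/coder-7b; step 3 proxies here
```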
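
From the client's point of view, the router looks like a single OpenAI-compatible endpoint, which is what made it straightforward to point local apps at. The call below is hypothetical: the port, the blank model field, and the assumption that the active policy is configured server-side are illustrative, not taken from the blueprint's docs.

```python
# Hypothetical client call through the router proxy. Assumes the proxy
# speaks the OpenAI chat-completions format and that policy selection
# happens server-side; the URL and port are invented for illustration.
import requests

resp = requests.post(
    "http://localhost:8084/v1/chat/completions",  # assumed proxy address
    json={
        "model": "",  # left blank: the router, not the client, picks the model
        "messages": [{"role": "user", "content": "Summarize today's sensor log."}],
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```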
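
For the edge use case, the routing decision becomes tier selection. This is a minimal sketch under invented assumptions: the thresholds, field names, and escalation order are ours, and none of it comes from the blueprint.

```python
# Hypothetical tier selection for the field scenario: prefer the nearest
# tier (local -> on-site -> in-island -> cloud) that can handle the query.
# Thresholds and fields are invented for illustration.
from dataclasses import dataclass

TIERS = ["local", "on-site", "in-island", "cloud"]

@dataclass
class Conditions:
    complexity: float           # 0.0 (trivial) .. 1.0 (hard), from the policy classifier
    local_headroom: float       # fraction of on-device compute currently free
    reachable: dict[str, bool]  # which remote tiers are reachable right now
    latency_budget_s: float     # caller's latency requirement

def pick_tier(c: Conditions) -> str:
    # Simple queries stay on-device whenever there is compute headroom.
    if c.complexity < 0.3 and c.local_headroom > 0.2:
        return "local"
    # Otherwise escalate outward to the first reachable tier; a tight
    # latency budget rules out the cloud even when it is reachable.
    for tier in TIERS[1:]:
        if not c.reachable.get(tier, False):
            continue
        if tier == "cloud" and c.latency_budget_s < 2.0:
            continue
        return tier
    return "local"  # degraded fallback: answer with whatever runs on-device

print(pick_tier(Conditions(0.8, 0.5, {"in-island": True, "cloud": True}, 5.0)))
# -> in-island
```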