Basic Semantic Routing with LiteLLM Proxy
People: David
Idea: Testing semantic routing as a way to automatically send LLM requests to different models based on what the user is asking—part of a bigger vision for smart edge-to-hub-to-cloud routing on our Maui cluster.
Details:
- Our LiteLLM proxy already bundles multiple machines into one API endpoint, so adding routing on top felt like a natural next step
- Discovered LiteLLM recently integrated the semantic-router library—found this in their docs at docs.litellm.ai/docs/proxy/auto_routing
- Their example requires the UI version of the proxy, which we hadn't set up yet
- Had to dig through GitHub commit history to find the file-based config approach
- Hit an encoder setup issue that took some config tweaking to resolve
- Got it working with three routes: a big smart model for programming questions, a multimodal model for questions about visuals, and a fast default model for everything else
- The routing actually works: questions about code go to the heavy model, questions about visuals go to the medium-size model, and everything else goes to our fast model (rough sketches of the route definitions and a client-side call follow this list) - NOTE: still need to implement multimodal encoder setup
- This is one piece of a larger puzzle: simple requests handled on-device, medium ones at an on-site hub, and complex ones sent off-site
- Not plug-and-play yet, but promising once you get past the initial setup friction
- Example code showing a fully local option (Ollama-based, but works for any LiteLLM provider): https://github.com/pickettd/litellm-local-semantic-router-example
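
The three-routes bullet above refers to the following kind of setup. This is a minimal sketch of the underlying semantic-router pieces, not the actual proxy config: the route names, example utterances, and the local HuggingFaceEncoder model are all assumptions, and the import path reflects the pre-0.1 `RouteLayer` API (newer releases rename it to `SemanticRouter`).

```python
# Sketch of the semantic-router layer that LiteLLM's auto-routing builds on.
# Route names, utterances, and the embedding model are illustrative assumptions.
from semantic_router import Route
from semantic_router.encoders import HuggingFaceEncoder
from semantic_router.layer import RouteLayer

# Each route is defined by example utterances; incoming queries are matched
# to the closest route by embedding similarity.
programming = Route(
    name="programming",
    utterances=[
        "write a python function that parses a csv file",
        "why does this rust code fail to compile",
        "refactor this class to use dependency injection",
    ],
)
visual = Route(
    name="visual",
    utterances=[
        "what is shown in this image",
        "describe the chart in this screenshot",
        "what color is the car in the attached photo",
    ],
)

# A local embedding model keeps everything offline (model name is an assumption).
encoder = HuggingFaceEncoder(name="sentence-transformers/all-MiniLM-L6-v2")
router = RouteLayer(encoder=encoder, routes=[programming, visual])

# Queries that match a route return its name; anything else returns name=None,
# which is where the fast default model comes in.
print(router("implement quicksort in go").name)        # expected: "programming"
print(router("what is in this picture?").name)         # expected: "visual"
print(router("what's the weather like today?").name)   # expected: None -> default model
```

In the proxy setup, the mapping from route name to model happens inside LiteLLM rather than in application code; the sketch just shows what the routing decision itself looks like.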
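
From the application side, the whole thing still looks like a single OpenAI-compatible endpoint. The sketch below assumes the proxy is running locally on LiteLLM's default port 4000 and that the auto-router is exposed under a model alias of `auto-router`; the alias and the API key are placeholders, and the linked repo has the real config.

```python
# Calling the LiteLLM proxy; the proxy decides which backing model serves the request.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # LiteLLM proxy endpoint (assumed local default)
    api_key="sk-placeholder",          # whatever key the proxy is configured to accept
)

# The client only ever names the router alias; semantic routing picks the
# heavy coding model, the multimodal model, or the fast default behind it.
resp = client.chat.completions.create(
    model="auto-router",  # assumed alias for the auto-routing entry in the proxy config
    messages=[{"role": "user", "content": "Write a bash script that tails a log file."}],
)
print(resp.choices[0].message.content)
```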