Basic Semantic Routing with LiteLLM Proxy

People: David

Idea: Testing semantic routing as a way to automatically send LLM requests to different models based on what the user is asking—part of a bigger vision for smart edge-to-hub-to-cloud routing on our Maui cluster.

Details:

  • Our LiteLLM proxy already bundles multiple machines into one API endpoint, so adding routing on top felt like a natural next step
  • Discovered LiteLLM recently integrated the semantic-router library—found this in their docs at docs.litellm.ai/docs/proxy/auto_routing
  • Their example requires the UI version of the proxy, which we hadn't set up yet
  • Had to dig through GitHub commit history to find the file-based config approach (sketched in the examples after this list)
  • Ran into an encoder configuration issue that took some tweaking to resolve
  • Got it working with three routes: a big smart model for programming questions, a multimodal model for questions about visuals, and a fast default model for everything else (see the route sketch below)
  • The routing actually works: questions about code go to the heavy model, questions about visuals go to the medium-size model, and everything else goes to our fast model - NOTE: still need to implement the multimodal encoder setup
  • This is one piece of a larger puzzle: simple requests handled on-device, medium ones at an on-site hub, and complex ones off-site in the cloud
  • Not plug-and-play yet, but promising once you get past the initial setup friction
  • Example code showing fully local option (Ollama-based but works for any LiteLLM provider): https://github.com/pickettd/litellm-local-semantic-router-example
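
For reference, here is a minimal sketch of how route definitions can be built with the semantic-router library and dumped to the JSON file the proxy loads. The route names, utterances, and encoder choice are placeholders rather than the ones from our setup, and newer semantic-router releases (0.1+) rename RouteLayer to SemanticRouter, so adjust imports to your installed version.

    # Minimal sketch (assumed names throughout) of building the route-definition
    # file the proxy's auto router loads. Route names, utterances, and the encoder
    # are illustrative only.
    from semantic_router import Route, RouteLayer
    from semantic_router.encoders import HuggingFaceEncoder  # local embeddings: pip install "semantic-router[local]"

    code_route = Route(
        name="big-code-model",  # should match a model_name exposed by the proxy
        utterances=[
            "Write a Python function that parses a CSV file",
            "Why does this code segfault?",
            "Refactor this class to use dependency injection",
        ],
    )
    visual_route = Route(
        name="multimodal-model",
        utterances=[
            "What is shown in this image?",
            "Describe the chart in this screenshot",
        ],
    )

    # Anything that matches no route falls through to the default model.
    encoder = HuggingFaceEncoder()  # defaults to a small sentence-transformers model
    layer = RouteLayer(encoder=encoder, routes=[code_route, visual_route])

    # Quick local sanity check before wiring it into the proxy
    print(layer("How do I reverse a linked list in C?").name)  # expected: big-code-model

    layer.to_json("router.json")  # the file the proxy config points at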
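And a sketch of what the file-based proxy config might look like once the router file exists. The model names, hosts, and Ollama models below are placeholders, and the auto_router_* parameter names are my reading of the auto_routing docs, so double-check them against the current docs before copying.

    # config.yaml sketch: three backends bundled behind one endpoint, plus an auto
    # router entry that consults router.json. All names and hosts are placeholders.
    model_list:
      - model_name: big-code-model
        litellm_params:
          model: ollama/qwen2.5-coder:32b        # heavy model for programming questions
          api_base: http://hub-box:11434
      - model_name: multimodal-model
        litellm_params:
          model: ollama/llama3.2-vision:11b      # medium model for questions about visuals
          api_base: http://hub-box-2:11434
      - model_name: fast-default
        litellm_params:
          model: ollama/llama3.2:3b              # quick default for everything else
          api_base: http://edge-box:11434
      - model_name: local-embeddings
        litellm_params:
          model: ollama/nomic-embed-text         # embedding model used to score incoming queries
          api_base: http://edge-box:11434
      - model_name: auto-router
        litellm_params:
          model: auto_router/auto_router_1
          auto_router_config_path: router.json   # produced by the script above
          auto_router_default_model: fast-default
          auto_router_embedding_model: local-embeddings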
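Once the proxy is running (litellm --config config.yaml), clients hit the normal OpenAI-compatible API and name the auto router as the model. The endpoint, API key, and "auto-router" model name below are placeholders matching the config sketch above.

    # Sketch: exercising the auto router through the proxy's OpenAI-compatible API.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:4000", api_key="sk-anything")

    # A programming question should get routed to the heavy code model...
    code_reply = client.chat.completions.create(
        model="auto-router",
        messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    )

    # ...while a generic question should fall through to the fast default model.
    chat_reply = client.chat.completions.create(
        model="auto-router",
        messages=[{"role": "user", "content": "Suggest a name for a hiking club."}],
    )

    print(code_reply.choices[0].message.content)
    print(chat_reply.choices[0].message.content)
    # The proxy's logs show which deployment each request actually landed on.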
