Basic Semantic Routing with LiteLLM Proxy
People: David
Idea: Testing semantic routing as a way to automatically send LLM requests to different models based on what the user is asking—part of a bigger vision for smart edge-to-hub-to-cloud routing on our Maui cluster.
Details:
- Our LiteLLM proxy already bundles multiple machines into one API endpoint, so adding routing on top felt like a natural next step
- Discovered LiteLLM recently integrated the semantic-router library—found this in their docs at docs.litellm.ai/docs/proxy/auto_routing
- Their example requires the UI version of the proxy, which we hadn't set up yet
- Had to dig through GitHub commit history to find the file-based config approach
- Hit an encoder setup issue that took some config tweaking to resolve
- Got it working with three routes: a big smart model for programming questions, a multimodal model for questions about visuals, and a fast default model for everything else
- The routing actually works: questions about code go to the heavy model, questions about visuals go to the medium-size model, and everything else goes to our fast model (rough sketches of the route definitions and a client-side call follow this list) - NOTE: still need to implement multimodal encoder setup
- This is one piece of a larger puzzle: simple requests handled on-device, medium ones at an on-site hub, and complex ones sent off-site
- Not plug-and-play yet, but promising once you get past the initial setup friction
- Example code showing a fully local option (Ollama-based, but works for any LiteLLM provider): https://github.com/pickettd/litellm-local-semantic-router-example
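
The three-routes bullet above refers to the following kind of setup. This is a minimal sketch of the underlying semantic-router pieces, not the actual proxy config: the route names, example utterances, and the local HuggingFaceEncoder model are all assumptions, and the import path reflects the pre-0.1 `RouteLayer` API (newer releases rename it to `SemanticRouter`).

```python
# Sketch of the semantic-router layer that LiteLLM's auto-routing builds on.
# Route names, utterances, and the embedding model are illustrative assumptions.
from semantic_router import Route
from semantic_router.encoders import HuggingFaceEncoder
from semantic_router.layer import RouteLayer

# Each route is defined by example utterances; incoming queries are matched
# to the closest route by embedding similarity.
programming = Route(
    name="programming",
    utterances=[
        "write a python function that parses a csv file",
        "why does this rust code fail to compile",
        "refactor this class to use dependency injection",
    ],
)
visual = Route(
    name="visual",
    utterances=[
        "what is shown in this image",
        "describe the chart in this screenshot",
        "what color is the car in the attached photo",
    ],
)

# A local embedding model keeps everything offline (model name is an assumption).
encoder = HuggingFaceEncoder(name="sentence-transformers/all-MiniLM-L6-v2")
router = RouteLayer(encoder=encoder, routes=[programming, visual])

# Queries that match a route return its name; anything else returns name=None,
# which is where the fast default model comes in.
print(router("implement quicksort in go").name)        # expected: "programming"
print(router("what is in this picture?").name)         # expected: "visual"
print(router("what's the weather like today?").name)   # expected: None -> default model
```

In the proxy setup, the mapping from route name to model happens inside LiteLLM rather than in application code; the sketch just shows what the routing decision itself looks like.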
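
From the application side, the whole thing still looks like a single OpenAI-compatible endpoint. The sketch below assumes the proxy is running locally on LiteLLM's default port 4000 and that the auto-router is exposed under a model alias of `auto-router`; the alias and the API key are placeholders, and the linked repo has the real config.

```python
# Calling the LiteLLM proxy; the proxy decides which backing model serves the request.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # LiteLLM proxy endpoint (assumed local default)
    api_key="sk-placeholder",          # whatever key the proxy is configured to accept
)

# The client only ever names the router alias; semantic routing picks the
# heavy coding model, the multimodal model, or the fast default behind it.
resp = client.chat.completions.create(
    model="auto-router",  # assumed alias for the auto-routing entry in the proxy config
    messages=[{"role": "user", "content": "Write a bash script that tails a log file."}],
)
print(resp.choices[0].message.content)
```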