Testing Qwen3-Omni Audio Inputs

David Pickett

04 Nov 2025

People

David

Idea

Part of our work in the Kumubot cluster involves working with both text and audio recordings. The idea was to figure out the capabilities of Qwen3-Omni on our Blackwell hardware.

Details

Blackwell software support (at least for the RTX 6000 Pro) is still pretty early
Qwen documents vLLM compatibility for high-throughput inference, though support for workstation Blackwell cards only arrived in the latest releases
With the nightly Docker images (tested with the latest from Nov 2nd 2025), vLLM now uses PyTorch 2.9 and CUDA 12.9, so I was able to get Qwen3-Omni working with both audio and visual inputs
Our workstation card is a pretty good fit for the current optimizations on Qwen3-Omni-30b-AWQ-4bit: with the text, audio, and video caching included, our GPU is expected to be able to process 10 concurrent requests
Example of the type of request that can be sent to the model, with an example response (image and audio):
- Input:

LLM_IMAGE_URL="https://upload.wikimedia.org/wikipedia/commons/thumb/3/35/Cow-on_pole%2C_with_antlers.jpeg/960px-Cow-on_pole%2C_with_antlers.jpeg"
LLM_AUDIO_URL="https://upload.wikimedia.org/wikipedia/commons/7/72/Whiskers%27_purr_edit.ogg"
"messages": [
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "$LLM_IMAGE_URL"}},
        {"type": "audio_url", "audio_url": {"url": "$LLM_AUDIO_URL"}},
        {"type": "text", "text": "What is the image showing and separately what it the sound in the audio? Answer each question with just one sentence"}
    ]}
  ]

- Output example:

"content": "The image shows a large, black and white statue of a cow with deer antlers perched on top of a utility pole. The audio contains the distinct sound of a cat purring."
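The request above can also be assembled in Python. This is only a minimal sketch, assuming the vLLM server exposes its OpenAI-compatible chat completions endpoint at http://localhost:8000 and that the model is registered under the name Qwen3-Omni-30b-AWQ-4bit; both values, and the helper names build_payload and send_request, are assumptions for illustration and not taken from our actual setup.

```python
import json
import urllib.request

# Assumptions (not from the original post): the endpoint URL and the exact
# model name registered with the server. Adjust both to match your deployment.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "Qwen3-Omni-30b-AWQ-4bit"


def build_payload(image_url, audio_url, prompt, model=MODEL_NAME):
    """Build a chat-completions request body with image, audio, and text parts."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "audio_url", "audio_url": {"url": audio_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


def send_request(payload, api_url=API_URL, timeout=120):
    """POST the payload as JSON and return the assistant's reply text."""
    request = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server running, passing the result of build_payload (using the two Wikimedia URLs and the prompt from the input above) to send_request should return a reply string like the output example. Keeping payload construction separate from the network call makes it easy to inspect or log the exact JSON before sending it.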
