Benchmarking RunPod cloud GPUs
People
David
Idea
Exploring the performance, cost, and usability of cloud GPUs for running open-source AI models, compared with local hardware and building on recent learning about serving software that handles concurrent users
Details
- Compared RunPod serverless endpoints and dedicated pods, using vLLM on Nvidia GPUs and SGLang on AMD GPUs
- Benchmarked Nvidia 3090, 4090, RTX 6000 Ada, and AMD MI300X GPUs with the Ray project's LLMPerf at an 8k context size (a client-side sketch of this kind of load test follows the list)
- Tested models: Qwen3-235B-A22B-FP8 and Qwen3-30B-A3B-FP8, both with 40k context windows
- Serverless GPUs had extremely long queues, making them unreliable
- Dedicated pods had much better reliability
- A 2x Nvidia 4090 pod ($0.69/hr per GPU) struggled with throughput (see the tensor-parallel sketch below)
- RTX 6000 Ada pod ($0.77/hr) showed solid throughput with Qwen3-30B at moderate concurrency
- Single AMD MI300X GPU pod ($2.49/hr) had outstanding throughput even at high concurrency
- A dual AMD MI300X pod handled the larger Qwen3-235B efficiently, though initial model load times were lengthy
- Persistent model caching/storage for pods can cost around $20/month extra but avoids re-downloading weights on every cold start (rough break-even arithmetic below)
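To make the setup easier to reproduce, here is a minimal sketch of the kind of client-side load test LLMPerf performs. It assumes a vLLM or SGLang server is already running on the pod with its OpenAI-compatible API enabled; the endpoint URL, model name, prompt set, and concurrency levels are placeholders, not the exact benchmark configuration used.

```python
import time
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI  # pip install openai

# Both vLLM and SGLang expose an OpenAI-compatible HTTP API.
# Endpoint and model name are placeholders for the pod's actual values.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "Qwen/Qwen3-30B-A3B-FP8"

def one_request(prompt: str) -> tuple[float, int]:
    """Send one chat completion; return (latency_s, completion_tokens)."""
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
        temperature=0.0,
    )
    return time.perf_counter() - start, resp.usage.completion_tokens

def run_at_concurrency(concurrency: int, total_requests: int = 32) -> None:
    prompts = [f"Summarize request {i} in a few sentences." for i in range(total_requests)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_request, prompts))
    elapsed = time.perf_counter() - start
    tokens = sum(n for _, n in results)
    mean_latency = sum(t for t, _ in results) / len(results)
    print(f"concurrency={concurrency:3d}  "
          f"output tok/s={tokens / elapsed:7.1f}  "
          f"mean latency={mean_latency:5.2f}s")

# Sweep concurrency levels, as in the pod comparisons above.
for c in (1, 8, 32):
    run_at_concurrency(c)
```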
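For the two-GPU Nvidia pod, the model has to be sharded across both cards. Below is a rough sketch using vLLM's offline Python API with tensor parallelism; the model name, context length, and memory settings are illustrative assumptions, and the actual pod may have used different launch flags (or vLLM's HTTP server instead).

```python
from vllm import LLM, SamplingParams

# Shard the model across both GPUs in the pod via tensor parallelism.
# Model, parallelism degree, and context length are illustrative.
llm = LLM(
    model="Qwen/Qwen3-30B-A3B-FP8",
    tensor_parallel_size=2,       # e.g. the 2x 4090 pod
    max_model_len=40960,          # ~40k context, matching the models tested
    gpu_memory_utilization=0.90,  # leave headroom for the KV cache
)

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```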
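The storage trade-off comes down to simple arithmetic: persistent storage is worth it once re-downloads burn more paid GPU time than the volume costs. The sketch below uses the dual-MI300X pod rate from above; the per-GB storage rate, model size, and download speed are assumed placeholder values, not quoted RunPod prices.

```python
# Rough break-even for persistent model storage vs. re-downloading.
# Rates below are placeholder assumptions, not quoted RunPod prices.
model_size_gb = 230       # e.g. Qwen3-235B-A22B-FP8 weights, roughly
storage_rate = 0.07       # assumed $/GB/month for a network volume
gpu_rate = 4.98           # $/hr for a dual MI300X pod (2 x $2.49)
download_gb_per_s = 0.5   # assumed effective download speed

storage_cost = model_size_gb * storage_rate                  # $/month
download_hours = model_size_gb / download_gb_per_s / 3600    # hours per cold start
redownload_cost = download_hours * gpu_rate                  # paid GPU idle time

print(f"storage: ${storage_cost:.2f}/month")
print(f"each re-download burns ~${redownload_cost:.2f} of GPU time "
      f"({download_hours * 60:.0f} min)")
print(f"break-even: ~{storage_cost / redownload_cost:.1f} cold starts/month")
```

Under these assumed numbers, storage pays for itself after a few dozen cold starts a month, and the slow initial load of the 235B model noted above makes the avoided wait worth as much as the dollars.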