Benchmarking RunPod cloud GPUs

People

David

Idea

Exploring cloud GPU performance, cost, and usability for running open-source AI models, in comparison to local hardware and in the context of recent learning about software for handling concurrent users

Details

  • Compared RunPod serverless endpoints and dedicated pods, running vLLM on Nvidia GPUs and sglang on AMD GPUs
  • Benchmarked Nvidia 3090, 4090, RTX 6000 Ada, and AMD MI300X GPUs using the Ray project's LLMPerf at an 8k context size (see the benchmark sketch after this list)
  • Tested models: Qwen3-235B-A22B-FP8 and Qwen3-30B-A3B-FP8, both served with a 40k context window
  • Serverless GPUs had extremely long queues, making them unreliable
  • Dedicated pods had much better reliability
  • A 2x Nvidia 4090 pod ($0.69/hr per GPU) struggled with throughput
  • An RTX 6000 Ada pod ($0.77/hr) showed solid throughput with Qwen3-30B at moderate concurrency
  • A single AMD MI300X pod ($2.49/hr) had outstanding throughput even at high concurrency (a rough concurrency probe is sketched after this list)
  • A dual MI300X pod handled the larger Qwen3-235B efficiently, though initial model load times were lengthy
  • Model caching/storage for pods can cost around $20/month extra but avoids repeatedly re-downloading weights (see the caching sketch below)
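
For reference, LLMPerf runs like these are driven by the token_benchmark_ray.py script from the ray-project/llmperf repo, pointed at the pod's OpenAI-compatible server. The sketch below is not David's exact invocation: the endpoint URL, token counts, and concurrency level are placeholder assumptions; the flags follow the llmperf README.

```python
import os
import subprocess

# LLMPerf's OpenAI client reads the endpoint from these env vars.
# The URL is a placeholder for the RunPod pod's exposed vLLM/sglang server.
env = {
    **os.environ,
    "OPENAI_API_BASE": "http://localhost:8000/v1",
    "OPENAI_API_KEY": "EMPTY",  # local servers typically ignore the key
}

# Run the token benchmark (from a checkout of ray-project/llmperf);
# input token counts here approximate the ~8k context used in the tests.
subprocess.run(
    [
        "python", "token_benchmark_ray.py",
        "--model", "Qwen/Qwen3-30B-A3B-FP8",
        "--mean-input-tokens", "8000",
        "--stddev-input-tokens", "0",
        "--mean-output-tokens", "256",
        "--stddev-output-tokens", "10",
        "--num-concurrent-requests", "16",
        "--max-num-completed-requests", "64",
        "--timeout", "600",
        "--results-dir", "result_outputs",
        "--llm-api", "openai",
    ],
    env=env,
    check=True,
)
```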
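The per-GPU concurrency behaviour can also be probed directly, since vLLM and sglang both expose an OpenAI-compatible API, so the same client code works against the Nvidia and AMD pods. A minimal sketch, assuming a server at the base URL below; the model name, prompt, and concurrency levels are illustrative, not the actual benchmark settings.

```python
import asyncio
import time

from openai import AsyncOpenAI

# Assumed endpoint: any OpenAI-compatible server (vLLM or sglang) on the pod.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "Qwen/Qwen3-30B-A3B-FP8"


async def one_request() -> int:
    """Send one chat completion and return the number of output tokens."""
    resp = await client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Summarize the history of GPUs."}],
        max_tokens=256,
    )
    return resp.usage.completion_tokens


async def probe(concurrency: int) -> None:
    """Fire N requests at once and report aggregate output tokens/sec."""
    start = time.perf_counter()
    tokens = await asyncio.gather(*(one_request() for _ in range(concurrency)))
    elapsed = time.perf_counter() - start
    print(f"{concurrency:>3} concurrent: {sum(tokens) / elapsed:.1f} output tok/s total")


async def main() -> None:
    # Sweep concurrency to see where a given GPU's throughput saturates.
    for c in (1, 8, 32, 64):
        await probe(c)


asyncio.run(main())
```

A GPU that "struggles with throughput" shows aggregate tokens/sec flattening (or latency ballooning) at low concurrency, while the MI300X numbers kept scaling at high concurrency.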
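On the caching point, one common pattern (an assumption here, not necessarily David's exact setup) is to pre-download weights once onto a persistent network volume and point the serving framework's Hugging Face cache at it, so recreating a pod doesn't re-fetch hundreds of GB.

```python
from huggingface_hub import snapshot_download

# Assumption for illustration: a RunPod network volume mounted at /workspace.
# The ~$20/month figure is the volume's storage cost; downloading once here
# means later pods can reuse the weights instead of re-downloading them.
snapshot_download(
    repo_id="Qwen/Qwen3-30B-A3B-FP8",
    cache_dir="/workspace/hf_cache",
)
```

The server is then launched with its cache directed at the volume (e.g. by setting the HF_HOME environment variable to /workspace/hf_cache before starting vLLM or sglang).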
