(Post-WIP) How to evaluate inference engines

  • People
    • David
  • Idea
    • For a given set of local LLM models that a set of users (or a community) wants to use, how do you measure the performance and UX of the different ways to serve them, based on ease of use, prompt-processing and generation speed, output accuracy, and features such as multi-user context handling?
  • Details/tools to test
    • vllm-benchmark (vLLM's built-in benchmarking scripts)
    • llmperf, from the Ray project
    • aider's benchmarking suite
    • livebench
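Most of the tools above boil down to timing each request and aggregating a few standard serving metrics: time to first token (responsiveness) and generation tokens per second (throughput). A minimal sketch of that aggregation step, assuming per-request timestamps have already been captured (the `RequestTrace` shape here is a hypothetical illustration, not any tool's actual API):

```python
import statistics
from dataclasses import dataclass

@dataclass
class RequestTrace:
    # Timestamps in seconds since the benchmark started, plus the
    # number of tokens in the generated completion.
    sent_at: float
    first_token_at: float
    done_at: float
    output_tokens: int

def summarize(traces: list[RequestTrace]) -> dict[str, float]:
    """Aggregate per-request traces into common serving metrics."""
    # Time to first token: how long the user waits before output starts.
    ttft = [t.first_token_at - t.sent_at for t in traces]
    # Generation speed: output tokens divided by pure decoding time.
    tps = [
        t.output_tokens / (t.done_at - t.first_token_at)
        for t in traces
        if t.done_at > t.first_token_at
    ]
    return {
        "ttft_p50_s": statistics.median(ttft),
        "gen_tok_per_s_p50": statistics.median(tps),
    }

metrics = summarize([
    RequestTrace(sent_at=0.0, first_token_at=0.5, done_at=2.5, output_tokens=40),
    RequestTrace(sent_at=0.0, first_token_at=1.0, done_at=3.0, output_tokens=40),
])
print(metrics)  # → {'ttft_p50_s': 0.75, 'gen_tok_per_s_p50': 20.0}
```

The real tools also sweep concurrency levels and report tail percentiles (p90/p99), which is where multi-user behavior differences between engines show up.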