(Post-WIP) How to evaluate inference engines
- People
  - David
- Idea
  - For a given set of local LLM models that a set of users (or a community) wants to run, how do you measure the performance and UX of the different ways to serve them, based on ease of use, processing and generation speed, output accuracy, and features such as multi-user context? (A minimal probe sketch follows the list below.)
- Details/tools to test
  - vLLM's benchmark scripts (serving throughput/latency)
  - LLMPerf from the Ray project (load-testing; time to first token, inter-token latency)
  - Aider's benchmark suite (code-editing accuracy)
  - LiveBench (model capability, with regularly refreshed questions)
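
As a rough illustration of what the serving-side harnesses above automate, here is a minimal sketch that streams one completion from an OpenAI-compatible endpoint (which vLLM, llama.cpp's server, Ollama, and others expose) and records time-to-first-token and streaming throughput. The base URL, model name, and the `probe` helper are placeholders, and counting one SSE chunk as one token is only an approximation, not how the listed tools count.

```python
"""Minimal latency/throughput probe against an OpenAI-compatible endpoint.

Assumptions: an engine is serving at BASE_URL, and each streamed SSE chunk
roughly corresponds to one generated token (an approximation).
"""
import time

import requests  # third-party: pip install requests

BASE_URL = "http://localhost:8000/v1"  # placeholder: wherever the engine serves its API
MODEL = "my-local-model"               # placeholder model name


def probe(prompt: str, max_tokens: int = 128) -> dict:
    """Stream one completion; record time-to-first-token and chunk throughput."""
    start = time.perf_counter()
    first_chunk_at = None
    chunks = 0
    resp = requests.post(
        f"{BASE_URL}/completions",
        json={
            "model": MODEL,
            "prompt": prompt,
            "max_tokens": max_tokens,
            "stream": True,
        },
        stream=True,
        timeout=120,
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        # OpenAI-style streaming sends "data: {...}" lines, ending with "data: [DONE]".
        if not line or not line.startswith(b"data: "):
            continue
        if line == b"data: [DONE]":
            break
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()
        chunks += 1
    total = time.perf_counter() - start
    decode_time = (total - (first_chunk_at - start)) if first_chunk_at else 0.0
    return {
        "ttft_s": (first_chunk_at - start) if first_chunk_at else None,
        "chunks_per_s": chunks / decode_time if decode_time > 0 else None,
        "total_s": total,
    }


if __name__ == "__main__":
    print(probe("Explain KV-cache reuse in one paragraph."))
```

Tools like LLMPerf add concurrent load, token-accurate counting, and percentile statistics on top of this basic loop; running the same probe against each engine with the same model is the simplest apples-to-apples comparison.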