(Post-WIP) Comparing LLM inference engines (multi-user and multi-model)
* People
  * David
* Idea
  * For a given set of local LLM models that a set of users (or community) wants to use: what is the best/easiest way to serve them (based on ease of use, speed of processing and generation, and features like multi-context and
* Details
  * Engines like llama.