ʻĀina Foundry Prototypes

Trying AI-Trader on local LLMs

People: David
Idea: Joe wanted to know what the capabilities of https://github.com/HKUDS/AI-Trader look like in the context of local LLMs
Details:
* The repo appears to assume that you can test multiple AI models in agentic fashion against historical stock market data over a period

4bit Quant Showdown: Finding the Sweet Spot for Qwen3 Models

People: David
Idea: To set up our Kumubot cluster, I went down a rabbit hole benchmarking different quantization methods for Qwen3 8B and 32B models to see which ones actually deliver the best accuracy-to-speed tradeoff in real-world use
Details:
• ExLlamaV3-4bpw surprisingly beat even BF16 on LiveBench accuracy (60.0 vs 58.

Benchmarking multi-lingual open-source LLMs

People: David
Idea: Based on conversations with Keao about machine-translation benchmarking, I ran a bunch of LLMs through the MMLU-ProX (lite) French benchmark tests (biology section + full 14-topic suite) to see which ones actually deliver the best mix of speed and accuracy in French. This started because we hypothesized that

Local hardware fine-tuning LLMs for Hawaiian-English translation benchmarking

People:
* David
Idea: Exploring memory-efficient fine-tuning techniques for improving Hawaiian-to-English translation using Apple's MLX framework, comparing multiple approaches and optimizing for Mac hardware.
Details:
* Successfully fine-tuned gemma-3-4b-it-4bit on Mac M1 Ultra (128GB RAM), achieving 0.8296 semantic similarity score, a 3.6% improvement over the base model
* Discovered

Fine-tuning performance between Apple and Nvidia

People:
* David
Idea:
* Comparing fine-tuning performance on MacBook M3 Max, Mac Studio M1 Ultra, and Nvidia 4090 using MLX and Unsloth
Details:
* Tested fine-tuning the Phi-3-mini-4k-instruct model
* Followed this Jan 2025 MLX guide for Apple hardware
* Used the Unsloth library for the Nvidia GPU
* The dataset had 627 examples, and training ran for 500 steps
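
To put the 627 examples and 500 training steps in context, here is a minimal sketch of how many passes over the dataset such a run makes. The excerpt does not state the batch size, so the `batch_size` values below are assumptions, and `epochs_covered` is a hypothetical helper, not something from MLX or Unsloth.

```python
def epochs_covered(num_examples: int, steps: int, batch_size: int) -> float:
    """Return how many passes over the dataset a run of `steps` steps makes.

    Assumes each step consumes exactly `batch_size` examples and that
    there is no gradient accumulation.
    """
    if num_examples <= 0 or batch_size <= 0 or steps < 0:
        raise ValueError("num_examples and batch_size must be positive, steps non-negative")
    return steps * batch_size / num_examples


if __name__ == "__main__":
    NUM_EXAMPLES = 627  # from the post
    STEPS = 500         # from the post
    # Batch sizes are NOT given in the post; these are illustrative guesses.
    for batch_size in (1, 2, 4):
        epochs = epochs_covered(NUM_EXAMPLES, STEPS, batch_size)
        print(f"batch_size={batch_size}: {epochs:.2f} epochs")
```

With a batch size of 1, 500 steps would cover less than one full pass over the 627 examples. With a batch size of 4, they would cover a little over three passes. This matters when comparing runs across frameworks, because the same step count means a different amount of training if the batch sizes differ.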