Fine-tuning LLMs on local hardware for Hawaiian-to-English translation benchmarking

People

  • David

Idea

Exploring memory-efficient fine-tuning techniques for improving Hawaiian-to-English translation using Apple's MLX framework, comparing multiple approaches and optimizing for Mac hardware.

Details

  • Successfully fine-tuned gemma-3-4b-it-4bit on an M1 Ultra Mac (128 GB RAM), reaching a 0.8296 semantic similarity score, a 3.6% improvement over the base model (similarity scoring is sketched after this list)
  • Discovered that training memory requirements can be 20-50x the model size, not the commonly cited 2-3x, requiring aggressive optimization techniques
  • Found that the best-performing checkpoint was at iteration 1800 out of 2000, highlighting the importance of saving intermediate checkpoints
  • Implemented gradient checkpointing + batch_size=1 to cut peak memory usage during training from 95+ GB to 6.1-6.3 GB (an example launch script is sketched after this list)
  • Compared three fine-tuning experiments: 20 high-quality pairs (overfitting), 2,831 pairs with 200 iterations (undertrained), and 2,831 pairs with 2000 iterations (optimal)
  • OpenAI's fine-tuning stack plus cloud models still outperforms the local approach, reaching 0.8857 similarity with just 20 training pairs versus 0.8296 with 2,831 pairs locally, though the cloud route consumes far more resources and cannot be run offline or off-grid
  • Optimized EPUB Hawaiian/English passage extraction from hours to 13.85 seconds using rapidfuzz + N-gram indexing + multiprocessing (see the alignment sketch after this list)
  • Established memory guidelines: at 100 GB RAM, use batch_size=1 + gradient checkpointing; at 50 GB RAM, additionally set sequence_length=1024; under 32 GB RAM, switch to smaller 2B models
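
The similarity figures above compare model translations against reference English translations. A common way to produce such a score is cosine similarity between sentence embeddings; the minimal sketch below follows that pattern, assuming the sentence-transformers library and the all-MiniLM-L6-v2 encoder, neither of which is confirmed as the setup behind the 0.8296 / 0.8857 numbers.

    from sentence_transformers import SentenceTransformer, util

    # Assumed encoder; the note does not say which embedding model was used.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def semantic_similarity(hypotheses, references):
        """Mean cosine similarity between paired model outputs and references."""
        hyp_emb = encoder.encode(hypotheses, convert_to_tensor=True, normalize_embeddings=True)
        ref_emb = encoder.encode(references, convert_to_tensor=True, normalize_embeddings=True)
        # cos_sim returns an n x n matrix; the diagonal pairs each hypothesis
        # with its own reference.
        sims = util.cos_sim(hyp_emb, ref_emb).diagonal()
        return float(sims.mean())

    if __name__ == "__main__":
        hyps = ["The chief sailed to Maui at dawn."]
        refs = ["At dawn the chief set sail for Maui."]
        print(round(semantic_similarity(hyps, refs), 4))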
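
For the batch_size=1 + gradient-checkpointing setup, the sketch below shows how such a run can be launched through mlx_lm's LoRA trainer from a small Python script. The model ID, data directory, adapter path, and save interval are placeholders, and the flag names reflect recent mlx-lm releases rather than a record of the exact command used here.

    import subprocess
    import sys

    MODEL = "mlx-community/gemma-3-4b-it-4bit"   # placeholder model ID
    DATA_DIR = "data/hawaiian_english"           # expects train.jsonl / valid.jsonl

    cmd = [
        sys.executable, "-m", "mlx_lm.lora",
        "--model", MODEL,
        "--train",
        "--data", DATA_DIR,
        "--batch-size", "1",           # single-example batches keep activation memory small
        "--grad-checkpoint",           # recompute activations in the backward pass
        "--max-seq-length", "1024",    # tighter cap for ~50 GB machines
        "--iters", "2000",
        "--save-every", "200",         # keep intermediate adapters; the best checkpoint was iteration 1800
        "--adapter-path", "adapters/gemma3-haw-en",
    ]
    subprocess.run(cmd, check=True)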
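
The note does not spell out the passage-extraction pipeline, so the sketch below only illustrates the generic pattern it names: shortlist candidate passages with a character n-gram index, score just the shortlist with rapidfuzz, and fan queries out across worker processes. All function and variable names are illustrative, and EPUB parsing is omitted.

    from collections import defaultdict
    from multiprocessing import Pool
    from rapidfuzz import fuzz

    N = 3  # character trigrams

    def ngrams(text, n=N):
        text = text.lower()
        return {text[i:i + n] for i in range(len(text) - n + 1)}

    def build_index(passages):
        """Map each n-gram to the set of passages containing it."""
        index = defaultdict(set)
        for idx, passage in enumerate(passages):
            for gram in ngrams(passage):
                index[gram].add(idx)
        return index

    # Globals populated once per worker process so they are not pickled per task.
    _PASSAGES, _INDEX = [], {}

    def _init(passages, index):
        global _PASSAGES, _INDEX
        _PASSAGES, _INDEX = passages, index

    def best_match(query, shortlist_size=50):
        """Return (index of best candidate, fuzzy score) for one query passage."""
        counts = defaultdict(int)
        for gram in ngrams(query):
            for idx in _INDEX.get(gram, ()):
                counts[idx] += 1
        if not counts:
            return None, 0.0
        shortlist = sorted(counts, key=counts.get, reverse=True)[:shortlist_size]
        scored = [(idx, fuzz.token_set_ratio(query, _PASSAGES[idx])) for idx in shortlist]
        return max(scored, key=lambda pair: pair[1])

    def align(queries, passages, workers=4):
        index = build_index(passages)
        with Pool(workers, initializer=_init, initargs=(passages, index)) as pool:
            return pool.map(best_match, queries)

    if __name__ == "__main__":
        candidates = [
            "At dawn the chief set sail for Maui.",
            "The rain fell softly on the upland forest.",
        ]
        print(align(["At dawn, the chief sailed for Maui."], candidates, workers=2))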

Read more