Fine-tuning LLMs on local hardware for Hawaiian-to-English translation benchmarking

People

  • David

Idea

Exploring memory-efficient fine-tuning techniques for improving Hawaiian-to-English translation using Apple's MLX framework, comparing multiple approaches and optimizing for Mac hardware.

Details

  • Successfully fine-tuned gemma-3-4b-it-4bit on an M1 Ultra Mac (128 GB RAM), reaching a 0.8296 semantic similarity score, a 3.6% improvement over the base model (similarity scoring is sketched after this list)
  • Discovered that training memory requirements can be 20-50x the model size, not the commonly cited 2-3x, requiring aggressive optimization techniques
  • Found that the best-performing checkpoint was at iteration 1800 out of 2000, highlighting the importance of saving intermediate checkpoints
  • Implemented gradient checkpointing + batch_size=1 to cut peak memory usage during training from 95+ GB to 6.1-6.3 GB (an example launch script is sketched after this list)
  • Compared three fine-tuning experiments: 20 high-quality pairs (overfitting), 2,831 pairs with 200 iterations (undertrained), and 2,831 pairs with 2000 iterations (optimal)
  • OpenAI's fine-tuning stack plus cloud models still outperforms the local approach, reaching 0.8857 similarity with just 20 training pairs versus 0.8296 with 2,831 pairs locally, though the cloud route consumes far more resources and cannot be run offline or off-grid
  • Optimized EPUB Hawaiian/English passage extraction from hours to 13.85 seconds using rapidfuzz + N-gram indexing + multiprocessing (see the alignment sketch after this list)
  • Established memory guidelines: at 100 GB RAM, use batch_size=1 + gradient checkpointing; at 50 GB RAM, additionally set sequence_length=1024; under 32 GB RAM, switch to smaller 2B models
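
The similarity figures above compare model translations against reference English translations. A common way to produce such a score is cosine similarity between sentence embeddings; the minimal sketch below follows that pattern, assuming the sentence-transformers library and the all-MiniLM-L6-v2 encoder, neither of which is confirmed as the setup behind the 0.8296 / 0.8857 numbers.

    from sentence_transformers import SentenceTransformer, util

    # Assumed encoder; the note does not say which embedding model was used.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def semantic_similarity(hypotheses, references):
        """Mean cosine similarity between paired model outputs and references."""
        hyp_emb = encoder.encode(hypotheses, convert_to_tensor=True, normalize_embeddings=True)
        ref_emb = encoder.encode(references, convert_to_tensor=True, normalize_embeddings=True)
        # cos_sim returns an n x n matrix; the diagonal pairs each hypothesis
        # with its own reference.
        sims = util.cos_sim(hyp_emb, ref_emb).diagonal()
        return float(sims.mean())

    if __name__ == "__main__":
        hyps = ["The chief sailed to Maui at dawn."]
        refs = ["At dawn the chief set sail for Maui."]
        print(round(semantic_similarity(hyps, refs), 4))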
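
For the batch_size=1 + gradient-checkpointing setup, the sketch below shows how such a run can be launched through mlx_lm's LoRA trainer from a small Python script. The model ID, data directory, adapter path, and save interval are placeholders, and the flag names reflect recent mlx-lm releases rather than a record of the exact command used here.

    import subprocess
    import sys

    MODEL = "mlx-community/gemma-3-4b-it-4bit"   # placeholder model ID
    DATA_DIR = "data/hawaiian_english"           # expects train.jsonl / valid.jsonl

    cmd = [
        sys.executable, "-m", "mlx_lm.lora",
        "--model", MODEL,
        "--train",
        "--data", DATA_DIR,
        "--batch-size", "1",           # single-example batches keep activation memory small
        "--grad-checkpoint",           # recompute activations in the backward pass
        "--max-seq-length", "1024",    # tighter cap for ~50 GB machines
        "--iters", "2000",
        "--save-every", "200",         # keep intermediate adapters; the best checkpoint was iteration 1800
        "--adapter-path", "adapters/gemma3-haw-en",
    ]
    subprocess.run(cmd, check=True)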
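
The note does not spell out the passage-extraction pipeline, so the sketch below only illustrates the generic pattern it names: shortlist candidate passages with a character n-gram index, score just the shortlist with rapidfuzz, and fan queries out across worker processes. All function and variable names are illustrative, and EPUB parsing is omitted.

    from collections import defaultdict
    from multiprocessing import Pool
    from rapidfuzz import fuzz

    N = 3  # character trigrams

    def ngrams(text, n=N):
        text = text.lower()
        return {text[i:i + n] for i in range(len(text) - n + 1)}

    def build_index(passages):
        """Map each n-gram to the set of passages containing it."""
        index = defaultdict(set)
        for idx, passage in enumerate(passages):
            for gram in ngrams(passage):
                index[gram].add(idx)
        return index

    # Globals populated once per worker process so they are not pickled per task.
    _PASSAGES, _INDEX = [], {}

    def _init(passages, index):
        global _PASSAGES, _INDEX
        _PASSAGES, _INDEX = passages, index

    def best_match(query, shortlist_size=50):
        """Return (index of best candidate, fuzzy score) for one query passage."""
        counts = defaultdict(int)
        for gram in ngrams(query):
            for idx in _INDEX.get(gram, ()):
                counts[idx] += 1
        if not counts:
            return None, 0.0
        shortlist = sorted(counts, key=counts.get, reverse=True)[:shortlist_size]
        scored = [(idx, fuzz.token_set_ratio(query, _PASSAGES[idx])) for idx in shortlist]
        return max(scored, key=lambda pair: pair[1])

    def align(queries, passages, workers=4):
        index = build_index(passages)
        with Pool(workers, initializer=_init, initargs=(passages, index)) as pool:
            return pool.map(best_match, queries)

    if __name__ == "__main__":
        candidates = [
            "At dawn the chief set sail for Maui.",
            "The rain fell softly on the upland forest.",
        ]
        print(align(["At dawn, the chief sailed for Maui."], candidates, workers=2))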

Read more