Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon

(github.com)

98 points | by MediaSquirrel 3 hours ago

6 comments

conception 10 minutes ago
I’m pretty excited about the Edge Gallery iOS app with Gemma 4 on it, but it seems like they hobbled it: there's no access to intents, and you have to write custom plugins for web search, etc. Does anyone have a favorite way to run these models usefully? ChatMCP works pretty well but only supports models via API.
LuxBennu 2 hours ago
I run Whisper large-v3 on an M2 Max with 96GB, and even with just inference the memory gets tight on longer audio, so I can only imagine what fine-tuning looks like. Does 64GB vs. 96GB make a meaningful difference for Gemma 4 fine-tuning, or does it just push the OOM wall back a bit? I've been wanting to try local fine-tuning on Apple Silicon, but the tooling gap has kept me on inference only so far.
- MediaSquirrel 2 hours ago
  Memory usage increases quadratically with sequence length, so using shorter sequences during fine-tuning keeps memory from blowing up. On my 64GB machine, I'm limited to input sequences of about 2,000 tokens, since the average output for my fine-tuning task is around 1,000 tokens (~3k tokens total).
  - LuxBennu 41 minutes ago
    Ah, that makes sense, quadratic scaling is brutal. So with 96GB I'd probably get somewhere around 4-5k total sequence length before hitting the wall, which is still pretty limiting for anything multimodal. Do you use gradient checkpointing, or is that not worth the speed tradeoff at these sizes?
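For a rough sanity check on the numbers in this subthread, here is a minimal Python sketch of the quadratic-scaling arithmetic. Everything in it is an assumption for illustration: the function names are made up, the head count is arbitrary, the memory model is simply "fixed overhead plus a term proportional to seq_len squared", and the 32GB fixed overhead is a hypothetical figure, not a measurement of this fine-tuner. It also assumes the full attention score matrix is materialized; memory-efficient attention kernels avoid that and scale more gently.

```python
import math


def attention_score_bytes(seq_len, num_heads, bytes_per_elem=2, batch_size=1):
    """Bytes for one layer's full seq_len x seq_len attention score matrix.

    Assumes the matrix is fully materialized (no memory-efficient kernel).
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_elem


def max_seq_len_estimate(known_len, known_mem_gb, new_mem_gb, fixed_gb=0.0):
    """Extrapolate the longest sequence that fits in new_mem_gb.

    Assumed model: memory = fixed_gb + k * seq_len**2, calibrated from one
    known data point (known_len tokens fit in known_mem_gb).
    """
    if fixed_gb >= known_mem_gb or fixed_gb >= new_mem_gb:
        raise ValueError("fixed_gb must be smaller than both memory sizes")
    ratio = (new_mem_gb - fixed_gb) / (known_mem_gb - fixed_gb)
    return int(known_len * math.sqrt(ratio))


if __name__ == "__main__":
    # Doubling the sequence length quadruples the score-matrix memory.
    print(attention_score_bytes(6000, 8) // attention_score_bytes(3000, 8))  # 4

    # ~3k tokens at 64GB, purely quadratic, no fixed overhead.
    print(max_seq_len_estimate(3000, 64, 96))  # 3674

    # Same data point with a hypothetical 32GB of fixed overhead
    # (weights, optimizer state, OS); the 32 is illustrative only.
    print(max_seq_len_estimate(3000, 64, 96, fixed_gb=32))  # 4242
```

Under these assumptions, going from 64GB to 96GB buys roughly 3.7k tokens if all memory scales quadratically, and the 4-5k range only becomes reachable when a large share of the 64GB is fixed overhead that does not grow with sequence length.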
craze3 3 hours ago
Nice! I've been wanting to try local audio fine-tuning. Hopefully it works with music vocals too.
yousifa 2 hours ago
This is super cool, will definitely try it out! Nice work
dsabanin 3 hours ago
Thanks for doing this. Looks interesting, I'm going to check it out soon.
- MediaSquirrel 2 hours ago
  You are welcome! It was a fun side quest.
pivoshenko 2 hours ago
nice!