I am running qwen 3.6 9b quantized model on my m4 pro 48gb and it is barely useful to do some basic pi.dev/cc driven development. I think 128gb desktops are the sweet setup to actually get meaningful work done. However, getting your hands on one of these machines is difficult at the moment.
As much fun as it is to run these things locally don’t forget that your time is not free. I am slowly migrating my use cases to openrouter and run the largest qwen model for < $2-3/day with serious use for personal projects.
Was the choice of such a small model driven by a desire for high tok/sec? I ask because an m4 pro 48gb machine can run larger models (if model intelligence is the thing that would make it more useful).
I'm using the 30b MOE model on same spec with 65k tokens as a sub agent with tooling and it absolutely writes decent code. The dense 9b I agree wasn't great.
Good enough. It gets 60-70% of the work I need done for a lot less $ (keep in mind I am using these for personal projects that doesn’t generate revenue). If I was using it with the hopes of making money I think I would just use Codex at this point.
Thanks for saying this. There's so much nonsense out there online about local models being better than Opus 4.7 and the like. It's just not true for regular users.
I have a brand new M5 MacBook Pro - top end with all the specs and I've tried local models and they're barely functional.
What models and quantizations have you been trying? I've had great success with the larger Qwen 3.x models at 6-bit levels. Using 6 bit quantization is really the bare minimum to give local models a fair shot at agentic flows. Once you start pushing below that the models become more "dumb" from the limited bit space.
Cloud has tremendous value for speed, plug and play, and performance. You need to decide how those compete with the benefits of local - both today, and a year from now, e.g.
As much fun as it is to run these things locally don’t forget that your time is not free. I am slowly migrating my use cases to openrouter and run the largest qwen model for < $2-3/day with serious use for personal projects.