I am running qwen 3.6 9b quantized model on my m4 pro 48gb and it is barely usef...

Casteil · 2026-05-11T16:54:00 1778518440

Why not 35b-a3b? ...or gemma4:26b-a4b? Both will be more capable than 9b and run at roughly similar (perhaps faster) speeds

carbocation · 2026-05-11T00:13:21 1778458401

Was the choice of such a small model driven by a desire for high tok/sec? I ask because an m4 pro 48gb machine can run larger models (if model intelligence is the thing that would make it more useful).

sourc3 · 2026-05-11T00:23:50 1778459030

Yes that was my goal. Also noticed a huge performance gain going from ollama to mlx. Your mileage may vary.

elij · 2026-05-11T00:20:46 1778458846

I'm using the 30b MoE model on the same spec with 65k tokens of context as a sub-agent with tooling, and it absolutely writes decent code. I agree the dense 9b wasn't great.

hparadiz · 2026-05-11T00:09:16 1778458156

How does it (the openrouter version) compare to ChatGPT 5.5 or Claude Opus 4.6?

sourc3 · 2026-05-11T00:26:12 1778459172

Good enough. It gets 60-70% of the work I need done for a lot less $ (keep in mind I am using these for personal projects that don't generate revenue). If I were using it in the hope of making money, I think I would just use Codex at this point.

sjones671 · 2026-05-11T00:29:09 1778459349

Thanks for saying this. There's so much nonsense out there online about local models being better than Opus 4.7 and the like. It's just not true for regular users.

I have a brand new M5 MacBook Pro - top end with all the specs and I've tried local models and they're barely functional.

Yukonv · 2026-05-11T00:55:49 1778460949

What models and quantizations have you been trying? I've had great success with the larger Qwen 3.x models at 6-bit levels. Using 6-bit quantization is really the bare minimum to give local models a fair shot at agentic flows. Once you start pushing below that, the models become more "dumb" from the limited bit space.
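For a rough sense of what fits in 48 GB at a given bit width, here is a back-of-the-envelope sketch. It assumes memory for the weights is simply parameters times bits per weight, and it ignores the KV cache, activations, the per-block scale overhead that real quantized formats add, and whatever share of unified memory the OS keeps for itself. The function names and the 30B example size are illustrative, not taken from any particular runtime.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB.

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    Real quantized formats carry extra scale/zero-point data, so treat the
    result as a lower bound.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def footprint_table(params_billion: float, bit_widths=(4, 6, 8, 16)) -> dict:
    """Map each bit width to the approximate weight size in GiB (1 decimal)."""
    return {bits: round(weight_gib(params_billion, bits), 1) for bits in bit_widths}


if __name__ == "__main__":
    # Illustrative sizes only: a 9B dense model and a 30B-class model.
    for size in (9, 30):
        print(f"{size}B:", footprint_table(size))
```

Under these assumptions a 30B-class model at 6 bits needs roughly 21 GiB for weights, which leaves headroom on a 48 GB machine, while the same model at 16 bits would not fit at all.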

SecretDreams · 2026-05-11T01:31:09 1778463069

The main benefits for local are:

1) control 2) privacy 3) transparent cost model

Cloud has tremendous value for speed, plug and play, and performance. You need to decide how those compete with the benefits of local, both today and, e.g., a year from now.