I've been using unsloth/gemma-4-31B-it-qat-GGUF daily for various small parsing ...

amdivia · 2026-06-16T20:11:10 1781640670

Another data point: 110+ tok/s on the RTX 5090 (Gemma 4 31B QAT + MTP at UD-Q4_K_XL), with VRAM usage peaking at 27 GB.

The really lovely part was getting 300+ tok/s with Gemma 4 26B QAT + MTP at UD-Q4_K_XL. I think I saw VRAM usage peak at around 21 GB.

lmedinas · 2026-06-16T21:07:52 1781644072

The problem with that setup is that it will run out of context pretty quickly, so for a coding agent it will limit your workflow very fast.
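
For a rough feel of why context gets tight, here is a back-of-the-envelope sketch of how many tokens of KV cache fit in the VRAM left over after the model is loaded. It uses the standard per-token KV cache size (keys plus values, per layer, per KV head). The layer count, KV head count, and head dimension below are hypothetical placeholders, not the real Gemma 4 architecture, and the 32 GB total is the RTX 5090's card capacity, which the thread does not state. It also assumes full attention on every layer, an fp16 cache, and that the 27 GB peak leaves the remainder entirely free, so treat the result as an order-of-magnitude estimate only.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache needed for one token.

    Each layer stores one key and one value vector per KV head,
    hence the leading factor of 2. bytes_per_elem=2 assumes fp16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(total_vram_gb, used_vram_gb, per_token_bytes):
    """Tokens of KV cache that fit in the VRAM not already in use."""
    free_bytes = (total_vram_gb - used_vram_gb) * 1024**3
    return max(0, int(free_bytes // per_token_bytes))


# Hypothetical architecture numbers, chosen only for illustration.
per_token = kv_bytes_per_token(n_layers=60, n_kv_heads=8, head_dim=128)

# 32 GB card (assumed), 27 GB peak usage as reported above.
tokens = max_context_tokens(32, 27, per_token)
print(f"{per_token} bytes per token, room for about {tokens} tokens")
```

With these made-up numbers the headroom works out to roughly 21k tokens, which a coding agent can burn through quickly. Quantizing the KV cache or using a model with sliding-window attention layers would change the picture, so plug in the real values before drawing conclusions.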