RandyOrion · 2026-06-15T10:57:12 1781521032

Please do not claim you trained a new model, only to get caught red-handed by others. Several people or groups have already done that, got caught, and vanished in no time.

Check how the "authors" of "this model" reacted to this problem [1]. See how they dealt with it: first changing their affiliation from https://iplanrio.rio.rj.gov.br to https://iplanrio.prefeitura.rio [2], then saying that they are sorry for being caught [3], then just removing all their affiliations once and for all [4].

I think the "authors" of "this model" [5] should be held accountable until they upload new checkpoints and the performance of the new model is verified by third parties.

P.S. To people who downvoted me, show me why you're doing this.

[1] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

[2] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

[3] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

[4] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

[5] https://huggingface.co/prefeitura-rio

RandyOrion · 2026-06-13T15:00:22 1781362822

I don't know how open source AI wins. The description is too vague for serious discussions. What I do know is that, once closed source AI groups become anti-you, you should punish them, or help open source groups, or both.

If you really want specific open source {LLM, LMM, research, harness, whatever} groups to win over their closed source counterparts, you can show that you care by trying open source solutions first when solving problems. And if they're really capable, reward them with contributions or something.

RandyOrion · 2026-06-12T03:50:18 1781236218

Hilarious read. I laughed out loud multiple times while reading. In the end, I think amd should pay the author simply for the willingness to debug broken software written by amd, as well as for the sheer number of loose ends this exploration led to.

RandyOrion · 2026-06-11T13:26:49 1781184409

Thanks gemma team for this release.

Compared to autoregressive decoding, diffusion is huge for local MoE inference because of the improved token generation efficiency, especially in the usual GPU + RAM offload setting.

However, there are models which are better positioned on the performance vs memory pareto front, i.e. dense models, so I'll just wait.

P.S. QAT is really something as it reduces the performance fluctuations compared to the normal one. Thanks again.

RandyOrion · 2026-06-06T05:41:06 1780724466

From the perspective of a local llm user, I think qat doesn't solve the major problem of the gemma models.

The gemma family (gen 1 to gen 4) consistently shows an extreme range of activations, e.g., up to 600000, essentially forcing people to use a bf16 kv cache and accept a short context window, e.g., 31b, iq4_xs quantization, 100k context window on 32gb memory. Or, people use a q8 kv cache with a 200k context window and accept a large performance penalty.
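A rough sketch of the trade-off described above: with standard attention, the kv cache grows linearly with context length and with bytes per element, so halving the element size roughly doubles the context that fits in a fixed memory budget. The layer count, kv head count, head dimension, and budget below are hypothetical placeholders, not gemma's real configuration, and the q8 case ignores the small overhead of quantization scales.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # Both keys and values are cached for every layer, hence the factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(budget_bytes, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # Whole tokens that fit in the memory left over for the kv cache.
    per_token = kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return budget_bytes // per_token


# Hypothetical model shape and kv cache budget, for illustration only.
HYPOTHETICAL_SHAPE = dict(n_layers=48, n_kv_heads=8, head_dim=128)
KV_BUDGET_BYTES = 8 * 1024**3

ctx_bf16 = max_context_tokens(KV_BUDGET_BYTES, bytes_per_elem=2, **HYPOTHETICAL_SHAPE)
ctx_q8 = max_context_tokens(KV_BUDGET_BYTES, bytes_per_elem=1, **HYPOTHETICAL_SHAPE)

print(f"bf16 kv cache: ~{ctx_bf16} tokens")
print(f"q8 kv cache:   ~{ctx_q8} tokens")
```

Under these assumptions the q8 cache holds about twice as many tokens as the bf16 one, which matches the 100k vs 200k figures in spirit.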

In contrast, for the qwen 3.5 family, the largest activation is below 2000, making a q8 or even lower-precision kv cache essentially free. Together with linear attention, which doesn't require a kv cache, the full 262k context window can be easily reached.

Qat training with a w4a16 target, while improving performance on inference with low-precision weights, doesn't solve the kv cache problem at all.
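To illustrate why the activation range matters for a low-precision cache, here is a minimal sketch of symmetric absmax int8 quantization. It assumes a single scale for the whole tensor and assumes the outliers land in the cached values; real q8 cache formats typically use finer-grained scales, which soften the effect but do not remove it. The function names and the sample value 100.0 are illustrative only.

```python
def absmax_int8_step(max_abs):
    # Symmetric int8 maps [-max_abs, max_abs] onto the integers [-127, 127],
    # so the smallest representable difference is max_abs / 127.
    return max_abs / 127.0


def quantize_dequantize(x, max_abs):
    # Round to the nearest int8 level, clamp, then map back to a float.
    step = absmax_int8_step(max_abs)
    q = max(-127, min(127, round(x / step)))
    return q * step


sample = 100.0  # an ordinary-sized value sharing a scale with the outlier
for max_abs in (2000.0, 600000.0):
    restored = quantize_dequantize(sample, max_abs)
    print(f"max_abs={max_abs:>9.0f} step={absmax_int8_step(max_abs):>8.2f} "
          f"{sample} -> {restored:.2f}")
```

With a maximum of 2000 the sample survives with a modest rounding error, while with a maximum of 600000 the step is 300 times larger and the same value collapses to zero.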

In the end, a qat is a qat, and there are unseen efforts behind qat checkpoints. Thank you gemma team for releasing qat checkpoints.

RandyOrion · 2026-06-06T05:42:34 1780724554

More rants about local inference, consider yourself warned.

Together with the deliberate bf16-related hardware downgrades on consumer-level nvidia gpus, i.e., the gtx 10 and rtx 20, 30, 40, 50 series, things get sour really quickly.

RandyOrion · 2026-06-03T18:05:23 1780509923

A small dense multimodal model with audio support, interesting.

Wait, *Excluding Chinese language.

This is ... curious.

P.S. Where is gemma 4 124b?

kylehotchkiss · 2026-06-03T18:22:11 1780510931

Where are the computers we could purchase to run 124b models :’(

thot_experiment · 2026-06-03T19:52:03 1780516323

You can get SXM V100s for like $100 off ebay. If you're willing to do the troubleshooting work to get em running with adapters, you can build a computer capable of fitting a Q4 quant of a 120b model in VRAM for something like fifteen hundred dollars. (assuming you already have some RAM sticks lying around T___T)
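A back-of-the-envelope check of the VRAM claim, as a sketch only: it counts weights alone (no kv cache, activations, or runtime overhead) and assumes roughly 4.5 bits per weight for a Q4-style quant, which is an assumption rather than a figure from the comment. V100s shipped with 16 GB or 32 GB of memory; the comment does not say which variant is meant, and the card sizes are treated as GiB for simplicity.

```python
import math


def weight_gib(n_params, bits_per_weight):
    # Storage for the weights alone, in GiB.
    return n_params * bits_per_weight / 8 / 1024**3


def cards_needed(n_params, bits_per_weight, card_gib):
    # Minimum number of cards whose combined memory holds the weights.
    return math.ceil(weight_gib(n_params, bits_per_weight) / card_gib)


N_PARAMS = 120_000_000_000   # "120b" from the comment
ASSUMED_BITS = 4.5           # assumed effective bits per weight for a Q4 quant

print(f"weights: ~{weight_gib(N_PARAMS, ASSUMED_BITS):.1f} GiB")
for card_gib in (16, 32):
    n = cards_needed(N_PARAMS, ASSUMED_BITS, card_gib)
    print(f"{card_gib} GiB cards: at least {n}")
```

Any real build would need headroom beyond this minimum for context and runtime buffers.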

RandyOrion · 2026-05-08T09:17:13 1778231833

This website brings me some good chuckles. Now I really know how powerful an on-demand bullsh*t generator is.

RandyOrion · 2026-05-05T13:41:56 1777988516

Like the recent copilot silent signing incident, the without-consent part is a blatant foul move.

If you don't like being treated like anything but a human, you should seriously consider replacing chrome with ungoogled chromium or other browsers.

RandyOrion · 2026-05-03T03:11:30 1777777890

Yeah, this is part of the reason why vscodium exists.

RandyOrion · 2026-05-03T01:54:39 1777773279

Wow. Just like using ungoogled-chromium instead of chrome, lineage os instead of oem android, using vscodium instead of vscode is again justified. These decisions really are the ones that I'll never regret.

In addition, using the word microslop instead of microsoft is again justified, too.