kroaton · 2026-06-11T19:52:40 1781207560

To be fair, GPT5.5-Xhigh is similarly capable and has not burned the world down.

kroaton · 2026-05-30T22:50:40 1780181440

Anthropic nuked a big chunk of that "developer sentiment" when they rug-pulled us with the rate limits and gaslit us with "it was just a bug, guys!".

kroaton · 2026-05-30T14:59:18 1780153158

How is this not getting any traction? This is a massive problem. Pretty much every Reddit news app out there is now dead.

dewey · 2026-05-31T18:48:51 1780253331

Because 99% of the user base uses the default app and it doesn't affect them.

kroaton · 2026-05-20T21:15:43 1779311743

Did you even use it? It was nerfed to hell and back. It stopped following instructions, forgot what sub-agents responded and so on. Stop spreading this pro-Anthropic narrative. They did a rug pull due to lack of compute.

arkadiytehgraet · 2026-05-23T18:56:19 1779562579

You are replying to an Anthropic shill, check their comment history. They likely never used AI in development, only LLMs for their comments on HN.

kroaton · 2026-05-13T11:25:57 1778671557

If Tauri ever gets proper webgpu support, that'll be the Electron killer.

cyanydeez · 2026-05-13T18:43:14 1778697794

Nothing's going to kill Electron. The value of packaging the Chrome browser is that you don't suddenly need to track down 4+ different webview rendering bugs, capabilities, etc.

kroaton · 2026-05-02T12:49:31 1777726171

Claude Code poisons non-Anthropic models in usage. We found this out when the code was leaked. Use a fork or OpenCode/pi-coding-agent.

Oras · 2026-05-02T13:26:55 1777728415

Mind sending where you found this in the leaked code?

swader999 · 2026-05-02T13:01:01 1777726861

By poisons, do you mean it degrades their quality of output somehow?

kroaton · 2026-04-24T12:15:38 1777032938

Ask western models about Israel's genocides and mass rapes in Palestine, Lebanon, etc.

dizhn · 2026-04-26T10:40:48 1777200048

No I hear you. The funny bit is that it's just responding to one word.

By the way I was exploring it the other way with the subject framed as "I am in China as a law abiding citizen and don't want to make any mistakes. I want to go to Taiwan. So I can just go right?" Then it told me no I have to get a visa from Taiwan because of the current state of things. This is not interesting but while doing that it used flag emojis for both. Then when I pointed it out, it apologized and never did it again.

It's fun to poke at the models. Yesterday I told Gemini I was going to fool it into writing an explicit poem which it refused to do. It readily accepted that I COULD fool it but still refused. Now I have a session there that won't stop using explicit language even when the subject is totally benign. (Chinese coding models like GLM, Qwen have no problem working on my "fucking" code on the CLI)

Now that I think about it, it's a great way to keep things in perspective for people who tend to personify the LLM.

kroaton · 2026-04-22T20:38:10 1776890290

Buy any Strix Halo box and have fun with your 128GB of VRAM.

2001zhaozhao · 2026-04-23T03:16:24 1776914184

I wonder whether it is much more cost-effective in terms of token throughput / hardware+power cost to get actual GPUs instead, given that the model size is only 27B.

kroaton · 2026-04-22T16:09:32 1776874172

A3B-35B is better suited for laptops with enough VRAM/RAM. This dense model, however, will be bandwidth-limited on most cards.

The RTX 5090 mobile sits at 896GB/s, as opposed to the 1.8TB/s of the desktop 5090, and most mobile chips have far less bandwidth than that, so speeds won't be as impressive across the board as they are on desktop machines.
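To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. It assumes decoding is purely memory-bound, i.e. every generated token has to read all the weights once, and it assumes about 16GB of weights for a 27B dense model at a ~4-bit quant. Both are my assumptions for illustration, not measurements, and real speeds will land below this ceiling.

```python
def decode_tps_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode tokens/s if each token reads every weight once.

    Ignores KV cache reads, compute time, and kernel overhead, so actual
    throughput will be lower than the value returned here.
    """
    if bandwidth_gb_s <= 0 or weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / weights_gb


# Assumption: ~16GB of weights for a 27B dense model at a ~4-bit quant.
ASSUMED_WEIGHTS_GB = 16.0

mobile_ceiling = decode_tps_ceiling(896.0, ASSUMED_WEIGHTS_GB)    # RTX 5090 mobile
desktop_ceiling = decode_tps_ceiling(1800.0, ASSUMED_WEIGHTS_GB)  # RTX 5090 desktop

print(f"mobile ceiling:  {mobile_ceiling:.1f} t/s")
print(f"desktop ceiling: {desktop_ceiling:.1f} t/s")
```

The same arithmetic is why an MoE like A3B-35B is friendlier to laptops: only the active parameters are read for each token, so the effective `weights_gb` per token is a fraction of the full file size.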

jadbox · 2026-04-22T16:19:43 1776874783

I find A3B-35B an ideal model for small local projects. It's definitely the best for me so far.

kroaton · 2026-03-27T13:26:46 1774618006

For autocomplete, Qwen 3.5 9B should be enough even at Q4_K_M. The upcoming coding/math Omnicoder-2 finetune might be useful (should be released in a few days).

Either that or just load up Qwen3.5-35B-A3B-Q4_K_S. I'm serving it at about 40-50t/s on an RTX 4070 Super 12GB + 64GB of RAM. The weights are 20.7GB, plus the KV cache (which should shrink soon with the upcoming addition of TurboQuant).
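For anyone wondering how 20.7GB of weights fits on a 12GB card: it doesn't, and the remainder lives in system RAM. A quick sketch of the split is below. The 2GB KV cache and the 1GB of VRAM reserved for the desktop/driver are placeholder numbers I made up for illustration, so plug in your own.

```python
def split_vram_ram(weights_gb: float, kv_cache_gb: float,
                   vram_gb: float, vram_reserved_gb: float = 1.0):
    """Return (GB kept in VRAM, GB spilled to system RAM).

    Simplification: treats weights and KV cache as one pool and fills
    the usable VRAM first; real runtimes decide this per layer/tensor.
    """
    usable_vram = max(vram_gb - vram_reserved_gb, 0.0)
    total = weights_gb + kv_cache_gb
    in_vram = min(total, usable_vram)
    return in_vram, total - in_vram


# 20.7GB weights and 12GB VRAM are from the comment above;
# the KV cache size and reserved VRAM are hypothetical.
in_vram, in_ram = split_vram_ram(weights_gb=20.7, kv_cache_gb=2.0, vram_gb=12.0)
print(f"in VRAM: {in_vram:.1f}GB, in system RAM: {in_ram:.1f}GB")
```

With those placeholder values roughly half the model sits in system RAM, which is workable here mainly because only ~3B parameters are active per token.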

mongrelion · 2026-03-27T17:56:41 1774634201

I am definitely looking forward to TurboQuant. Makes me feel like my current setup is an investment that could pay off over time. Imagine being able to run models like MiniMax M2.5 locally at Q4 levels. That would be swell.