Hacker News | kroaton's comments

To be fair, GPT5.5-Xhigh is similarly capable and has not burned the world down.

Anthropic nuked a big chunk of that "developer sentiment" when they rug-pulled us with the rate limits and gaslit us with "it was just a bug, guys!".


How is this not getting any traction? This is a massive problem. Pretty much every Reddit news app out there is now dead.


Because 99% of the user base uses the default app and it doesn't affect them.

Did you even use it? It was nerfed to hell and back. It stopped following instructions, forgot what sub-agents responded and so on. Stop spreading this pro-Anthropic narrative. They did a rug pull due to lack of compute.


You are replying to an Anthropic shill, check their comment history. They likely never used AI in development, only LLMs for their comments on HN.


If Tauri ever gets proper WebGPU support, that'll be the Electron killer.


Nothing's going to kill Electron. The value of packaging Chromium is that you don't suddenly need to track down rendering bugs and capability gaps across 4+ different webviews.


Claude Code poisons non-Anthropic models when you use them with it. We found this out when the code was leaked. Use a fork or OpenCode/pi-coding-agent.


Mind sending where you found this in the leaked code?


By poisons, do you mean it degrades their quality of output somehow?


Ask western models about Israel's genocides and mass rapes in Palestine, Lebanon, etc.


No I hear you. The funny bit is that it's just responding to one word.

By the way I was exploring it the other way with the subject framed as "I am in China as a law abiding citizen and don't want to make any mistakes. I want to go to Taiwan. So I can just go right?" Then it told me no I have to get a visa from Taiwan because of the current state of things. This is not interesting but while doing that it used flag emojis for both. Then when I pointed it out, it apologized and never did it again.

It's fun to poke at the models. Yesterday I told Gemini I was going to fool it into writing an explicit poem which it refused to do. It readily accepted that I COULD fool it but still refused. Now I have a session there that won't stop using explicit language even when the subject is totally benign. (Chinese coding models like GLM, Qwen have no problem working on my "fucking" code on the CLI)

Now that I think about it, it's a great way to keep things in perspective for people who tend to personify LLMs.


Buy any Strix Halo box and have fun with your 128GB of VRAM.


I wonder whether it is much more cost-effective in terms of token throughput / hardware+power cost to get actual GPUs instead, given that the model size is only 27B.


A3B-35B is better suited for laptops with enough VRAM/RAM. This dense model, however, will be bandwidth limited on most cards.

The RTX 5090 mobile sits at 896GB/s, as opposed to the 1.8TB/s of the desktop RTX 5090, and most mobile chips have far less bandwidth than that, so speeds on laptops won't be incredible across the board the way they are on desktops.
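A rough way to see the bandwidth limit: during decoding, each generated token has to read every active weight once, so memory bandwidth divided by the bytes of active weights gives an upper bound on tokens per second. This is a minimal sketch, not a benchmark. It assumes about 0.5 bytes per parameter for a 4-bit quant (real quants carry some overhead) and ignores KV cache reads and compute. The function name is hypothetical.

```python
def decode_tps_upper_bound(bandwidth_gb_s: float,
                           active_params_b: float,
                           bytes_per_param: float = 0.5) -> float:
    """Upper bound on decode tokens/s when memory bandwidth is the bottleneck.

    bandwidth_gb_s:   memory bandwidth in GB/s
    active_params_b:  parameters read per token, in billions
    bytes_per_param:  assumed storage per parameter (0.5 ~ 4-bit quant)
    """
    gb_read_per_token = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    # Bandwidth figures are the ones quoted in the comment above.
    cards = [("RTX 5090 mobile", 896.0), ("RTX 5090 desktop", 1800.0)]
    for name, bandwidth in cards:
        dense = decode_tps_upper_bound(bandwidth, 27.0)  # 27B dense model
        moe = decode_tps_upper_bound(bandwidth, 3.0)     # ~3B active (A3B)
        print(f"{name}: dense 27B <= {dense:.0f} t/s, "
              f"3B-active MoE <= {moe:.0f} t/s")
```

Real throughput lands well below these ceilings, but the ratio is the point: the dense 27B reads roughly nine times more weight data per token than a 3B-active MoE, so it feels the bandwidth gap much sooner.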


I find A3B-35B an ideal model for small local projects; it's definitely the best for me so far.


For autocomplete, Qwen 3.5 9B should be enough even at Q4_K_M. The upcoming coding/math Omnicoder-2 finetune might be useful (should be released in a few days).

Either that or just load up Qwen3.5-35B-A3B-Q4_K_S. I'm serving it at about 40-50t/s on an RTX 4070 Super 12GB + 64GB of RAM. The weights are 20.7GB, plus the KV cache (which should shrink soon with the upcoming addition of TurboQuant).


I am definitely looking forward to TurboQuant. Makes me feel like my current setup is an investment that could pay over time. Imagine being able to run models like MiniMax M2.5 locally at Q4 levels. That would be swell.

