I got qwen3.6:27B running on my 4090 (24GB) with ~128K context leveraging some o...

altruios · 2026-05-11T17:18:52 1778519932

What is your experience with performance past 40k tokens? I've not gone past that, as I've heard reports that that's where problems start to arise. I'd be happy to know your experience in that regard.

rapatel0 · 2026-05-17T11:57:29 1779019049

I'm super happy with the performance. I generally run with 2 parallel slots, so I only get about a 128K context window. My experience with all LLMs is that they get more forgetful if you use the full window. (256-512K is the sweet spot for frontier models; 128K works for me with this current qwen.)
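
A rough sketch of the slot arithmetic, assuming the server splits one total context budget evenly across its parallel slots (llama.cpp's server has historically done this with `--ctx-size` and `--parallel`; the comment does not say which server is in use). The 256K total is an assumption chosen to match the ~128K figure, and `per_slot_context` is a hypothetical helper, not part of any library.

```python
def per_slot_context(total_ctx_tokens: int, parallel_slots: int) -> int:
    """Tokens available to each slot when the total budget is split evenly."""
    if parallel_slots < 1:
        raise ValueError("parallel_slots must be at least 1")
    return total_ctx_tokens // parallel_slots


# Assumed total budget: 256K tokens (not stated in the thread).
TOTAL_CTX = 256 * 1024

for slots in (1, 2, 4):
    print(f"{slots} slot(s): {per_slot_context(TOTAL_CTX, slots)} tokens each")
```

Under that assumption, going from 1 slot to 2 halves the window each request can use, which is the trade-off described above.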

dmichulke · 2026-05-11T12:35:34 1778502934

Forgive my ignorance but aren't they already on huggingface?

I assumed turboquant optimizations are already everywhere: in llama.cpp, or in the quantization machinery of unsloth and the like.

rapatel0 · 2026-05-17T11:55:09 1779018909

I forked it to also add rotorquant. This is a specific optimization that uses Clifford rotors instead of a static, compile-time random permutation to store the activations. It reduces the space and parameter count for the storage.
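
For readers unfamiliar with rotors, here is a minimal sketch of what one is in 3D, where a Clifford rotor is equivalent (up to sign conventions) to a unit quaternion. This is not rotorquant's code, and the dimensions and layout rotorquant actually uses are not given in the thread; all function names are hypothetical. It only illustrates one way to read the parameter-count point: a 3D rotor is stored as 4 numbers, whereas a dense 3x3 rotation matrix takes 9.

```python
import math


def rotor_from_axis_angle(axis, angle):
    """Unit rotor (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2) / norm
    return (math.cos(angle / 2), ax * s, ay * s, az * s)


def rotor_mul(a, b):
    """Product of two rotors (the same rule as quaternion multiplication)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotor_reverse(r):
    """Reverse of a rotor; for a unit rotor this is its inverse."""
    w, x, y, z = r
    return (w, -x, -y, -z)


def rotate(r, v):
    """Rotate the 3D vector `v` with the sandwich product r * v * reverse(r)."""
    p = (0.0, v[0], v[1], v[2])
    _, x, y, z = rotor_mul(rotor_mul(r, p), rotor_reverse(r))
    return (x, y, z)


# A quarter turn about the z axis, stored as 4 floats.
r = rotor_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
print(len(r), rotate(r, (1.0, 0.0, 0.0)))
```

The rotation preserves vector length, so it can be undone exactly with the reversed rotor after the rotated values have been stored.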