llama.cpp already uses an idea from it internally for the KV cache [0].

So a quantized KV cache should now see less degradation.

[0] https://github.com/ggml-org/llama.cpp/pull/21038
