Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

"…whereas 35A3B is a lot smarter…"

Must. Parse. Is this a 35 billion parameter model that needs only 3 billion parameters to be active? (Trying to keep up with this stuff.)

EDIT: A later comment seems to clarify:

"It's a MoE model and the A3B stands for 3 Billion active parameters…"



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: