> It is difficult to say that is not impressive due to it being an emergent abil...

oezi · on March 17, 2023

> There have been chess programs for years that show you for a given position all of the previous games in its database with the same position and the win outcome % of each move. That's all that's going on here.

It could be, but do you think that a lot of the model's 100-300 bn parameters are dedicated to chess move sequences? It has likely seen such data, but I would be surprised if it uses a considerable chunk of them to store chess database information.

Jensson · on March 17, 2023

The web has millions of grandmaster chess games and probably billions of chess games overall. So I wouldn't be surprised if it dedicates something like 0.01% of its parameters to chess games, since there are so many. If so, that would mean it has 10-30 million parameters to play chess with. For comparison, Stockfish has 10 million parameters in its chess engine.
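To make that back-of-the-envelope estimate easy to check, here is a minimal sketch of the arithmetic. The 0.01% share is only the guess made above, not a measured figure, and `parameter_share` is a hypothetical helper name used for illustration.

```python
def parameter_share(total_params: int, fraction: float) -> int:
    """Return how many parameters a given fraction of the model amounts to."""
    return round(total_params * fraction)

# Assumption from the comment above: 0.01% of the model goes to chess.
CHESS_FRACTION = 0.0001  # 0.01% expressed as a fraction

# The 100-300 bn parameter range quoted earlier in the thread.
low = parameter_share(100_000_000_000, CHESS_FRACTION)
high = parameter_share(300_000_000_000, CHESS_FRACTION)

print(f"{low:,} to {high:,} parameters")
```

Under those assumptions the range comes out to 10-30 million, the same order of magnitude as the 10 million quoted for Stockfish.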

mrbungie · on March 17, 2023

Because I don't think that the model learned by literally memorizing chess moves. It must've at least compressed that information in some way. And since the model is not biased toward playing chess by its structure or its sampling policy, I think it's fair to consider it an emergent ability.

Chess moves are a tiny part of all the text learned by the model. This memorization argument is very similar to "Stable Diffusion just takes bits of the images in the original dataset and patches them together".