> It is difficult to say that it is not impressive, given that it is an emergent ability.
I don't know why you think it's an emergent ability.
It's seeing a sequence of moves, and playing the most likely next move (i.e. the most likely next token) given the previous complete move sequences it was trained on. That's the baseline of what an LLM does, not something emergent. Games in online chess databases tend to be between relatively good players. Nobody wants to look up games played by two 800 Elo players.
As an aside, there have been chess programs for years that show you for a given position all of the previous games in its database with the same position and the win outcome % of each move. That's all that's going on here.
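For anyone who hasn't used one of those programs, the lookup is roughly the following. This is a minimal Python sketch, not how any particular chess program does it: `build_book` and `move_stats` are hypothetical names, and it identifies a "position" by the exact move sequence that led to it, so it misses transpositions that a real database would merge.

```python
from collections import defaultdict

def build_book(games):
    """Index games by move prefix.

    games: iterable of (moves, result), where moves is a list of move
    strings and result is "1-0", "0-1", or "1/2-1/2".
    Assumption: a position is identified by the move sequence reaching it.
    """
    # prefix -> next move -> [games played, total score for the side to move]
    book = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for moves, result in games:
        for i, move in enumerate(moves):
            white_to_move = i % 2 == 0
            if result == "1/2-1/2":
                score = 0.5
            elif (result == "1-0") == white_to_move:
                score = 1.0  # the side making this move went on to win
            else:
                score = 0.0
            entry = book[tuple(moves[:i])][move]
            entry[0] += 1
            entry[1] += score
    return book

def move_stats(book, prefix):
    """Return {move: (games played, score % for the side to move)}."""
    stats = book.get(tuple(prefix), {})
    return {move: (n, 100.0 * total / n) for move, (n, total) in stats.items()}

def most_played(book, prefix):
    """Return the most frequently played next move, or None if unseen."""
    stats = book.get(tuple(prefix), {})
    if not stats:
        return None
    return max(stats, key=lambda move: stats[move][0])
```

`most_played` is the "most likely next move" part: it just returns whatever continuation shows up most often after the same move sequence.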
> there have been chess programs for years that show you for a given position all of the previous games in its database with the same position and the win outcome % of each move. That's all that's going on here.
It could be, but do you really think that a large share of the model's 100-300 bn parameters is dedicated to chess move sequences? It has likely seen such data, but I would be surprised if it is using a considerable chunk of its capacity to store chess database information.
The web has millions of grandmaster chess games and probably billions of chess games overall, so I wouldn't be surprised if something like 0.01% of the model's parameters go to chess games. If so, that would mean it has 10-30 million parameters to play chess with. For comparison, Stockfish has 10 million parameters in its chess engine.
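The arithmetic checks out. A quick Python check, where the 0.01% share is only the guess made above and `chess_params` is a name made up for this sketch:

```python
# Assumption: 0.01% of the parameters go to chess, i.e. 1 in 10,000.
CHESS_SHARE_DENOMINATOR = 10_000

def chess_params(total_params: int) -> int:
    """Parameters left for chess under the guessed 0.01% share."""
    return total_params // CHESS_SHARE_DENOMINATOR

low = chess_params(100_000_000_000)   # 100 bn parameter model
high = chess_params(300_000_000_000)  # 300 bn parameter model
print(f"{low:,} to {high:,}")  # prints "10,000,000 to 30,000,000"
```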
Because I don't think the model learned chess by literally memorizing moves. It must have at least compressed that information in some way. And since neither the model's architecture nor its sampling policy is biased toward playing chess, I think it's fair to consider it an emergent ability.
Chess moves are a tiny part of all the text the model learned from. This memorization argument is very similar to the "Stable Diffusion just takes bits of the images in the original dataset and patches them together" argument.