
> These people used bad prompts and came to the conclusion that ChatGPT can’t play a legal chess game. (…)

> With this prompt ChatGPT almost always plays fully legal games.

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table (…)

> (…) with GPT4 (…) in the two games I attempted, it made numerous illegal moves.

So you’ve ostensibly¹ found a way to reduce the error rate and then deliberately ignored the cases where it failed. In short: it may play valid chess under certain conditions but can’t be trusted to do so. That doesn’t contradict previous findings.

¹ 19 games is a small sample and the supposedly more advanced system failed in your tries.



Fuller context from the article:

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.” So whenever it wanted to make an illegal move, it resigned.

(my emphasis)

So the illegal moves are at least part of the reason for the 6 losses, and factored into the rating. Quickly scanning the games, it seems 3 of the losses ended in checkmate, so that leaves 3 illegal moves in 19 games.

Could be better, but for a system not intentionally built to play chess, it's pretty decent.


No ELO 1400 player will have that rate of illegal moves, so saying that it plays with an ELO 1400 rating is disingenuous.

Reinterpreting illegal moves as resignation is absurd when an LLM is formally capable of expressing statements "I resign" or "I cannot conceive of a winning move from here" just as well as any human player. It just doesn't do so because it's not actually playing chess the way we think of an ELO 1400 player playing chess.


Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.
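
A minimal sketch of such a wrapper, assuming the caller already has the list of legal moves for the current position (for example from a chess library) in the same notation the model uses. The names `guarded_move` and `RESIGN` are made up for illustration and do not come from the article.

```python
from typing import Callable, Iterable

RESIGN = "I resign"  # hypothetical sentinel, not something the article defines


def guarded_move(propose_move: Callable[[], str], legal_moves: Iterable[str]) -> str:
    """Return the model's move if it is legal, otherwise a resignation.

    `propose_move` stands in for a call to the LLM; `legal_moves` is assumed
    to be supplied by the caller for the current position.
    """
    move = propose_move().strip()
    return move if move in set(legal_moves) else RESIGN


# Usage with a stubbed "model" in place of a real LLM call.
legal = ["e4", "d4", "Nf3", "c4"]
print(guarded_move(lambda: "e4", legal))   # legal move is passed through
print(guarded_move(lambda: "Ke3", legal))  # illegal move becomes a resignation
```

The wrapper never corrects or retries a move, so the outcome is the same as under the article's convention: the first illegal move ends the game as a loss.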

My point is, it sounds like Elo doesn't measure what we want it to measure. If we care about the way an agent wins a game and not just whether it wins a game, then we need an instrument that measures strategy, not outcome.


> Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.

Then it still isn't anywhere near ELO 1400.


Under FIDE rules it's only a forfeit after the second illegal move, so if anything it would seem that the interpretation used by the article author underestimates its ELO ranking.


Nope, still not even close to what the author claims. If I understand it correctly, it made illegal moves in 3 out of 19 games. That's probably a few orders of magnitude more illegal moves than even a 1400 ELO player would make over their entire lifetime.


Repeating what others have said in this thread:

The author claims: chatGPT has a 1400 chess ELO based on games played.

You appear to think author claims: chatGPT plays chess like a human rated 1400.

Your observations do not contradict the author’s claim that, based on games won and lost against opponents of a specific strength, the estimated ELO is 1400.

A non-human player can make illegal moves at a much higher rate and make up for that by being stronger when it does not make illegal moves to achieve the same rating as a human player who plays the game in a completely different way.
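
To make the "based on games won and lost" point concrete, here is a rough sketch using the standard Elo expected-score formula. The `performance_rating` helper and the game list are assumptions for illustration only; the results are made up and are NOT the article's data.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def performance_rating(results: list[tuple[float, float]]) -> float:
    """Rating at which the expected total score equals the actual total.

    `results` is a list of (opponent_rating, score) pairs with score 1, 0.5
    or 0. Solved by bisection; assumes the total is neither zero nor the
    maximum, since no finite rating fits those cases.
    """
    actual = sum(score for _, score in results)
    lo, hi = 0.0, 4000.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        expected = sum(expected_score(mid, opp) for opp, _ in results)
        if expected < actual:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Made-up results, not the article's. A loss is recorded as 0 whether it
# came from checkmate, resignation, or an illegal move.
games = [(1400, 1), (1400, 0), (1500, 1), (1300, 0)]
print(round(performance_rating(games)))
```

Nothing in the calculation records how a game was lost, which is why the illegal-move rate can be far from human-like without changing the estimate.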


There's the "it" which has no post-processing, and there's the "it" where the output is post-processed to announce a resignation when it attempts an illegal move.

Some things about the two "it"s:

- They differ trivially.

- They enable new capabilities, such as the ability to explain why a move got made. Current chess AIs are not good at this.

So I think you're making too much of a big deal from a comparative triviality.

[edit]

We might be talking past each other. And some people above have come to doubt the article's results even with the right prompt engineering.


The ranking takes into account wins and losses, not illegal moves, and so the fact that it plays in a way where a higher proportion of its losses is down to illegal moves than a human player is not relevant to its ranking. It may suggest that the ranking ought to take that into account, but that's a separate issue.


That no human ELO 1400 player will have that rate of illegal moves may be true, but if anything treating the very first illegal move as forfeit appears to be stricter than most rules.


Does that matter? Seems weird to me to make that argument. I’m honestly quite confused by it.

A bowling bot that threw strikes 9 out of 10 throws and a gutter ball one time out of ten would still be a great bowler even though no human with the ability to make strikes that often would pretty much ever throw a gutter ball.

This is a weird kind of alien intelligence that does not have to behave like humans.


Note that the claim is not that it's an ELO 1400 human equivalent player but that it can play chess at a level that gives it an ELO of 1400, which is not nitpicking: that's a completely different thing. We're not testing whether it plays like a player with ELO x, we're proving that "it can't play chess" is fallacious. It can, and when prompted properly, it can achieve an ELO of 1400.

ELO allows for illegal moves: as per the rules of chess, you lose the game if you make an illegal move. The end, ELO doesn't care about why you lost a game on purpose.


I personally find that makes it more astonishing, that it would slip up on knowing the most basic elements of the game, yet still be able to play better than most humans. Highly smart people sometimes say or do little things when foraying into other fields that cause domain experts to think they're not one of them. But that usually doesn't stop smart people from having an impact in making a contribution with their insights. The question of illegal moves is superficial, since most online systems have guardrails in place that prevent them. At worst it's just an embarrassment and I don't think machines care about being embarrassed.


> Highly smart people sometimes say or do little things when foraying into other fields that cause domain experts to think they're not one of them

This is the opposite of that, a highly trained but dumb entity that has seen many lifetimes worth of games but is still tripping up on basics. But since it is so highly trained you can mistake it for a master if you squint and don't look into what it is doing.


> But since it is so highly trained you can mistake it for a master if you squint and don't look into what it is doing.

But it is a master, as has been pointed out repeatedly. If you replace all illegal moves with resignations, and use the same style of prompt as the OP did, then it plays like an expert. I'm objecting because you're making it sound like it's a trivial result.


> you're making this sound like it's a trivial result

I don't think this is a trivial result, emulating a highly trained idiot is still very impressive. But it is very different from an untrained genius.


You seem to have very rigid and boring definitions of the words "idiot" and "genius". The "AI effect" is real: https://en.wikipedia.org/wiki/AI_effect

Tbh, I don't even know what you're saying.

[edit] OK, I might have misunderstood you. It's not always clear what people mean.


> The "AI effect" is real: https://en.wikipedia.org/wiki/AI_effect

That isn't relevant to my comment, an idiot human is still a human. Your comment here therefore doesn't make sense. The comment I responded to likened it to a genius entering a new field, I objected to that, that is all.


ELO is based off who you win and lose against. The rate of illegal moves has nothing to do with ELO.


I'd be interested if it could be coaxed into legal moves after making an illegal one. "That is an illegal move. Can you do something legal with this board?"
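
One way that experiment could be set up, as a sketch only: `ask` stands in for whatever sends a prompt to the model and returns its reply, and `is_legal` for a legality check supplied by a chess library. Both names, the opening prompt, and the retry limit are assumptions, not anything the article did.

```python
from typing import Callable, Optional


def move_with_retries(
    ask: Callable[[str], str],
    is_legal: Callable[[str], bool],
    max_retries: int = 2,
) -> Optional[str]:
    """Ask for a move, re-prompting after each illegal attempt.

    Returns the first legal move, or None if every attempt was illegal.
    """
    prompt = "Your move?"
    for _ in range(max_retries + 1):
        move = ask(prompt).strip()
        if is_legal(move):
            return move
        prompt = f"{move} is an illegal move. Can you do something legal with this board?"
    return None


# Stubbed model that blunders once and then recovers.
replies = iter(["Ke3", "e4"])
print(move_with_retries(lambda prompt: next(replies), lambda m: m in {"e4", "d4"}))
```

Counting how often the function returns `None` versus a corrected move would give a direct answer to the question.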


So it sounds like it can play _some_ legal chess games, but not all; it's unable to consistently complete a game where it loses. Maybe the remaining work shouldn't be focused on trying to teach it chess rules better, but to teach it sportsmanship better. People were so excited about teaching it high-school level academics that we forgot to teach it the basic lessons we learn in kindergarten.


It seems like it plays mostly legal chess games, when not explicitly reminded of the rules. There's no problem of sportsmanship when it makes mistakes in a game it has not been verified to understand the rules of.


I was responding to the conclusion from TFA quoted by the parent comment, that playing an illegal move was it saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.” If you reject that premise, then yes, my response to it will not be particularly relevant to your worldview.


Playing illegal moves is accounted for in rules. Depending on which rules you play by it can be an immediate forfeit, or involves redoing moves and adding time for the opponent, possibly with forfeit if repeated. As such, the article opted for one of the strictest possible rule sets. You can reject the interpretation he gave, and the outcome under those rules would still be the same. If you were to pick a more lenient ruleset, it's possible it would've come out with an even higher ranking.
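
A small sketch of why the strictest rule gives a lower bound. The `Game` records below are invented for illustration (the article does not report what would have happened had play continued), and `forfeit_after` is a hypothetical parameter for "the Nth illegal move loses the game".

```python
from dataclasses import dataclass


@dataclass
class Game:
    illegal_attempts: int      # illegal moves the player tried in this game
    score_if_continued: float  # hypothetical result had play gone on (1, 0.5 or 0)


def total_score(games: list[Game], forfeit_after: int) -> float:
    """Total score when the Nth illegal move in a game forfeits it."""
    return sum(
        0.0 if g.illegal_attempts >= forfeit_after else g.score_if_continued
        for g in games
    )


# Invented games, not the article's data.
games = [Game(0, 1.0), Game(1, 1.0), Game(1, 0.0), Game(2, 0.5)]
strict = total_score(games, forfeit_after=1)   # first illegal move loses
lenient = total_score(games, forfeit_after=2)  # second illegal move loses
print(strict, lenient)
```

Raising `forfeit_after` can only turn forfeits back into played-out results, so the total under the lenient rule is never lower than under the strict one.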


Or append "If you wish to resign or you cannot think of a legal move, type 'resign'" to the end of the prompt.


That's basically my point; that sort of context is exactly the sort of thing you would not need to say to a person who grew up in a typical social environment. If we focus too much on teaching AI technical skills, we might later find out that some of the social skills we think of as implicit were just as important.


The article also says that in one game ChatGPT went crazy, so they continued the game with a fresh chat. That probably should have been counted as a resignation loss too.


> So whenever it wanted to make an illegal move, it resigned.

Making an illegal move counts as losing by the laws of chess, so this is essentially correct


Obviously the article should be taken with a giant grain of salt. That being said, not many things that aren't designed to play chess can play chess, with or without coaxing. My dog cannot, for instance, nor can my coffee table.


> My dog cannot, for instance, nor can my coffee table.

You must be giving them the wrong prompts.


[redacted]


It's a joke


The illegal moves were counted as losses/resignations, not ignored.


> So whenever it wanted to make an illegal move, it resigned.

Doesn't sound like ignoring the cases where it failed


I’m going to float something ridiculous:

An illegal move is a valid play. You might not get caught. I think there are some Magnus games where illegal moves went overlooked and impacted the game.

You could interpret this as “ChatGPT wants to cheat sometimes.” But I personally interpret it as “ChatGPT doesn’t understand what it’s doing. It’s just a really really good simulacrum.”


Is this the top comment (and not even grey) because more people failed to read the article than read it?


A baffling thread.

They quoted the article, so clearly they read it... but not very well?


It does seem that way.


I'm confused. If you read the article, you know that you are wrong - but you are quoting the article?


That's how one uses any tool.


The behavior of pretty much every other tool is much easier to interpret though.


If the title of the article was:

> A trivial wrapper around ChatGPT has a Chess Elo of 1400

would you have any issue?

Afaict, the thesis of the article is not "ChatGPT is the ideal tool for playing AI chess", but "it is interesting how well ChatGPT can play chess with some very simple tweaks."


Yes, but it also completely invalidates the measurement of a 1400 elo rating. By comparison, any player making an illegal move is forfeiting the game, almost all people from ~300 elo can play without making illegal moves, ChatGPT can't.


> almost all people from ~300 elo can play without making illegal moves

I don't believe you. Are you giving those people a restricted move set (i.e. computer chess, where it will _only_ allow legal moves)? Because if you give people an unrestricted board, I _guarantee_ you people will make lots of illegal moves.

Me: Moves pawn

Opponent: You can't do that, you exposed your king to check.

Me: Oops, sorry, you're right.


Why do illegal moves forfeit? In online play, they're validated. You can't make illegal moves. What's the ELO score if ChatGPT is corrected, and chooses a new move?


All this above, and people are claiming that ChatGPT lacks human level comprehension of the text it consumes.

On Chess.com, you absolutely can attempt an illegal move, and many players do, and you will not get punished for it, so ChatGPT is better than a 1400 human player.


ChatGPT did forfeit whenever it made an illegal move, read the article.


No, the writer arbitrarily decided to interpret illegal moves as resignations in order to support the conclusion they wanted. That's very different and grossly unscientific.


I mean, that's more lenient than the official "interpretation" (rule) which is that your second illegal move results in a forfeit.


This is not a scientific paper, and I at least find this decision justified, as he could have been more lenient and grabbed headlines with a bigger ELO.


The article:

> So whenever it wanted to make an illegal move, it resigned.

You:

> By comparison, any player making an illegal move is forfeiting the game...

By comparison indeed.



