> These people used bad prompts and came to the conclusion that ChatGPT can’t pl...

vidarh · on March 17, 2023

Fuller context from the article:

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.” So whenever it wanted to make an illegal move, it resigned.

(my emphasis)

So the illegal moves are at least part of the reason for the 6 losses, and factored into the rating. Quickly scanning the games, it seems 3 of the losses ended in checkmate, so that leaves 3 illegal moves in 19 games.

Could be better, but for a system not intentionally built to play chess, it's pretty decent.

swatcoder · on March 17, 2023

No ELO 1400 player will have that rate of illegal moves, so saying that it plays with an ELO 1400 rating is disingenuous.

Reinterpreting illegal moves as resignation is absurd when an LLM is formally capable of expressing statements like "I resign" or "I cannot conceive of a winning move from here" just as well as any human player. It just doesn't do so because it's not actually playing chess the way we think of an ELO 1400 player playing chess.

JellyBeanThief · on March 17, 2023

Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.
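A minimal sketch of such a wrapper, assuming the set of legal moves is supplied by some external move generator. The names `resign_on_illegal` and `legal_moves` are hypothetical and do not come from the article:

```python
RESIGN = "I resign"


def resign_on_illegal(proposed_move: str, legal_moves: set[str]) -> str:
    """Return the model's move if it is legal, otherwise a resignation."""
    move = proposed_move.strip()
    return move if move in legal_moves else RESIGN


# Hand-written, partial list of Black's replies to 1. e4, for illustration only.
legal = {"e5", "c5", "e6", "Nf6"}
print(resign_on_illegal("e5", legal))   # legal move passes through unchanged
print(resign_on_illegal("Ke7", legal))  # illegal move is replaced by a resignation
```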

My point is, it sounds like Elo doesn't measure what we want it to measure. If we care about the way an agent wins a game and not just whether it wins a game, then we need an instrument that measures strategy, not outcome.

illiarian · on March 17, 2023

> Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.

Then it still isn't anywhere near ELO 1400.

vidarh · on March 17, 2023

Under FIDE rules it's only a forfeit after the second illegal move, so if anything it would seem that the interpretation used by the article author underestimates its ELO ranking.

illiarian · on March 17, 2023

Nope, still not even close to what the author claims. If I understand it correctly, it made illegal moves in 3 out of 19 games. That's probably a few orders of magnitude more illegal moves than even a 1400 ELO player would make over their entire lifetime.

pedrosorio · on March 17, 2023

Repeating what others have said in this thread:

The author claims: chatGPT has a 1400 chess ELO based on games played.

You appear to think the author claims: chatGPT plays chess like a human rated 1400.

Your observations do not contradict the author's claim that based on games won and lost against opponents of a specific strength, the estimated ELO is 1400.

A non-human player can make illegal moves at a much higher rate and make up for that by being stronger when it does not make illegal moves to achieve the same rating as a human player who plays the game in a completely different way.
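To make the point concrete, here is a sketch of the standard Elo expected-score and update formulas. The K-factor of 32 is an assumption chosen for illustration, not something the article states. Note that the update takes only the ratings and the game result; there is no input for how the game ended:

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score (between 0 and 1) against an opponent, per the Elo formula."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """New rating after one game; score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent))


# A loss is scored 0.0 whether it came from checkmate or from an illegal move,
# so both cases produce the same rating change.
print(update(1400, 1400, 0.0))
print(update(1400, 1400, 1.0))
```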

ogogmad · on March 17, 2023

There's the "it" which has no post-processing, and there's the "it" where the output is post-processed to announce a resignation when it attempts an illegal move.

Some things about the two "it"s:

- They differ trivially.

- They enable new capabilities, such as the ability to explain why a move got made. Current chess AIs are not good at this.

So I think you're making too much of a big deal from a comparative triviality.

[edit]

We might be talking past each other. And some people above have come to doubt the article's results even with the right prompt engineering.

vidarh · on March 17, 2023

The ranking takes into account wins and losses, not illegal moves, and so the fact that it plays in a way where a higher proportion of its losses is down to illegal moves than a human player is not relevant to its ranking. It may suggest that the ranking ought to take that into account, but that's a separate issue.

vidarh · on March 17, 2023

That no human ELO 1400 player will have that rate of illegal moves may be true, but if anything, treating the very first illegal move as a forfeit appears to be stricter than most rules.

arrrg · on March 17, 2023

Does that matter? Seems weird to me to make that argument. I’m honestly quite confused by it.

A bowling bot that threw strikes 9 out of 10 throws and a gutter ball one time out of ten would still be a great bowler even though no human with the ability to make strikes that often would pretty much ever throw a gutter ball.

This is a weird kind of alien intelligence that does not have to behave like humans.

TheRealPomax · on March 17, 2023

Note that the claim is not that it's an ELO 1400 human equivalent player but that it can play chess at a level that gives it an ELO of 1400, which is not nitpicking: that's a completely different thing. We're not testing whether it plays like a player with ELO x, we're proving that "it can't play chess" is fallacious. It can, and when prompted properly, it can achieve an ELO of 1400.

ELO allows for illegal moves: as per the rules of chess, you lose the game if you make an illegal move. The end, ELO doesn't care about why you lost a game on purpose.

jart · on March 17, 2023

I personally find that makes it more astonishing, that it would slip up on knowing the most basic elements of the game, yet still be able to play better than most humans. Highly smart people sometimes say or do little things when foraying into other fields that cause domain experts to think they're not one of them. But that usually doesn't stop smart people from having an impact or making a contribution with their insights. The question of illegal moves is superficial, since most online systems have guardrails in place that prevent them. At worst it's just an embarrassment and I don't think machines care about being embarrassed.

Jensson · on March 17, 2023

> Highly smart people sometimes say or do little things when foraying into other fields that cause domain experts to think they're not one of them

This is the opposite of that, a highly trained but dumb entity that has seen many lifetimes worth of games but is still tripping up on basics. But since it is so highly trained you can mistake it for a master if you squint and don't look into what it is doing.

ogogmad · on March 17, 2023

> But since it is so highly trained you can mistake it for a master if you squint and don't look into what it is doing.

But it is a master, as has been pointed out repeatedly. If you replace all illegal moves with resignations, and use the same style of prompt as the OP did, then it plays like an expert. I'm objecting because you're making it sound like it's a trivial result.

Jensson · on March 17, 2023

> you're making this sound like it's a trivial result

I don't think this is a trivial result, emulating a highly trained idiot is still very impressive. But it is very different from an untrained genius.

ogogmad · on March 17, 2023

You seem to have very rigid and boring definitions of the words "idiot" and "genius". The "AI effect" is real: https://en.wikipedia.org/wiki/AI_effect

Tbh, I don't even know what you're saying.

[edit] OK, I might have misunderstood you. It's not always clear what people mean.

Jensson · on March 17, 2023

> The "AI effect" is real: https://en.wikipedia.org/wiki/AI_effect

That isn't relevant to my comment, an idiot human is still a human. Your comment here therefore doesn't make sense. The comment I responded to likened it to a genius entering a new field, I objected to that, that is all.

charcircuit · on March 17, 2023

ELO is based off who you win and lose against. The rate of illegal moves has nothing to do with ELO.

Pxtl · on March 17, 2023

I'd be interested if it could be coaxed into legal moves after making an illegal one. "That is an illegal move. Can you do something legal with this board?"
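One way to try this is a retry loop that feeds the rejection back to the model. This is only a sketch: `ask_model` is a hypothetical callable standing in for whatever chat API is used, and `legal_moves` is assumed to come from an external move generator.

```python
from typing import Callable, Optional


def move_with_retries(
    ask_model: Callable[[str], str],
    legal_moves: set[str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for a move, re-prompting after each illegal one; None if all attempts fail."""
    prompt = "Your move?"
    for _ in range(max_attempts):
        move = ask_model(prompt).strip()
        if move in legal_moves:
            return move
        # Tell the model its move was rejected and ask again.
        prompt = f"{move} is an illegal move. Can you do something legal with this board?"
    return None


# Stub model for illustration: answers illegally once, then legally.
replies = iter(["Ke7", "e5"])
print(move_with_retries(lambda prompt: next(replies), {"e5", "c5"}))
```

Whether the model actually recovers after being corrected is an empirical question that the article does not answer.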

saghm · on March 17, 2023

So it sounds like it can play _some_ legal chess games, but not all; it's unable to consistently complete a game where it loses. Maybe the remaining work shouldn't be focused on trying to teach it chess rules better, but to teach it sportsmanship better. People were so excited about teaching it high-school level academics that we forgot to teach it the basic lessons we learn in kindergarten.

vidarh · on March 17, 2023

It seems like it plays mostly legal chess games, when not explicitly reminded of the rules. There's no problem of sportsmanship when it makes mistakes in a game it has not been verified to understand the rules of.

saghm · on March 17, 2023

I was responding to the conclusion from TFA quoted by the parent comment, that playing an illegal move was it saying "this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.” If you reject that premise, then yes, my response to it will not be particularly relevant to your worldview.

vidarh · on March 17, 2023

Playing illegal moves is accounted for in rules. Depending on which rules you play by it can be an immediate forfeit, or involves redoing moves and adding time for the opponent, possibly with forfeit if repeated. As such, the article opted for one of the strictest possible rule sets. You can reject the interpretation he gave, and the outcome under those rules would still be the same. If you were to pick a more lenient ruleset, it's possible it would've come out with an even higher ranking.
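The difference between rule sets can be sketched as a threshold on the number of illegal moves tolerated before a forfeit. The per-game counts below are made up purely for illustration and are not data from the article:

```python
def forfeits(illegal_move_count: int, allowed_before_forfeit: int) -> bool:
    """True if the player has exceeded the number of illegal moves the rule set tolerates."""
    return illegal_move_count > allowed_before_forfeit


STRICT = 0   # the article's convention: the first illegal move ends the game
LENIENT = 1  # forfeit only on the second illegal move

# Hypothetical illegal-move counts for five games.
illegal_counts = [0, 1, 0, 2, 1]

strict_forfeits = sum(forfeits(n, STRICT) for n in illegal_counts)
lenient_forfeits = sum(forfeits(n, LENIENT) for n in illegal_counts)
print(strict_forfeits, lenient_forfeits)
```

A game that avoids forfeit under the lenient rule is not automatically a win; it simply continues, so the rating could only stay the same or go up relative to the strict rule.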

kmeisthax · on March 17, 2023

Or append "If you wish to resign or you cannot think of a legal move, type 'resign'" to the end of the prompt.

saghm · on March 17, 2023

That's basically my point; that sort of context is exactly the sort of thing you would not need to say to a person who grew up in a typical social environment. If we focus too much on teaching AI technical skills, we might later find out that some of the social skills we think of as implicit were just as important.

jmull · on March 17, 2023

The article also says that in one game ChatGPT went crazy, so they continued the game with a fresh chat. That probably should have been counted as a resignation loss too.

nextaccountic · on March 17, 2023

> So whenever it wanted to make an illegal move, it resigned.

Making an illegal move counts as losing by the laws of chess, so this is essentially correct

dudeinjapan · on March 17, 2023

Obviously the article should be taken with a giant grain of salt. That being said, not many things that aren't designed to play chess can play chess, with or without coaxing. My dog cannot, for instance, nor can my coffee table.

hectorlorenzo · on March 17, 2023

> My dog cannot, for instance, nor can my coffee table.

You must be giving them the wrong prompts.

ogogmad · on March 17, 2023

[redacted]

AndrewPGameDev · on March 17, 2023

It's a joke

ballenf · on March 17, 2023

The illegal moves were counted as losses/resignations, not ignored.

__s · on March 17, 2023

> So whenever it wanted to make an illegal move, it resigned.

Doesn't sound like ignoring the cases where it failed

Waterluvian · on March 17, 2023

I’m going to float something ridiculous:

An illegal move is a valid play. You might not get caught. I think there are some Magnus games where illegal moves went overlooked and impacted the game.

You could interpret this as “ChatGPT wants to cheat sometimes.” But I personally interpret it as “ChatGPT doesn’t understand what it’s doing. It’s just a really really good simulacrum.”

hgsgm · on March 17, 2023

Is this the top comment (and not even grey) because more people failed to read the article than read it?

whimsicalism · on March 17, 2023

A baffling thread.

They quoted the article, so clearly they read it... but not very well?

sebzim4500 · on March 17, 2023

It does seem that way.

whimsicalism · on March 17, 2023

I'm confused. If you read the article, you know that you are wrong - but you are quoting the article?

psychphysic · on March 17, 2023

That's how one uses any tool.

qwytw · on March 17, 2023

The behavior of pretty much every other tool is much easier to interpret though.

kdmccormick · on March 17, 2023

If the title of the article was:

> A trivial wrapper around ChatGPT has a Chess Elo of 1400

would you have any issue?

Afaict, the thesis of the article is not "ChatGPT is the ideal tool for playing AI chess", but "it is interesting how well ChatGPT can play chess with some very simple tweaks."

Out_of_Characte · on March 17, 2023

Yes, but it also completely invalidates the measurement of a 1400 elo rating. By comparison, any player making an illegal move is forfeiting the game; almost all people from ~300 elo can play without making illegal moves, ChatGPT can't.

ncallaway · on March 17, 2023

> almost all people from ~300 elo can play without making illegal moves

I don't believe you. Are you giving those people a restricted move set (i.e. computer chess, where it will _only_ allow legal moves)? Because if you give people an unrestricted board, I _guarantee_ you people will make lots of illegal moves.

Me: Moves pawn

Opponent: You can't do that, you exposed your king to check.

Me: Oops, sorry, you're right.

nsxwolf · on March 17, 2023

Why do illegal moves forfeit? In online play, they're validated. You can't make illegal moves. What's the ELO score if ChatGPT is corrected, and chooses a new move?

hgsgm · on March 17, 2023

All this above, and people are claiming that ChatGPT lacks human level comprehension of the text it consumes.

On Chess.com, you absolutely can attempt an illegal move, and many players do, and you will not get punished for it, so ChatGPT is better than a 1400 human player.

sebzim4500 · on March 17, 2023

ChatGPT did forfeit whenever it made an illegal move, read the article.

swatcoder · on March 17, 2023

No, the writer arbitrarily decided to interpret illegal moves as resignations in order to support the conclusion they wanted. That's very different and grossly unscientific.

mynameisvlad · on March 17, 2023

I mean, that's more lenient than the official "interpretation" (rule) which is that your second illegal move results in a forfeit.

epups · on March 17, 2023

This is not a scientific paper, and I at least find this decision justified, as he could have been more lenient and grabbed headlines with a bigger ELO.

renewiltord · on March 17, 2023

The article:

> So whenever it wanted to make an illegal move, it resigned.

You:

> By comparison, any player making an illegal move is forfeiting the game...

By comparison indeed.