Just the other day there was a double-blind study that showed a 50-50 success ra...

tripletao · on May 19, 2024

If you're referring to the study at https://news.ycombinator.com/item?id=40386571, then it wasn't a canonical Turing test. The preprint accurately describes and analyzes their (indefensibly bad) experiment, but the popular press has mischaracterized it.

The canonical test gives the interrogator two witnesses, one human and one machine, and asks them to judge which witness is human. The interrogator knows that exactly one witness is human. In that test, a 50% chance of a right answer means the machine is indistinguishable from human. (Turing actually proposed a lower pass threshold, perhaps for statistical convenience.)

But that study gave the interrogator one witness, and asked them to judge whether it was human. The interrogator wasn't told anything about the prior probability that their witness was human. The probability that a real human is judged human and the probability that GPT-4 is judged human sum to more than 100%; nothing prevents that, because it's not a binary comparison. So 50% has no particular meaning. The result is effectively impossible to interpret, since it's a function both of the witness's performance and of whatever assumption the interrogator makes about the unspecified prior.
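To make the difference concrete, here is a rough sketch under a toy signal-detection model. The model is my own assumption, not anything from the preprint: each witness produces a "humanlikeness" score drawn from a unit-variance Gaussian, a single-witness interrogator answers "human" when the score exceeds a threshold that stands in for their assumed prior, and a two-witness interrogator picks whichever witness scored higher. The function names and all parameter values are hypothetical.

```python
from math import erf, sqrt


def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def single_witness_rates(mu_human: float, mu_machine: float, threshold: float):
    """One-witness design: return (P(human judged human), P(machine judged human)).

    The interrogator says "human" when the score exceeds `threshold`.
    The threshold is a stand-in for the interrogator's unspecified prior.
    """
    p_human = 1.0 - phi(threshold - mu_human)
    p_machine = 1.0 - phi(threshold - mu_machine)
    return p_human, p_machine


def paired_accuracy(mu_human: float, mu_machine: float) -> float:
    """Two-witness design: probability the interrogator picks the real human.

    The difference of two independent unit-variance scores has variance 2.
    """
    return phi((mu_human - mu_machine) / sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical case: a machine that is exactly indistinguishable from a human.
    for threshold in (-0.5, 0.0, 0.5):
        p_h, p_m = single_witness_rates(0.0, 0.0, threshold)
        print(f"threshold={threshold:+.1f}  human={p_h:.3f}  machine={p_m:.3f}")
    print(f"paired accuracy={paired_accuracy(0.0, 0.0):.3f}")
```

In this toy model, a perfectly indistinguishable machine always gives 50% accuracy in the paired design, while in the one-witness design both "judged human" rates move together with the threshold and can sum to more or less than 100%. That is the sense in which a bare 50% in the one-witness design says nothing on its own.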

golol · on May 19, 2024

In a 5-minute casual conversation. Also, the statistics for human and AI differed in some regard (something like 48% vs. 56% for some quantity); I don't recall the details.

Look, the Turing test is very different depending on the details, and I think a lame 5-minute Turing test that doesn't really measure anything of interest is a worse concept than a 1-day adversarial expert-team test that can detect AGI.

golol · on May 19, 2024

So why can't you replace 99% of callcenter calls (<5min) with AI right now?

pas · on May 19, 2024

you don't know which calls are going to be those trivial ones upfront.

that said, support is being replaced by nothing in a lot of places. (oh, sometimes there's an annoying chatbot.)