Hacker News

As someone who spent years quadruple-checking every figure in every slide to avoid a mistake like this, it’s very confusing to see this in the big launch announcement of one of the most high-profile startups around.

Even the small presentations we gave to execs or the board were checked for errors so many times that nothing could possibly slip through.



It's literally a billion dollar plus release. I get more scrutiny on my presentations to groups of 10 people.


I take a strange comfort in still spotting AI typos. Makes it obvious their shiny new "toy" isn't ready to replace professionals.

They talk about using this to help families facing a cancer diagnosis -- literal life or death! -- and we're supposed to trust a machine that can't even spot a few simple typos? Ha.

The lack of human proofreading says more about their values than their capabilities. They don't want oversight -- especially not from human professionals.


Cynically, the AI is ready to replace professionals in areas where the stakeholders don't care too much. They can offer the services cheaper, and this is all that matters to their customers. Were it not so, companies like Tata wouldn't have any customers. The phenomenon of "cheap Chinese junk" would not exist, because no retailer would order it to be produced.

So, brace yourselves, we'll see more of this in production :(


Does something where you don't care about quality this much need doing at all?


Well, the world will split into those who care (the fields where precision is crucial) and the rest. Occasional mistakes are tolerable, but systematic bullshit is a bit too much for me.


This separation (always a spectrum, not a split) has already existed for a long time. Bouts of systemic bullshit occur every now and then, known as "bubbles" (as in dotcom bubble, mortgage bubble, etc.) or "crises" (such as the "reproducibility crisis", etc.). Smaller waves rise and fall all the time, in the form of various scams (from the ancient tulip mania to Ponzi to Madoff to ICOs, etc.).

It seems like large numbers of people, including people in high-up positions, tend to believe bullshit, as long as it makes them feel comfortable. This leads to various irrational business fashions and technological fads, to say nothing of political movements.

So yes, another wave of fashion, another miracle that works "as everybody knows" would fit right in. It's sad because bubbles inevitably burst, and that may slow down or even destroy some of the good parts, the real advances that ML is bringing.


Yes this is quite shocking. They could have just had o3 fact check the slides and it would have noticed...


I thought so too, but I gave it a screenshot with the prompt:

> good plot for my presentation?

and it didn't pick up on the issue. Part of its response was:

> Clear metric: Y-axis (“Accuracy (%), pass @1”) and numeric labels make the performance gaps explicit.

I think visual reasoning is still pretty far from text-only reasoning.


o3 did fact check the slides and it fixed its lower score.


They let the AI make the bars.


Vibegraphing.


Stable diffusion is good for this!


and then check.


Well, clearly they didn’t


Probably generated with GPT-5


The needle now presses a little deeper into the bubble.


I think this just further demonstrates the truth behind the truly small & scrappy teams culture at OpenAI that an ex-employee recently shared [1].

Even with the way the presenters talk, you can sort of see that OAI prioritizes speed above most other things, and a naive observer might think they are testing things a million different ways before releasing, but actually, they're not.

If we draw up a 2x2 for Danger (High/Low) versus Publicity (High/Low), it seems to me that OpenAI sure has a lot of hits in the Low-Danger High-Publicity quadrant, but probably also a good number in the High-Danger Low-Publicity quadrant -- extrapolating purely from the sheer capability of these models and the continuing ability of researchers like Pliny to crack through it still.

[1] https://calv.info/openai-reflections


I don't think they give a shit. This is a sales presentation to the general public and the correct data is there. If one is pedantic enough, they can see the correct number; if not, it sells well. If they really cared, Grok etc. would be on there too.


The opposite view is to show your execs the middle finger on nitpicking. Their product is definitely not more important than ChatGPT-5. So your typo does not matter. It didn't ever matter.


It is not a mistake. It is a common tactic to create an illusion of improvement.


Would they risk such an obvious blunder and being ridiculed for being "AI-sloppy"? I don't believe it.


I don’t believe it was a mistake either. As others have said, these graphs are worth billions. Everything is calculated. They take the risk that some will notice but most will not. For those who notice, they say that it is a mistake.


Perhaps they're taking a leaf from Nvidia's book: influencers dunking on their bar charts gives a lot of free press coverage/mindshare.


I've seen that sentiment on Reddit as well, and I can't fathom how you think it being on purpose is more likely than a mistake when:

1 - The error is so blatantly large

2 - There is a graph without error right next to it

3 - The errors are not there in the system card and the presentation page


Not sure what to think anymore https://www.vibechart.net/


It wouldn't have taken years of quadruple checks to spot that one.


Possibly they rushed to bring forward the release announcement.


It's not a mistake. It's meant to mislead.


Humans hallucinate output all the time.


Not as much as current LLMs. But the point is that AIs are supposed to be better than us, kind of how people built calculators to be more reliable than the average person and faster than anyone.


I'm just going to wildly speculate.

1. They had many teams who had to put their things on a shared Google Sheets or similar

2. They used placeholders to prevent leaks

2.a. Some teams put their content just-in-time

3. The person running the presentation started the presentation view once they had set up video etc. just before launching stream

4. Other teams corrected their content

5. The presentation view being started means that only the ones in 2.a were correct.

Now we wait to see.


6. (Occam's Razor) It just didn't perform that well in trials for that specific eval.


That is obviously wrong since the numbers are right but the graph is wrong and you can see it correct on the website…



