I'm surprised to no longer see Opus 4.6 on Cursorbench. I think there is a subset of Claude fans that are still adamant that Opus 4.6 is the best version.
"A grimoire is a textbook of magic and sorcery. Traditionally, it contains instructions for casting spells, performing divination, creating magical objects like talismans, and summoning supernatural entities such as angels or spirits."
It's how they name classes of models; presumably this implies something about the relative quantization / size of the model, not about its specific performance. E.g. Fabel 5 will be better than Opus 5, which will be better than Sonnet 5, etc. The 5 is the version number of the particular iteration / training run within this class of model.
It looks like they are using the "agentic AI era" as an excuse to restructure in order to boost margins. GAAP gross margin dropped ~5 points YoY (76% -> 71%).
Whatever the play here, they can’t be angling for any external PR or internal morale boost. What if they had written: “This is a tough economy and we have to tighten our belts.” Maybe that’s naive of me. Would that be a bad signal to investors, as opposed to insignificant employees and commoners (PR)?
But contrast that with this:
> The way we work at Cloudflare has fundamentally changed. We don’t just build and sell AI tools and platforms. We are our own most demanding customer. Cloudflare’s usage of AI has increased by more than 600% in the last three months alone. Employees across the company from engineering to HR to finance to marketing run thousands of AI agent sessions each day to get their work done. That means we have to be intentional in how we architect our company for the agentic AI era in order to supercharge the value we deliver to our customers and to honor our mission to help build a better Internet for everyone, everywhere.
What is this even saying? We use a lot of AI. And not just for other people... for ourselves. This means that: we need to be intentional?
What is a regular, non-investor person supposed to glean from this? We’ve hit the automation jackpot: some of you will be fired, some of you will get more work for the same pay?[1] All while shoving euphoric buzzwords in your face: “AI era”, “supercharge the value”.
I can only surmise that any PR or internal morale blow (?) matters very little to them. They are not at all afraid of backlash from lowly people.
[1] Again, this paragraph isn’t saying anything beyond that they are using AI and, ho-ho, things are a-changing. So one has to guess.
I plan to publish the problems, but not how well the LLMs perform on them. The standard for publishing benchmarks is very high, and I'm really just posting vibes here. Still, I hope my experiences are useful to some people, as others' experiences have been useful to me.
I assume the poisoner community is mirroring and likely remixing the content from there. The whole effort isn’t going to work with a single point of failure like that.