I'm always a bit confused when people say things like this. 60k token is often more than the initial context I feed the model with. And I don't think I ever had a productive session that began under 150k tokens.
Bit of what makes it so fun, our experiences seem to wildly differ! On one hand, you have experiences like yours, but then my own experience is that I never had a productive session when the scope grows beyond 150K tokens! If I needed 60K just as a starting context, I'd take that to mean the suggested change is way to large, and if the model cannot solve the entire thing within maybe 15-20% of the total context size, divide and conquer is needed otherwise there will be a lot of time wasted to patch things up when things are "completed".
Yeah indeed it's very interesting. And the 60k initial context don't even contain the suggested change yet. For me if I don't do this the current models tend to fixate and local patches instead of tracing symbols and making a holistic model of what a change interacts with in the codebase
Nothing is perfectly secure on its own. No system designed or checked by humans ever will be. After all, the Xbox One was indeed pwned, relatively recently. However, because the juice wasn't worth the squeeze for so long, it got pwned years after it was a relevant, money making console.
Novel jailbreaks for ancient iPhones are not worth much. But attention on current, brand new devices means increasing the danger that a mistake gets found, which increases the odds that that mistake is found by someone who wants to sell it for the most money. Also, from Apple's perspective a zero-day in the bootloader on macOS also means a zero-day in the bootloader in all of the billions of iOS devices out there too. They do not want to give anyone anymore reason than what already exists to try and pwn LLB or iBoot. Given a happy path, all of that hacker energy for "put Linux on my M1 Macbook" is put towards device drivers and support, rather than "how the hell do we get an alternative kernel booting on this thing".
Fewer bullets pinging off the armor. Fewer cracks in the fuselage forming. Fewer knives to dodge. All of it means Apple's boot process for their current devices are less likely to be pwned before they turn into e-waste (whenever that is, not making a comment on Apple's perhaps accelerated or otherwise practices in obsolescence).
Just like a jetliner will eventually succumb to entropy and become dangerous to fly, so too will a lot of "secure" software. You only need to actively maintain a jetliner while it is flying passengers or cargo. Once it is retired, it can rot, people can break into the husk of it at the junkyard and fornicate and smoke crack and smash windows and steal parts of the fuselage. At that point, who cares?
Well thought out commentary... Let's dig deeper and at least we make it more interesting conversation, not a blurb.
Wouldn't it be technically no because Google's revenue isn't 100% from ads? They're making almost $120bn from cloud, subscriptions and devices for example. It could be cloud money. And if Google gets ad money so whatever it pays becomes ad money, then it's ad money all the way down.
FYI last fiscal results from Q1 of Alphabet, Google Cloud made $20bn revenue Q1 2026, up from Q4 2025 of $17bn. It's a bit misleading to include "subscriptions, platforms, and devices" in cloud.
Q1 2026 Google's revenue totalled $109bn, of which $77bn is Ads, so 70% of its revenue is Ads. It's common knowledge that Google is an Ads company.
I googled the money they made from cloud, subscriptions, platforms, and devices, then approximated almost $120bn in a year. The precise number mattered less than the fact that it's a ton of it already, enough to cover a lot of payoffs.
> It's a bit misleading to include
I didn't "include in" anything, it was an enumeration of things that aren't ads. "Google makes $Q from X and Y", not from "X included in Y".
You found something that's technically correct (a clear enumeration and addition) to be misleading. I think you now accidentally understand what was my initial objection. A lot of other people in the thread don't because that's how social media works, they go with the prevailing opinion for the sweet sweet likes, or go against it and get squeezed out.
I think it’s crazy that they do this, especially without any notice. I would not have renewed my subscription if I knew that they started doing this.
Especially in the analysis part of my work I don‘t care about the actual text output itself most of the time but try to make the model „understand“ the topic.
In the first phase the actual text output itself is worthless it just serves as an indicator that the context was processed correctly and the future actual analysis work can depend on it.
And they‘re… just throwing most the relevant stuff out all out without any notice when I resume my session after a few days?
This is insane, Claude literally became useless to me and I didn’t even know it until now, wasting a lot of my time building up good session context.
There would be nothing lost if they said „If you click yes, we will prune your old thinking making Claude faster and saving you tons of tokens“. Most people would say yes probably so why not ask them… make it an env variable (that is announced not a secretly introduced one to opt out of something new!) or at least write it in a change log if they really don’t want to allow people to use it like before, so there‘d be chance to cancel the subscription in time instead of wasting tons of time on work patterns that not longer work
Pointing at their terms of service will definitely be the instantly summoned defense (as would most modern companies) but the fact that SaaS can so suddenly shift the quality of product being delivered for their subscription without clear notification or explicitly re-enrollment is definitely a legal oversight right now and Italy actually did recently clamp down on Netflix doing this[1]. It's hard to define what user expectations of a continuous product are and how companies may have violated it - and for a long time social constructs kept this pretty in check. As obviously inactive and forgotten about subscriptions have become a more significant revenue source for services that agreement has been eroded, though, and the legal system has yet to catch up.
1. Specifically, this suite was about price increases without clear consideration for both parties - but the same justifications apply to service restrictions without corresponding price decreases.
> Our systems will smartly ignore any reasoning items that aren’t relevant to your functions, and only retain those in context that are relevant. You can pass reasoning items from previous responses either using the previous_response_id parameter, or by manually passing in all the output items from a past response into the input of a new one.
So to defend a litte, its a Cache, it has to go somewhere, its a save state of the model's inner workings at the time of the last message. so if it expires, it has to process the whole thing again. most people don't understand that every message the ENTIRE history of the conversion is processed again and again without that cache. That conversion might of hit several gigs worth of model weights and are you expecting them to keep that around for /all/ of your conversions you have had with it in separate sessions?
No? It's not because it's a cache, it's because they're scared of letting you see the thinking trace. If you got the trace you could just send it back in full when it got evicted from the cache. This is how open weight models work.
The issue is that if they send the full trace back, it will have to be processed from the start if the cache expired, and doing that will cause a huge one-time hit against your token limit if the session has grown large.
So what Boris talked about is stripping things out of the trace that goes back to regenerate the session if the cache expires. Doing this would help avert burning up the token limit, but it is technically a different conversation, so if CC chooses poorly on stripping parts of the context then it would lead to Claude getting all scatter-brained.
They literally can. They could make the API free to use if they wanted. There is no law that states that costs have to equal the cost it takes to process the request.
I’m not familiar with the Claude API but OpenAI has an encrypted thking messages option. You get something that you can send back but it is encrypted. Not available on Anthropic?
No of course it’s unrealistic for them to hold the cache indefinitely and that’s not the point. You are keeping the session data yourself so you can continue even after cache expiry. The point I‘m making is that it made me very angry that without any announcement they changed behavior to strip the old thinking even when you have it in your session file. There is absolutely no reason to not ask the user about if they want this
And it’s part of a larger problem of unannounced changes it‘s just like when they introduced adaptive thinking to 4.6 a few weeks ago without notice.
Also they seem to be completely unaware that some users might only use Claude code because they are used to it not stripping thinking in contrast to codex.
Anyway I‘m happy that they saw it as a valid refund reason
It seems like an opportunity for a hierarchical cache. Instead of just nuking all context on eviction, couldn’t there be an L2 cache with a longer eviction time so task switching for an hour doesn’t require a full session replay?
Living where? If it's in the GPU, then it's still taking up precious space that could be used for serving other sessions. If it's not in the GPU, then it doesn't help.
what matters isn't that it's a cache; what matter is it's cached _in the GPU/NPU_ memory and taking up space from another user's active session; to keep that cache in the GPU is a nonstarter for an oversold product. Even putting into cold storage means they still have to load it at the cost of the compute, generally speaking because it again, takes up space from an oversold product.
> There would be nothing lost if they said „If you click yes, we will prune your old thinking making Claude faster and saving you tons of tokens“. Most people would say yes probably so why not ask them
The irony is that Claude Design does this. I did a big test building a design system, and when I came back to it, it had in the chat window "Do you need all this history for your next block of work? Save 120K tokens and start a new chat. Claude will still be able to use the design system." Or words to that effect.
This is exactly what also confused me. I had the exact same prompt in Claude code as well, and the no option implies you can also keep the whole history. But clicking keep apparently only ever kept the user and assistant messages not the whole actual thinking parts of the conversation
I honestly am very disappointed with this. I've only learned about CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING and showThinkingSummaries: true from this post. I've been wondering for a while where the summaries went and am always hoping like roulette that it thinks a lot. No wonder if there suddently is an "adaptive thinking" mode. I would have opted out 2 months ago if it was documented or communicated in any way publicly. Why change behavior without notice or any new user facing settings.
I just googled "CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING" and it seems like many people don't know about it.
And ULTRATHINK sets the effort to high, but then there is also /effort max?
I'm now confused because I used to use ultrathink, went away as well as the chain of reasoning prompts, recently changed to high or extra thinking, now this is back?
reply