Anthropic always summarizes the reasoning output to prevent some distillation at...

jdiff · 2026-04-16T18:13:11 1776363191

Genuine question, why have you chosen to phrase this scraping and distillation as an attack? I'm imagining you're doing it because that's how Anthropic prefers to frame it, but isn't scraping and distillation, with some minor shuffling of semantics, exactly what Anthropic and co did to obtain their own position? And would it be valid to interpret that as an attack as well?

DrammBA · 2026-04-16T19:01:55 1776366115

> I'm imagining you're doing it because that's how Anthropic prefers to frame it

Correct.

> would it be valid to interpret that as an attack as well?

Yup.

irthomasthomas · 2026-04-16T18:22:45 1776363765

If you ask Claude in Chinese, it thinks it's DeepSeek.

typ · 2026-04-17T02:22:38 1776392558

I don't think that learning from textbooks to take an exam and learning from the answers of another student taking the exam are the same.

Joking aside, I also don't believe that maximum access to raw Internet data and its quantity is why some models are doing better than Google. It seems that these SoTA models gain more power from synthetic data and how they discard garbage.

fragmede · 2026-04-16T20:50:58 1776372658

Firehosing Anthropic to exfiltrate their model seems materially different than Anthropic downloading all of the Internet to create the model in the first place to me. But maybe that's just me?

jdiff · 2026-04-16T22:34:44 1776378884

I don't see the material difference between firehosing Anthropic and Anthropic firehosing random sites on the internet. As someone who runs a few of those random sites, I've had to take actions that increase my costs (and burn my time) to mitigate a new host of scrapers constantly firing at every available endpoint, even ones specifically marked as off limits.

robrenaud · 2026-04-16T21:48:24 1776376104

Yeah, it's different. Anthropic profits when it delivers tokens. Hosting providers pay when Anthropic scrapes them.

59nadir · 2026-04-17T00:25:57 1776385557

Yes, what the LLM providers did was worse and hurt people financially a whole lot more, both in lost compensation for their works and in operational costs that reached the heights they did solely because of scrapers run on behalf of model providers.

vintermann · 2026-04-16T17:37:57 1776361077

Attacks? That's a choice of words.

DrammBA · 2026-04-16T17:45:06 1776361506

Definitely Anthropic playing the victim after distilling the whole internet.

butlike · 2026-04-16T19:03:06 1776366186

Proprietary pattern matcher proves there's no moat; promptly pre-covers others' perception.

nyc_data_geek1 · 2026-04-16T17:17:58 1776359878

Very cool that these companies can scrape basically all extant human knowledge, utterly disregard IP/copyright/etc, and they cry foul when the tables turn.

stavros · 2026-04-16T17:39:31 1776361171

Yep, that is exactly what happens. It's a disgrace that their models aren't open, after training on everything humanity has preserved.

They should at least release the weights of their old/deprecated models, but no, that would be losing money.

copperx · 2026-04-16T21:22:46 1776374566

We should treat LLMs somewhat like patents or drugs. After 5 years or so, the models should become open source, or at the very least the weights, to compensate for the distilling of human knowledge.

butlike · 2026-04-16T19:04:25 1776366265

All extant human knowledge SO FAR. Remember, by the nature of the beast, the companies will always be operating in hindsight with outdated human knowledge.

MasterScrat · 2026-04-16T17:19:51 1776359991

and so does OpenAI