Hacker News

They are right and you are wrong. If some web page is publicly available, it should be indexed. Scraping neutrality, please.


Heavily disagree. I own the server, and thus the website. I should be able to allow or disallow any type of web crawler/scraper I want. Just as you can't easily regulate what's on a website without lawsuits and takedowns, you can't regulate how discoverable a website is.


> I should be able to allow or disallow any type of web crawler/scraper I want.

You're certainly allowed to try, but I don't see why indexers should be mandated to collaborate with you. They serve their users, not you.


Will their users appreciate that they disregard the intent of the authors of what they index?

I mean, "allow" or "regulate" don't _really_ apply here - there was never any enforcement regime around robots.txt, just a convention based on the general expectation that you don't claim ownership of whatever passes your line of sight.
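To make the "convention, not enforcement" point concrete, here is a minimal sketch of what honoring robots.txt looks like from the crawler's side, using Python's standard `urllib.robotparser`. The rules, the bot names, and the `may_fetch` helper are all made up for illustration; the point is only that the check runs inside the crawler, and nothing on the server stops a crawler that skips it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com; these rules are illustrative only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url.

    A well-behaved crawler calls something like this voluntarily before
    requesting a page. The server never sees the result of the check.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/public"),
        ("Googlebot", "https://example.com/private/page"),
        ("SomeOtherBot", "https://example.com/public"),
    ]:
        print(agent, url, may_fetch(agent, url))
```

A crawler that never calls `may_fetch` still gets the page, which is why actual blocking has to happen on the server (IP or user-agent filtering, authentication) rather than in robots.txt.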


What if I want what I publish to be known only by word of mouth?

What if I consider (some or any of) my ideas to be un-indexable, not directly suitable to representation in any hierarchy other than those I may set them in?


Then you should hide them behind a URL that isn't linked anywhere on your site and that you can easily propagate by word of mouth only.

    example.com/correcthorsebatterystaple

If you consider "word of mouth" to be public posts on a forum which millions can read at any time, then block Googlebot's IPs.


...after decades of what i considered friendship, here you are on hn talking about my horse battery staple


Yes, sorry, it was a rhetorical question in response to the previous comment.

Taking either step you suggest (along with robots.txt or equiv.), it would seem fair to expect that Brave, Bing, whoever, would not feel it their neutral/natural domain to include it in a public index.


Then don't publish it.



