Wget robots.txt

wget is not respecting my robots.txt. Is there an interceptor?

I have a website where I post CSV files as a free service. Recently I have noticed that wget and libwww have been scraping it pretty hard, and I was wondering how to curb that, even if only a little.

I have implemented a robots.txt policy; I've posted it below.

    User-agent: wget
    Disallow: /

    User-agent: libwww
    Disallow: /

    User-agent: *
    Disallow: /

Issuing a wget from my totally independent Ubuntu box shows that the robots.txt has no effect: wget still pulls the files from my server without complaint.

Anyway, I don't mind people grabbing the info; I just want to implement some sort of flood control, like a wrapper or an interceptor.

Does anyone have any thoughts on this, or could you point me toward a resource? I realize it might not even be possible. I'm just after some ideas.
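One thing worth noting: robots.txt is purely advisory, and wget only consults it during recursive retrieval, so a plain single-URL fetch never reads it at all, which may be why the policy above appears to be ignored. Real flood control has to happen server-side. Below is a minimal sketch of such an interceptor, assuming the CSV files are served through a Python WSGI application; the names csv_app, MAX_PER_MINUTE, and WINDOW are hypothetical placeholders, not part of any existing setup.

    import time
    from collections import defaultdict, deque

    MAX_PER_MINUTE = 30         # hypothetical cap: 30 requests per IP per window
    WINDOW = 60.0               # sliding window length in seconds

    _hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def rate_limited(app):
        """Wrap a WSGI app and reject clients that exceed the request cap."""
        def middleware(environ, start_response):
            ip = environ.get("REMOTE_ADDR", "unknown")
            now = time.time()
            q = _hits[ip]
            # Discard timestamps that have fallen out of the window.
            while q and now - q[0] > WINDOW:
                q.popleft()
            if len(q) >= MAX_PER_MINUTE:
                start_response("429 Too Many Requests",
                               [("Content-Type", "text/plain"),
                                ("Retry-After", "60")])
                return [b"Rate limit exceeded; please slow down.\n"]
            q.append(now)
            return app(environ, start_response)
        return middleware

Wrapping the existing app is then one line, e.g. application = rate_limited(csv_app). The same middleware could also inspect environ.get("HTTP_USER_AGENT") and refuse or slow clients identifying as wget or libwww, though user-agent strings are trivially spoofed, so the per-IP throttle is the more robust control.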




FAQ

How do I get wget to download a CGI file behind robots.txt?

From the wget manual on gnu.org
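The relevant passage is presumably the manual's description of the robots variable: wget honors robots.txt (and nofollow meta tags) only when retrieving recursively, and running it with -e robots=off, or putting robots = off in ~/.wgetrc, makes it ignore robots.txt entirely. This is also why a robots.txt policy alone cannot stop a determined wget user; server-side rate limiting is the reliable lever.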
