robots.txt and archiveteam.org...

Listo Factor listofactor at mail.ru
Sat Jul 6 14:06:35 CEST 2019


On 7/5/19 10:13 AM, Wiktor Kwapisiewicz via Gnupg-users - 
gnupg-users at gnupg.org wrote:
> 
> As for robots.txt not all archiving sites respect it:
> https://www.archiveteam.org/index.php?title=Robots.txt
> 
Thanks for posting the link. To quote from the text there:

> What this situation does, in fact, is cause many more problems than
> it solves - catastrophic failures on a website are ensured total
> destruction with the addition of ROBOTS.TXT. Modifications, poor choices
> in URL transition, and all other sorts of management work can lead to a
> loss of historically important and relevant data. Unchecked, and left
> alone, the ROBOTS.TXT file ensures no mirroring or reference for items
> that may have general use and meaning beyond the website's context.

This is both stupid and arrogant. It is precisely for the owner of the
website, and of the data contained therein, to decide what is and what
isn't of "general use and meaning beyond the website's context" - not
for some aggregator/archiver's management.

GDPR has indeed changed the nature of the Internet forever, and for the
better. If Google can be put in its place by the EU (well, at least the
first steps have been made...), surely it will be possible to force
other, lesser operators of "archived information" to toe the line -
among other things, to respect the plain and simple Robot Exclusion
Protocol. It is not at all a difficult thing to do.
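
To show just how little code compliance takes, here is a minimal
sketch using Python's standard-library urllib.robotparser. The crawler
name "ExampleArchiveBot" and the URLs are placeholders of my own, not
anything from this thread:

    # Check robots.txt before fetching a page - Python 3 stdlib only.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    # Hypothetical site; substitute the host your crawler targets.
    rp.set_url("https://example.org/robots.txt")
    rp.read()  # download and parse the site's robots.txt

    user_agent = "ExampleArchiveBot"  # placeholder crawler name
    url = "https://example.org/private/page.html"

    if rp.can_fetch(user_agent, url):
        print("allowed - fetch", url)
    else:
        print("disallowed by robots.txt - skip", url)

A well-behaved archiver only has to run this check once per URL (and
re-fetch robots.txt periodically); everything else about its crawl can
stay exactly as it is.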




