Comment by neilv 13 hours ago
Can demonstrable ignoring of robots.txt help the cases of copyright infringement lawsuits against the "AI" companies, their partners, and customers?
Big thing worth asking here. Depending on what 'amazon' means here (i.e. known to be Amazon specific IPs vs Cloud IPs) it could just be someone running a crawler on AWS.
Or, folks failing their side of AWS's 'shared responsibility model', so their stuff gets compromised and ends up running botnets on AWS.
Or, folks that are quasi-spoofing 'Amazonbot' because they think it will get blocked less often than anonymous or other requests...
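One rough way to tell these cases apart is to check the source IP against the cloud provider's published ranges. Below is a minimal sketch, assuming you already have a list of prefixes with a service label on each entry. The `SAMPLE_PREFIXES` data and the `matching_services` helper are made up for illustration, and the CIDRs are reserved documentation ranges, not real AWS prefixes.

```python
import ipaddress

# Hypothetical sample data shaped like entries in a published cloud IP range list.
# The CIDRs below are reserved documentation ranges, NOT real AWS prefixes.
SAMPLE_PREFIXES = [
    {"ip_prefix": "192.0.2.0/24", "service": "EC2"},
    {"ip_prefix": "198.51.100.0/24", "service": "AMAZON"},
]


def matching_services(ip, prefixes=SAMPLE_PREFIXES):
    """Return the sorted service labels whose CIDR block contains ip."""
    addr = ipaddress.ip_address(ip)
    return sorted(
        {
            entry["service"]
            for entry in prefixes
            if addr in ipaddress.ip_network(entry["ip_prefix"])
        }
    )


if __name__ == "__main__":
    for ip in ("192.0.2.10", "198.51.100.7", "203.0.113.5"):
        print(ip, matching_services(ip))
```

A hit on a general compute range would only suggest "something running on the cloud", which fits any of the three scenarios above. It says nothing by itself about who operates the crawler.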
In the UK, the Computer Misuse Act applies if:
* There is knowledge that the intended access was unauthorised
* There is an intention to secure access to any program or data held in a computer
I imagine US law has similar definitions of unauthorized access?
`robots.txt` is the universal standard for defining what is unauthorised access for bots. No programmer could argue they aren't aware of this, and ignoring it, for me personally, is enough to show knowledge that the intended access was unauthorised. Is that enough for a court? Not a goddamn clue. Maybe we need to find out.
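For anyone who hasn't looked at what the file actually expresses, here is a minimal sketch using Python's standard `urllib.robotparser`. The `SAMPLE_ROBOTS_TXT` rules, the `is_allowed` helper, and the example.com URLs are assumptions for illustration, not anything taken from the site being discussed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely,
# and keep everyone else out of /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent, url, robots_txt=SAMPLE_ROBOTS_TXT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("Amazonbot", "https://example.com/page"))
    print(is_allowed("SomeOtherBot", "https://example.com/page"))
    print(is_allowed("SomeOtherBot", "https://example.com/private/x"))
```

Note that this check runs on the crawler's side. The file only states what the site owner asks for; nothing in it stops a bot that never looks.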
robots.txt isn't a standard. It is a suggestion, and not legally binding AFAIK. In US law, at least, a bot scraping a site doesn't involve a human being, and therefore the TOS do not constitute a contract. According to the Robotstxt organization itself: “There is no law stating that /robots.txt must be obeyed, nor does it constitute a binding contract between site owner and user, but having a /robots.txt can be relevant in legal cases.”
The last part basically means the robots.txt file can be circumstantial evidence of intent, but there needs to be other factors at the heart of the case.
I wind up in jail for ten years if I download an episode of iCarly; Sam Altman inhales every last byte on the internet and gets a ticker tape parade. Make it make sense.
Probably not copyright infringement. But it is probably (hopefully?) a violation of the CFAA, both because it is effectively DDoSing you and because they are ignoring robots.txt.
Maybe worth contacting law enforcement?
Although it might not actually be Amazon.