Comment by blibble a year ago

> If you want to protect your content, use the technical mechanisms that are available,

> You can choose to gatekeep your content, and by doing so, make it unscrapeable, and legally protected.

so... robots.txt, which the AI parasites ignore?

> Also, consider that relatively small, cheap llms are able to parse the difference between meaningful content and Markovian jabber such as this software produces.

okay, so it's not damaging, and there you've refuted your entire argument

observationist a year ago

[flagged]

jsheard a year ago

> No, put up a loginwall or paywall, authenticate users, and go private.

We know for a fact that AI companies don't respect that: if they want data that's behind a paywall, they'll jump through hoops to take it anyway.

https://www.theguardian.com/technology/2025/jan/10/mark-zuck...

If they don't have to abide by "norms", then we don't have to for their sake. Fuck 'em.

- observationist a year ago
  
  [flagged]
  
  tir a year ago
  
  > the law explicitly allows scraping and crawling.
  
  Nepenthes also allows scraping and crawling, for as long as you like.
  
  blibble a year ago
  
  this is a very US-ian view of the world
  my site is not in the US, and I am not a US citizen. US law does not apply to me.
  under UK law, robots.txt is an access control mechanism (weak or otherwise),
  and knowingly bypassing it is likely a criminal offence under the Computer Misuse Act
  good luck suing me because you got stuck when you smashed my window and climbed through it
  