Comment by Philpax 4 days ago

The argument isn't that it's difficult for them to circumvent - it's not - but that it adds enough friction to force them to rethink how they're scraping at scale and/or self-throttle.

I personally don't care about the act of scraping itself, but the volume of scraping traffic has forced administrators' hands here. I suspect we'd be seeing far fewer deployments if the scrapers behaved themselves to begin with.

davidclark 4 days ago

The OP's author shows that the cost to scrape an Anubis-protected site is essentially zero, since it's a fairly simple PoW algorithm that the scraper can easily solve. It adds basically no compute time or cost for a crawler running out of a data center. How does that force a rethink?
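
To make "essentially zero" concrete: here is a minimal sketch of brute-forcing this kind of challenge, assuming a leading-zero SHA-256 scheme similar in spirit to Anubis's (the challenge string, difficulty, and function names are made up for illustration). At difficulty 4 it averages around 65k hashes, i.e. milliseconds on a single core.

```go
// Hypothetical sketch: brute-force a leading-zero SHA-256 proof-of-work
// challenge. The challenge string and difficulty are example values, not
// Anubis's actual wire format.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// solve returns the first nonce whose SHA-256(challenge + nonce) digest
// starts with `difficulty` zero hex digits.
func solve(challenge string, difficulty int) (int, string) {
	target := strings.Repeat("0", difficulty)
	for nonce := 0; ; nonce++ {
		sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
		digest := hex.EncodeToString(sum[:])
		if strings.HasPrefix(digest, target) {
			return nonce, digest
		}
	}
}

func main() {
	// Difficulty 4 needs ~65k hashes on average -- trivial for a data-center crawler.
	nonce, digest := solve("example-challenge", 4)
	fmt.Println(nonce, digest)
}
```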

  • Philpax 4 days ago

    The cookie will be invalidated if it's shared between IPs, and it's my understanding that most Anubis deployments are paired with per-IP rate limits, which should reduce overall volume by capping how many independent requests can be made at any one time (there's a sketch of that kind of limiter at the end of this comment).

    That being said, I agree with you that there are ways around this for a dedicated adversary, and that it's unlikely to be a long-term solution as-is. My hope is that having to circumvent Anubis at scale will prompt some introspection (do you really need to be rescraping every website constantly?), but that's probably wishful thinking.
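
    For illustration, here's the kind of per-IP throttle I mean, sketched as Go middleware using golang.org/x/time/rate. This isn't Anubis's own code, and the limits below are placeholder values.

    ```go
    // Sketch of per-IP rate limiting of the sort an Anubis deployment might
    // sit behind. Not Anubis code; the limits are placeholders.
    package main

    import (
        "net"
        "net/http"
        "sync"

        "golang.org/x/time/rate"
    )

    type ipLimiters struct {
        mu       sync.Mutex
        limiters map[string]*rate.Limiter
    }

    func (l *ipLimiters) get(ip string) *rate.Limiter {
        l.mu.Lock()
        defer l.mu.Unlock()
        lim, ok := l.limiters[ip]
        if !ok {
            lim = rate.NewLimiter(rate.Limit(2), 10) // 2 req/s, burst of 10, per IP
            l.limiters[ip] = lim
        }
        return lim
    }

    // limitByIP rejects requests from IPs that have exhausted their token bucket.
    func limitByIP(next http.Handler) http.Handler {
        l := &ipLimiters{limiters: map[string]*rate.Limiter{}}
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            ip, _, _ := net.SplitHostPort(r.RemoteAddr)
            if !l.get(ip).Allow() {
                http.Error(w, "too many requests", http.StatusTooManyRequests)
                return
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok\n"))
        })
        http.ListenAndServe(":8080", limitByIP(mux))
    }
    ```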

    • yborg 4 days ago

      > do you really need to be rescraping every website constantly

      Yes, because if you believe you out-resource your competition, by doing this you deny them training material.

  • hooverd 4 days ago

    The problem with crawlers is that they're functionally indistinguishable from your average malware botnet in behavior. If you see a bunch of traffic from residential IPs using the same token, that's a big tell.
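
    For what it's worth, that tell is cheap to check for on the server side. A toy sketch (the type names and the threshold are made up, not from any particular tool) that counts distinct source IPs per token and flags the outliers:

    ```go
    // Toy sketch of the "same token, many residential IPs" heuristic:
    // count distinct source IPs per access token and flag the outliers.
    // The threshold and names are illustrative only.
    package main

    import "fmt"

    type tokenTracker struct {
        ipsPerToken map[string]map[string]struct{}
    }

    func newTokenTracker() *tokenTracker {
        return &tokenTracker{ipsPerToken: map[string]map[string]struct{}{}}
    }

    // observe records that `token` was presented from `ip` and reports
    // whether the token has now been seen from suspiciously many IPs.
    func (t *tokenTracker) observe(token, ip string) bool {
        ips, ok := t.ipsPerToken[token]
        if !ok {
            ips = map[string]struct{}{}
            t.ipsPerToken[token] = ips
        }
        ips[ip] = struct{}{}
        return len(ips) > 20 // arbitrary threshold: one cookie across >20 IPs is a tell
    }

    func main() {
        tr := newTokenTracker()
        for i := 0; i < 25; i++ {
            ip := fmt.Sprintf("203.0.113.%d", i)
            if tr.observe("shared-anubis-cookie", ip) {
                fmt.Println("flagged after", i+1, "distinct IPs")
                break
            }
        }
    }
    ```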