Comment by uqers 2 days ago

> Unfortunately, the price LLM companies would have to pay to scrape every single Anubis deployment out there is approximately $0.00.

The math on the site linked here as a source for this claim is incorrect. The author of that site assumes that scrapers will keep track of the access tokens for a week, but most internet-wide scrapers don't do so. The whole purpose of Anubis is to be expensive for bots that repeatedly request the same site multiple times a second.

drum55 2 days ago

The "cost" of executing the JavaScript proof of work is fairly irrelevant; the whole concept just doesn't hold up to a pessimistic inspection. Anubis requires users to do a trivial amount of SHA-256 hashing in slow JavaScript, while a scraper can do the same work much faster in native code; it's simply game over. It's the same reason we don't use hashcash for email: the amount of proof of work a user will tolerate is much lower than the amount a professional can apply. If this tool provides any benefit, it's because it is obscure and non-standard.

When reviewing it I noticed that the author carried the common misunderstanding that "difficulty" in proof of work is simply the number of leading zero bytes in a hash, which limits the granularity to factors of 256 per step rather than the factor-of-two steps you get by counting leading zero bits. I realize that some of this is the cost of working in JavaScript, but the hottest code path seems to be written extremely inefficiently.

    for (; ;) {
        const hashBuffer = await calculateSHA256(data + nonce);
        const hashArray = new Uint8Array(hashBuffer);

        let isValid = true;
        for (let i = 0; i < requiredZeroBytes; i++) {
          if (hashArray[i] !== 0) {
            isValid = false;
            break;
          }
        }
It wouldn’t be an exaggeration to say that a native implementation of this with even a hair of optimization could reduce the “proof of work” to being less time-intensive than the TLS handshake.
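
For a rough sense of what finer-grained difficulty could look like, here's a bit-level leading-zero check (my own sketch, not Anubis's code; the function name is made up):

    // Sketch only: measure difficulty in leading zero *bits*, so each step
    // scales the work by 2 instead of by 256.
    function leadingZeroBits(hashArray) {
        let bits = 0;
        for (const byte of hashArray) {
            if (byte === 0) { bits += 8; continue; }
            bits += Math.clz32(byte) - 24; // clz32 counts over 32 bits; a byte uses the low 8
            break;
        }
        return bits;
    }

    // e.g. accept the nonce once the hash clears a bit-count threshold:
    // leadingZeroBits(new Uint8Array(hashBuffer)) >= difficultyBits
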
  • jsnell 2 days ago

    That is not a productive way of thinking about it, because it will lead you to the conclusion that all you need is a smarter proof-of-work algorithm: one that's GPU-resistant, ASIC-resistant, and native-code-resistant. That's not the case.

    Proof of work can't function as a counter-abuse challenge even if you assume that the attackers have no advantage over the legitimate users (e.g. both are running exactly the same JS implementation of the challenge). The economics just can't work. The core problem is that the attackers pay in CPU time, which is fungible and incredibly cheap, while the real users pay in user-observable latency which is hellishly expensive.
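
    To put rough numbers on that (every constant below is an assumption for illustration, not a measurement):

        // Back-of-envelope: what one challenge costs each side (assumed numbers).
        const userLatencySeconds = 1;     // what a real visitor burns waiting, per challenge
        const nativeSpeedup = 10;         // assumed native-code advantage over browser JS
        const vcpuDollarsPerHour = 0.03;  // assumed cloud vCPU price
        const attackerCost =
            (userLatencySeconds / nativeSpeedup) * (vcpuDollarsPerHour / 3600);
        console.log(attackerCost); // ≈ $0.0000008 per page, vs. a full second of a human's time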

  • aniviacat 2 days ago

    They do use SubtleCrypto digest [0] in secure contexts, which does the hashing natively.

    Specifically for Firefox [1], they switch to the JavaScript fallback because that's actually faster [2] (probably because of overhead):

    > One of the biggest sources of lag in Firefox has been eliminated: the use of WebCrypto. Now whenever Anubis detects the client is using Firefox (or Pale Moon), it will swap over to a pure-JS implementation of SHA-256 for speed.

    [0] https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypt...

    [1] https://github.com/TecharoHQ/anubis/blob/main/web/js/algorit...

    [2] https://github.com/TecharoHQ/anubis/releases/tag/v1.22.0
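
    For illustration, the selection being described might look roughly like this (a sketch under assumptions, not the actual Anubis source; see [1] for the real code, and pureJsSha256 is a stand-in for the bundled fallback):

        // Sketch: prefer native WebCrypto hashing, but fall back to a bundled
        // pure-JS SHA-256 where the JS path benchmarks faster (Firefox/Pale Moon).
        async function sha256Bytes(message) {
            const bytes = new TextEncoder().encode(message);
            const prefersJsPath = /Firefox|Pale Moon/.test(navigator.userAgent);
            if (globalThis.isSecureContext && !prefersJsPath) {
                return new Uint8Array(await crypto.subtle.digest("SHA-256", bytes));
            }
            return pureJsSha256(bytes); // stand-in for the bundled fallback
        }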

  • xena 2 days ago

    If you can optimize it, I would love that as a pull request! I am not a JS expert.

  • gruez 2 days ago

    >but the hottest code path seems to be written extremely inefficiently.

    Why is this inefficient?

tptacek 2 days ago

Right, but that's the point. It's not that the idea is bad. It's that PoW is the wrong fit for it. Internet-wide scrapers don't keep state? Ok, then force clients to do something that requires keeping state. You don't need to grind SHA2 puzzles to do that; you don't need to grind anything at all.
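
For example, a minimal sketch of that idea (made-up names and values, not any particular product): issue a signed, expiring token the client must store and send back. Real browsers keep cookies for free; a stateless crawler suddenly has to hold per-site state.

    // Node.js sketch: issue and verify a signed, expiring token. The only "work"
    // the client does is remembering it. Names and values here are assumptions.
    const crypto = require("node:crypto");
    const SECRET = process.env.CHALLENGE_SECRET || "dev-only-secret";

    function issueToken() {
        const expires = Date.now() + 7 * 24 * 3600 * 1000; // a week, like the pass discussed above
        const sig = crypto.createHmac("sha256", SECRET).update(String(expires)).digest("hex");
        return `${expires}.${sig}`; // set as a cookie on the challenge response
    }

    function verifyToken(token) {
        const [expires, sig] = String(token || "").split(".");
        const expected = crypto.createHmac("sha256", SECRET).update(String(expires)).digest("hex");
        return sig === expected && Date.now() < Number(expires);
    }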

valicord 2 days ago

The point is that the scrapers could easily bypass this if they cared to.

  • uqers 2 days ago

    How so?

    • valicord 2 days ago

      The parent comment was: "The author of that site assumes that scrapers will keep track of the access tokens for a week, but most internet-wide scrapers don't do so." There's no technical reason why they wouldn't reuse those tokens; they don't do it today because they don't care. If Anubis gets enough adoption to cause meaningful inconvenience, the scrapers will just start caching the tokens to amortize the cost.

      The point of the article is that if the scraper is sufficiently motivated, Anubis is not going to do much anyway, and if the scraper doesn't care, the same result can be achieved without annoying your actual users.
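
      Concretely, the scraper-side change is tiny. A sketch with placeholder names (solveChallenge stands in for however the token is obtained):

          // Pay the challenge once per host, then replay the cookie for as long as it lasts.
          const tokenCache = new Map(); // host -> cookie string

          async function fetchWithCachedToken(url, solveChallenge) {
              const host = new URL(url).host;
              if (!tokenCache.has(host)) {
                  tokenCache.set(host, await solveChallenge(host)); // one-time cost, amortized
              }
              return fetch(url, { headers: { Cookie: tokenCache.get(host) } });
          }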

    • tecoholic 2 days ago

      Hmm… by setting the verified=1 cookie on every request to the website?

      Am I missing something here? All this does is set an unencrypted cookie and reload the page, right?

      • notpushkin 2 days ago

        They could, but if this is slightly different from site to site, they’ll have to either do this for every site (annoying but possible if your site is important enough), or go ahead and run JS (which... I thought they do already, with plenty of sites still being SPAs?)
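
        For what it's worth, running the JS is largely a solved problem for crawlers; a rough sketch of one that does (assuming the puppeteer package, not anyone's actual scraper):

            // Rough sketch of a crawler that just runs the page's JS, challenge and all.
            const puppeteer = require("puppeteer"); // assumes the puppeteer package

            async function scrape(url) {
                const browser = await puppeteer.launch();
                const page = await browser.newPage();
                await page.goto(url, { waitUntil: "networkidle0" }); // lets any challenge script finish
                const html = await page.content();
                await browser.close();
                return html;
            }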

        • rezonant 2 days ago

          I would be highly surprised if most of these bots aren't already running JavaScript; I'm confused by this unquestioned notion that they don't.