Comment by uqers 2 days ago

> Unfortunately, the price LLM companies would have to pay to scrape every single Anubis deployment out there is approximately $0.00.

The math on the site linked here as a source for this claim is incorrect. The author of that site assumes that scrapers will keep track of the access tokens for a week, but most internet-wide scrapers don't do so. The whole purpose of Anubis is to be expensive for bots that repeatedly request the same site multiple times a second.

drum55 2 days ago

The "cost" of executing the JavaScript proof of work is fairly irrelevant; the whole concept just doesn't hold up to a pessimistic inspection. Anubis requires users to do a trivial amount of SHA-256 hashing in slow JavaScript, while a scraper can do the same work much faster in native code; it's simply game over. It's the same reason we don't use hashcash for email: the amount of proof of work a user will tolerate is much lower than the amount a professional can apply. If this tool provides any benefit, it's because it is obscure and non-standard.

When reviewing it I noticed that the author carried the common misunderstanding that "difficulty" in proof of work is simply the number of leading zero bytes in a hash, which limits the granularity to factors of 256 per step rather than the factor-of-two steps you get by counting leading zero bits. I realize that some of this is the cost of working in JavaScript, but the hottest code path seems to be written extremely inefficiently.

    for (; ;) {
        const hashBuffer = await calculateSHA256(data + nonce);
        const hashArray = new Uint8Array(hashBuffer);

        let isValid = true;
        for (let i = 0; i < requiredZeroBytes; i++) {
          if (hashArray[i] !== 0) {
            isValid = false;
            break;
          }
        }
It wouldn’t be an exaggeration to say that a native implementation of this with even a hair of optimization could reduce the “proof of work” to being less time-intensive than the TLS handshake.
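
For a rough sense of what finer-grained difficulty could look like, here's a bit-level leading-zero check (my own sketch, not Anubis's code; the function name is made up):

    // Sketch only: measure difficulty in leading zero *bits*, so each step
    // scales the work by 2 instead of by 256.
    function leadingZeroBits(hashArray) {
        let bits = 0;
        for (const byte of hashArray) {
            if (byte === 0) { bits += 8; continue; }
            bits += Math.clz32(byte) - 24; // clz32 counts over 32 bits; a byte uses the low 8
            break;
        }
        return bits;
    }

    // e.g. accept the nonce once the hash clears a bit-count threshold:
    // leadingZeroBits(new Uint8Array(hashBuffer)) >= difficultyBits
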
  • jsnell 2 days ago

    That is not a productive way of thinking about it, because it will lead you to the conclusion that all you need is a smarter proof-of-work algorithm: one that's GPU-resistant, ASIC-resistant, and native-code-resistant. That's not the case.

    Proof of work can't function as a counter-abuse challenge even if you assume that the attackers have no advantage over the legitimate users (e.g. both are running exactly the same JS implementation of the challenge). The economics just can't work. The core problem is that the attackers pay in CPU time, which is fungible and incredibly cheap, while the real users pay in user-observable latency which is hellishly expensive.
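
    To put rough numbers on that (every constant below is an assumption for illustration, not a measurement):

        // Back-of-envelope: what one challenge costs each side (assumed numbers).
        const userLatencySeconds = 1;     // what a real visitor burns waiting, per challenge
        const nativeSpeedup = 10;         // assumed native-code advantage over browser JS
        const vcpuDollarsPerHour = 0.03;  // assumed cloud vCPU price
        const attackerCost =
            (userLatencySeconds / nativeSpeedup) * (vcpuDollarsPerHour / 3600);
        console.log(attackerCost); // ≈ $0.0000008 per page, vs. a full second of a human's time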

  • aniviacat 2 days ago

    They do use SubtleCrypto digest [0] in secure contexts, which does the hashing natively.

    Specifically for Firefox [1], they switch to the JavaScript fallback because that's actually faster [2] (probably because of overhead):

    > One of the biggest sources of lag in Firefox has been eliminated: the use of WebCrypto. Now whenever Anubis detects the client is using Firefox (or Pale Moon), it will swap over to a pure-JS implementation of SHA-256 for speed.

    [0] https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypt...

    [1] https://github.com/TecharoHQ/anubis/blob/main/web/js/algorit...

    [2] https://github.com/TecharoHQ/anubis/releases/tag/v1.22.0
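
    For illustration, the selection being described might look roughly like this (a sketch under assumptions, not the actual Anubis source; see [1] for the real code, and pureJsSha256 is a stand-in for the bundled fallback):

        // Sketch: prefer native WebCrypto hashing, but fall back to a bundled
        // pure-JS SHA-256 where the JS path benchmarks faster (Firefox/Pale Moon).
        async function sha256Bytes(message) {
            const bytes = new TextEncoder().encode(message);
            const prefersJsPath = /Firefox|Pale Moon/.test(navigator.userAgent);
            if (globalThis.isSecureContext && !prefersJsPath) {
                return new Uint8Array(await crypto.subtle.digest("SHA-256", bytes));
            }
            return pureJsSha256(bytes); // stand-in for the bundled fallback
        }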

  • xena 2 days ago

    If you can optimize it, I would love that as a pull request! I am not a JS expert.

  • gruez 2 days ago

    >but the hottest code path seems to be written extremely inefficiently.

    Why is this inefficient?

tptacek 2 days ago

Right, but that's the point. It's not that the idea is bad. It's that PoW is the wrong fit for it. Internet-wide scrapers don't keep state? Ok, then force clients to do something that requires keeping state. You don't need to grind SHA2 puzzles to do that; you don't need to grind anything at all.
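
For example, a minimal sketch of that idea (made-up names and values, not any particular product): issue a signed, expiring token the client must store and send back. Real browsers keep cookies for free; a stateless crawler suddenly has to hold per-site state.

    // Node.js sketch: issue and verify a signed, expiring token. The only "work"
    // the client does is remembering it. Names and values here are assumptions.
    const crypto = require("node:crypto");
    const SECRET = process.env.CHALLENGE_SECRET || "dev-only-secret";

    function issueToken() {
        const expires = Date.now() + 7 * 24 * 3600 * 1000; // a week, like the pass discussed above
        const sig = crypto.createHmac("sha256", SECRET).update(String(expires)).digest("hex");
        return `${expires}.${sig}`; // set as a cookie on the challenge response
    }

    function verifyToken(token) {
        const [expires, sig] = String(token || "").split(".");
        const expected = crypto.createHmac("sha256", SECRET).update(String(expires)).digest("hex");
        return sig === expected && Date.now() < Number(expires);
    }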

valicord 2 days ago

The point is that the scrapers could easily bypass this if they cared to.

  • uqers 2 days ago

    How so?

    • valicord 2 days ago

      The parent comment was: "The author of that site assumes that scrapers will keep track of the access tokens for a week, but most internet-wide scrapers don't do so." There's no technical reason why they wouldn't reuse those tokens; they don't do it today because they don't care. If Anubis gets enough adoption to cause meaningful inconvenience, the scrapers will just start caching the tokens to amortize the cost.

      The point of the article is that if the scraper is sufficiently motivated, Anubis is not going to do much anyway, and if the scraper doesn't care, the same result can be achieved without annoying your actual users.
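
      Concretely, the scraper-side change is tiny. A sketch with placeholder names (solveChallenge stands in for however the token is obtained):

          // Pay the challenge once per host, then replay the cookie for as long as it lasts.
          const tokenCache = new Map(); // host -> cookie string

          async function fetchWithCachedToken(url, solveChallenge) {
              const host = new URL(url).host;
              if (!tokenCache.has(host)) {
                  tokenCache.set(host, await solveChallenge(host)); // one-time cost, amortized
              }
              return fetch(url, { headers: { Cookie: tokenCache.get(host) } });
          }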

    • tecoholic 2 days ago

      Hmm… by setting the verified=1 cookie on every request to the website?

      Am I missing something here? All this does is set an unencrypted cookie and reload the page, right?

      • notpushkin 2 days ago

        They could, but if this is slightly different from site to site, they’ll have to either do this for every site (annoying but possible if your site is important enough), or go ahead and run JS (which... I thought they do already, with plenty of sites still being SPAs?)
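
        For what it's worth, running the JS is largely a solved problem for crawlers; a rough sketch of one that does (assuming the puppeteer package, not anyone's actual scraper):

            // Rough sketch of a crawler that just runs the page's JS, challenge and all.
            const puppeteer = require("puppeteer"); // assumes the puppeteer package

            async function scrape(url) {
                const browser = await puppeteer.launch();
                const page = await browser.newPage();
                await page.goto(url, { waitUntil: "networkidle0" }); // lets any challenge script finish
                const html = await page.content();
                await browser.close();
                return html;
            }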

        • rezonant 2 days ago

          I would be highly surprised if most of these bots aren't already running JavaScript; I'm confused by this unquestioned notion that they don't.