Comment by johnklos

Comment by johnklos 3 days ago

This is usually a technical crowd, so I can't help but wonder if many people genuinely don't get it, or if they are just feigning a lack of understanding to be dismissive of Anubis.

Sure, the people who make the AI scraper bots are going to figure out how to actually do the work. The point is that they hadn't, and this worked for quite a while.

As the botmakers circumvent, new methods of proof-of-notbot will be made available.

It's really as simple as that. If a new method comes out and your site is safe for a month or two, great! That's better than dealing with fifty requests a second, wondering if you can block whole netblocks, and if so, which.

This is like those simple things on submission forms that ask you what 7 + 2 is. Of course everyone knows that a crawler can calculate that! But it takes a human some time and work to tell the crawler HOW.

palata 3 days ago

> they are just feigning a lack of understanding to be dismissive of Anubis.

I actually find the featured article very interesting. It doesn't feel dismissive of Anubis, but rather it questions whether this particular solution makes sense or not in a constructive way.


johnklos 3 days ago

I agree - the article is interesting and not dismissive.
I was talking more about some of the people here ;)

- dmesg 2 days ago
  
  I still don't understand what Anubis solves if it can be bypassed so easily: if you use User-Agent Switcher as a Firefox add-on (I emulate wget) on kernel.org or ffmpeg.org, you save the entire check time and skip Anubis outright. Apparently they use a whitelist for user agents to allow legitimate wget usage on these domains. However, if I (an honest human) can, the scrapers and grifters can too.
  https://addons.mozilla.org/en-US/firefox/addon/uaswitcher/
  If anyone wants to try it themselves. This is by no means against Anubis, but it raises the question: can you even protect a domain if you force yourself to whitelist (for a full bypass) easy-to-guess UAs?
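
To make the concern concrete, here is a minimal sketch of a user-agent allowlist check. The function name `needs_challenge`, the prefix list, and the sample strings are all hypothetical; this is not how Anubis or these sites are actually configured, only an illustration of why a header the client controls is a weak gate.

```python
# Hypothetical allowlist: requests whose User-Agent starts with one of these
# prefixes skip the challenge entirely. The prefixes are illustrative only.
ALLOWED_UA_PREFIXES = ("Wget/", "curl/")


def needs_challenge(user_agent: str) -> bool:
    """Return True if the request should be sent to the challenge page."""
    return not user_agent.startswith(ALLOWED_UA_PREFIXES)


# A browser sending its real User-Agent gets challenged...
browser_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
# ...but the same browser (or a scraper) claiming to be wget does not.
spoofed_ua = "Wget/1.21.4"

print(needs_challenge(browser_ua))  # True
print(needs_challenge(spoofed_ua))  # False
```

The check only ever sees a string the client chose, so anyone who can guess an allowed prefix gets the full bypass.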
  
  
  hooverd 2 days ago
  
  It's extra work for scrapers. They pretend to be upstanding citizens (Chrome UA from residential IPs). You can more easily block those.
  

technion 3 days ago

It really should be recognised just how many people are watching Cloudflare interstitials on nearly every site these days (and I totally get why this happens) yet making a huge amount of noise about Anubis on a very small number of sites.


mlyle 3 days ago

I don't trip over CloudFlare except when in a weird VPN, and then it always gets out of my way after the challenge.
Anubis screws with me a lot, and often doesn't work.

- dijit 3 days ago
  
  The annoying thing about Cloudflare is that most of the time, once you’re blocked, you’re blocked.
  There’s literally no way for you to bypass the block if you’re affected.
  It’s incredibly scary. I once had a bad user agent (without knowing it) and half the internet went offline. I couldn’t even access documentation or my email provider’s site, and there was no contact information or debugging information to help me resolve it: just a big middle finger for half the internet.
  I haven’t had issues with any sites using Anubis (yet), but I suspect there are ways to verify that you’re a human if your browser fails the automatic check at least.
  
  
  zorked 3 days ago
  
  CloudFlare is dystopic. It centralizes even the part of the Internet that hadn't been centralized before. It is a perfect Trojan horse to bypass all encryption. And it chooses who accesses (a considerable chunk of) the Internet and who doesn't.
  Anubis looks much better than this.
  
  
  piltdownman 3 days ago
  
  It could be a lot worse. Soccer rights holders effectively shut down the Cloudflare-facilitated Internet in Spain during soccer matches to 'curb piracy'.
  The soccer rights holder, LaLiga, claims more than 50% of pirate IPs illegally distributing its content are protected by Cloudflare. Many were using an application called DuckVision to facilitate this streaming.
  Telefónica, the ISP, upon realizing they couldn’t directly block DuckVision’s IP or identify its users, decided on a drastic solution: blocking entire IP ranges belonging to Cloudflare, which continues to affect a huge number of services that had nothing to do with soccer piracy.
  https://pabloyglesias.medium.com/telef%C3%B3nicas-cloudflare...
  https://www.broadbandtvnews.com/2025/02/19/cloudflare-takes-...
  https://community.cloudflare.com/t/spain-providers-blocks-cl...
  
  
  ehnto 3 days ago
  
  Now imagine your government provided internet agent gets blacklisted because your linked social media post was interpreted by an LLM to be anti-establishment, and we are painting a picture of our current trajectory.
  
  
  petralithic 3 days ago
  
  I don't have to imagine
  
  
  kedihacker 3 days ago
  
  Anubis checks proof of work so as long as JavaScript runs you will pass it.
  
  
  Dilettante_ 3 days ago
  
  A "digital no-fly-list" is hella cyberpunk, though.
  
- binaryturtle 3 days ago
  
  I'm on an older system here, and both Cloudflare and Anubis entirely block me out of sites. Once you start blocking actual users out of your sites, it simply has gone too far. At least provide an alternative method to enter your site (e.g. via login) that's not hampered by erroneous human checks. Same for the captchas where you help train AIs by choosing out of a set of tiny/noisy pictures. I often struggle for 5 to 10 minutes to get past that nonsense. I heard bots have less trouble.
  Basically we're already past the point where the web is made for actual humans, now it's made for bots.
  
  
  inejge 3 days ago
  
  > Once you start blocking actual users out of your sites, it simply has gone too far.
  It has; scrapers are out of control. Anubis and its ilk are a desperate measure, and some fallout is expected. And you don't get to dictate how a non-commercial site tries to avoid throttling and/or bandwidth overage bills.
  
  
  alperakgun 3 days ago
  
  I gave up on a lot of websites because of the aggressive blocking.
  
  
  johnklos 3 days ago
  
  FYI - you can communicate with the author of Anubis, who has already said she's working on ways to make sure that all browsers - links, lynx, dillo, midori, et cetera, work.
  Unless you're paying Cloudflare a LOT of money, you won't get to talk with anyone who can or will do anything about issues. They know about their issues and simply don't care.
  If you don't mind taking a few minutes, perhaps put some details about your setup in a bug report?
  
- necovek 3 days ago
  
  It's the other way around for me sometimes: I've never had an issue with Anubis, but I frequently get one with CF-protected sites.
  (Not to mention all the sites which started putting country restrictions in on their generally useful instruction articles etc — argh)
  
- Pinus 3 days ago
  
  I’m planning a trip to France right now, and it seems like half the websites in that country (for example, ratp.fr for Paris public transport info) require me to check a CloudFlare checkbox to promise that I am a human. And of those that don’t, quite a few just plain lock me out...
  
  
  ta988 3 days ago
  
  And a lot of US sites don't work in France either, or they ban you after just a couple requests with no appeal...
  
  
  Symbiote 3 days ago
  
  I find the same when using some foreign sites. I think the operator must have configured that France is OK, maybe neighboring countries too, the rest of the world must be checked.
  
  
  alibarber 3 days ago
  
  It's not hard to understand why, though, surely?
  You might have to show a passport when you enter France, and have your baggage and person (intrusively) scanned if you fly there, for much the same reason.
  People, some of them in positions of government in some nation states, want to cause harm to the services of other states. Cloudflare was probably the easiest tradeoff for balancing security of the service with accessibility and cost to the French/Parisian taxpayer.
  Not that I'm happy about any of this, but I can understand it.
  
  
  inferiorhuman 3 days ago
  
  The antagonists in this case are not state-sponsored terrorists; instead, it's AI bros DDoSing the internet.
  
- thayne 3 days ago
  
  I get one basically every time I go to gitlab.com on Firefox.
  It is easy to pass the challenge, but it isn't any better than Anubis.
  
- NoGravitas 3 days ago
  
  Even when not on VPN, if a site uses the CloudFlare interstitials, I will get it every single time - at least the "prove you're not a bot" checkbox. I get the full CAPTCHA if I'm on a VPN or I change browsers. It is certainly enough to annoy me. More than Anubis, though I do think Anubis is also annoying, mainly because of being nearly worthless.
  
- immibis 3 days ago
  
  You must be on a good network. You should run one of those "get paid to share your internet connection with AI companies" apps. Since you're on a good network you might make a lot of money. And then your network will get cloudflared, of course.
  We should repeat this until every network is cloudflared and everyone hates cloudflare and cloudflare loses all its customers and goes bankrupt. The internet would be better for it.
  
- wongarsu 3 days ago
  
  For me both are things that mostly show up for 1-3 seconds, then get replaced by the actual website. I suspect that's the user experience of 99% of people.
  If you fall in the other 1% (e.g. due to using unusual browsers or specific IP ranges), cloudflare tends to be much worse
  
elric 3 days ago

I hit Cloudflare's garbage about as much as I hit Anubis. With the difference that far more sites use Cloudflare than Anubis, thus Anubis is far worse at triggering false positives.

- Aachen 3 days ago
  
  Huh? What false positives does Anubis produce?
  The article doesn't say, and I constantly get the most difficult Google captchas, Cloudflare block pages saying "having trouble?" (which is a link to submit a ticket that seems to land in /dev/null), IP blocks because of user agent spoofing, "unsupported browser" errors when I don't do user agent spoofing... The only anti-bot thing that reliably works on all my clients is Anubis. I'm really wondering what kinds of false positives you think Anubis has, since (as far as I can tell) it's a completely open and deterministic algorithm that just lets you in if you solve the challenge, and as the author of the article demonstrated with some C code (if you don't want to run the included JavaScript that does it for you), that works even if you are a bot. And afaik that's the point: no heuristics and false positives, but a straight game of costs, making bad scraping behavior simply cost more than implementing caching correctly or using Common Crawl.
  
  
  jakogut 3 days ago
  
  I've had Anubis repeatedly fail to authorize me to access numerous open source projects, including the mesa3d gitlab, with a message looking something like "you failed".
  As a legitimate open source developer and contributor to buildroot, I've had no recourse besides trying other browsers, networks, and machines, and it's triggered on several combinations.
  
- analbliss 3 days ago
  
  So yes, it is like having a stalker politely open the door for you as you walk into a shop, because they know very well who you are.
  
  
  robertlagrant 3 days ago
  
  In a world full of robots that look like humans, the stalker who knows you and lets you in might be the only solution.
  
  
  rob_c 3 days ago
  
  [flagged]
  
tgv 3 days ago

That says something about the chosen picture, doesn't it? Probably that it's not well liked. It certainly isn't neutral, while the Cloudflare page is.

- drakythe 3 days ago
  
  You know, you say that, and while I understand where you're coming from I was browsing the git repo when github had a slight error and I was greeted with an angry pink unicorn. If Github can be fun like that, Anubis can too, I think.
  
  
  MintPaw 3 days ago
  
  Yeah, but do people like that? It feels pretty patronizing to me in a similar way. Like "Weee! So cute that our website is broken, good luck doing your job! <3"
  Reminds me of the old uwu error message meme.
  
  
  [removed] 3 days ago
  
  [deleted]
  
  
  tgv 3 days ago
  
  I don't think you want to suggest that everyone must like it?
  
- thrance 3 days ago
  
  Anubis was originally an open source project built for a personal blog. It gained traction but the anime girl remained so that people are reminded of the nature of the project. Comparing it with Cloudflare is truly absurd. That said, a paid version is available with guard page customization.
  
- troyvit 3 days ago
  
  Nothing says, "Change out the logo for something that doesn't make my clients tingle in an uncomfortable way" like the MIT license.
  
  
  integralid 2 days ago
  
  I wonder why the anime girl is received so badly. Is it because it's seen as childish? Is it bad because it confuses people (i.e. don't do this because others don't do this)?
  Thinking about it logically, putting some "serious" banner there would just make everything a bit more grey and boring and would make no functional difference. So why is it disliked so much?
  
  
  notpushkin 3 days ago
  
  Keep in mind that the author explicitly asks you not to do this, and offers a paid white label version. You can still do it yourself, but maybe you shouldn’t.
  
  
  troyvit 2 days ago
  
  That's a good point and I didn't know that.
  
jcelerier 3 days ago

Both are equally terrible - one doesn't require explanations to my boss though

- Aachen 3 days ago
  
  If your boss doesn't want you to browse the web, where some technical content is accompanied by an avatar that the author likes, they may not be suitable as boss, or at least not for positions where it's their job to look over your shoulder and make sure you're not watching series during work time. Seems like a weird employment place if they need to check that anyway
  
  
  jcelerier 3 days ago
  
  we have customers in our offices pretty much every day, I think "no anime girls on screens" is a fair request
  
- ChocolateGod 3 days ago
  
  If Anubis didn't ship with a weird-looking anime girl, I think people would treat it akin to Cloudflare's block pages.
  
  
  autoexec 2 days ago
  
  Which means they'd still hate it and find it annoying
  
petralithic 3 days ago

We can make noise about both things, and how they're ruining the internet.

account42 3 days ago

Cloudflare's solution works without javascript enabled unless the website turns up the scare level to max or you are on an IP with already bad reputation. Anubis does not.
But at the end of the day both are shit and we should not accept either. That includes not using one as an excuse for the other.

- superkuh 3 days ago
  
  Laughable. They say this, but anyone who actually surfs the web with a non-bleeding-edge, non-corporate browser gets constantly blocked by Cloudflare. The idea that their JS computational paywalls only pop up rarely is absurd. Anyone believing this line lacks lived experience. My Comcast IP shouldn't have a bad rep and using a browser from ~2015 shouldn't make me scary. But I can't even read bills on congress.gov anymore thanks to bad CF deployments.
  Also, Anubis does have a non-JS mode: the HTML header meta-refresh based challenge. It's just that the type of people who use Cloudflare or Anubis almost always just deploy the default (mostly broken) configs that block as many human people as bots. And they never realize it because they only measure such things with javascript.
  
lupusreal 3 days ago

Over the past few years I've read far more comments complaining about Cloudflare doing it than Anubis. In fact, this discussion section is the first time I've seen people talking about Anubis.

ronsor 3 days ago

TO BE FAIR
I dislike those even more.


agwa 3 days ago

It sounds like you're saying that it's not the proof-of-work that's stopping AI scrapers, but the fact that Anubis imposes an unusual flow to load the site.

If that's true Anubis should just remove the proof-of-work part, so legitimate human visitors don't have to stare at a loading screen for several seconds while their device wastes electricity.


chrismorgan 3 days ago

> If that's true Anubis should just remove the proof-of-work part
This is my very strong belief. To make it even clearer how absurd the present situation is, every single one of the proof-of-work systems I’ve looked at has been using SHA-256, which is basically the worst choice possible.
Proof-of-work is bad rate limiting which depends on a level playing field between real users and attackers. This is already a doomed endeavour. Using SHA-256 just makes it more obvious: there’s an asymmetry factor in the order of tens of thousands between common real-user hardware and software, and pretty easy attacker hardware and software. You cannot bridge such a divide. If you allow the attacker to augment it with a Bitcoin mining rig, the efficiency disparity factor can go up to tens of millions.
These proof-of-work systems are only working because attackers haven’t tried yet. And as long as attackers aren’t trying, you can settle for something much simpler and more transparent.
If they were serious about the proof-of-work being the defence, they’d at least have started with something like Argon2d.
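
For readers who have not looked at one of these systems, a hash-based proof-of-work challenge can be sketched in a few lines. This is an illustrative sketch only, not Anubis's actual challenge format: the function names, the way the challenge string and nonce are concatenated, and the "leading zero hex digits" difficulty rule are all assumptions.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce so that sha256(challenge + nonce) starts with
    `difficulty` zero hex digits. Expected work grows as 16 ** difficulty."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Check a submitted nonce with a single hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


# Hypothetical challenge string; a real server would issue a fresh random one.
nonce = solve("example-challenge", difficulty=3)
print(verify("example-challenge", nonce, difficulty=3))  # True
```

The solver is nothing but a tight SHA-256 loop, which is exactly the workload that optimized native code, GPUs, and mining ASICs accelerate by large factors. Swapping the hash for a memory-hard function, as the comment suggests, would change only the two `hashlib.sha256` calls.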

- voidnap 3 days ago
  
  The proof of work isn't really the crux. They've been pretty clear about this from the beginning.
  I'll just quote from their blog post from January.
  https://xeiaso.net/blog/2025/anubis/
  Anubis also relies on modern web browser features:
  - ES6 modules to load the client-side code and the proof-of-work challenge code.
  - Web Workers to run the proof-of-work challenge in a separate thread to avoid blocking the UI thread.
  - Fetch API to communicate with the Anubis server.
  - Web Cryptography API to generate the proof-of-work challenge.
  This ensures that browsers are decently modern in order to combat most known scrapers. It's not perfect, but it's a good start.
  This will also lock out users who have JavaScript disabled, prevent your server from being indexed in search engines, require users to have HTTP cookies enabled, and require users to spend time solving the proof-of-work challenge.
  This does mean that users using text-only browsers or older machines where they are unable to update their browser will be locked out of services protected by Anubis. This is a tradeoff that I am not happy about, but it is the world we live in now.
  
  
  account42 3 days ago
  
  Except this is exactly the problem. Now you are checking for mainstream browsers instead of some notion of legitimate users. And as TFA shows a motivated attacker can bypass all of that while legitimate users of non-mainstream browsers are blocked.
  
  
  mewpmewp2 3 days ago
  
  Aren't most scrapers using things like Playwright or Puppeteer anyway by now, especially since so many pages are rendered using JS and even without Anubis would be unreadable without executing modern JS?
  
  
  rfoo 3 days ago
  
  ... except when you do not crawl with a browser at all. It's so trivial to solve, just like the taviso post demonstrated.
  This makes zero sense; this is simply the wrong approach. I'm already tired of saying so and being attacked. So I'm glad professional-random-Internet-bullshit-ignorer Tavis Ormandy wrote this one.
  
- username332211 3 days ago
  
  All this is true, but also somewhat irrelevant. In reality the amount of actual hash work is completely negligible.
  For usability reasons, Anubis only requires you to go through the proof-of-work flow once in a given period. (I think the default is once per week.) That's just very little work.
  Detecting that you need to occasionally send a request through a headless browser is far more of a hassle than the PoW. If you prefer LLMs rather than normal internet search, it'll probably consume far more compute as well.
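
The "solve once, then coast" behaviour described above can be sketched as a signed, expiring cookie. This is a hedged illustration, not Anubis's real token format: `SECRET`, `issue_cookie`, `cookie_is_valid`, and the `expires.signature` layout are hypothetical, and the one-week validity simply mirrors the guess in the comment.

```python
import hashlib
import hmac

SECRET = b"server-side secret (placeholder)"  # hypothetical key
VALIDITY_SECONDS = 7 * 24 * 3600  # one week, per the comment's guess


def issue_cookie(now: int) -> str:
    """Called once, after the client has passed the challenge."""
    expires = str(now + VALIDITY_SECONDS)
    sig = hmac.new(SECRET, expires.encode(), hashlib.sha256).hexdigest()
    return f"{expires}.{sig}"


def cookie_is_valid(cookie: str, now: int) -> bool:
    """Cheap check on every later request: one HMAC, no proof of work."""
    try:
        expires_text, sig = cookie.split(".", 1)
        expires = int(expires_text)
    except ValueError:
        return False
    expected = hmac.new(SECRET, expires_text.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), sig.encode()) and now < expires


cookie = issue_cookie(now=1_000)
print(cookie_is_valid(cookie, now=1_000 + 3600))              # True
print(cookie_is_valid(cookie, now=1_000 + VALIDITY_SECONDS))  # False
```

Under a scheme like this, a client that keeps the cookie pays the proof-of-work cost once per validity period, and a client that discards cookies pays it on every visit.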
  
  
  rendx 3 days ago
  
  > For usability reasons, Anubis only requires you to go through the proof-of-work flow once in a given period. (I think the default is once per week.) That's just very little work.
  If you keep cookies. I do not want to keep cookies for otherwise "stateless" sites. I have maybe a dozen sites whitelisted, every other site loses cookies when I close the tab.
  
kaszanka 3 days ago

This is basically what most of the challenge types in go-away (https://git.gammaspectra.live/git/go-away/wiki/Challenges) do.

- Tmpod 3 days ago
  
  +1 for go-away. It's a bit more involved to configure, but worth the effort imo. It can be considerably more transparent to the user, triggering the nuclear PoW check less often, while being just as effective, in my experience.
  
amarant 3 days ago

I feel like the future will have this, plus ads displayed while the work is done, so websites can profit while they profit.

- silversmith 3 days ago
  
  Every now and then I consider stepping away from the computer job, and becoming a lumberjack. This is one of those moments.
  
  
  jones89176 3 days ago
  
  my family takes care of a large-ish forest, so I have had to help since my early teens. Let me tell you: think twice, it's f*ckin dangerous. Chainsaws, winches, heavy trees falling and breaking in unpredictable ways. I had a couple of close calls myself. Recently a guy from a neighboring village was squashed to death by a root plate that tilted.
  I often think about quitting tech myself, but becoming a full-time lumberjack is certainly not an alternative for me.
  
  
  silversmith 2 days ago
  
  Hah, I know, been around forests since childhood, seen (and done) plenty of sketchy stuff. For me it averages out to couple days of forest work a year. It's backbreaking labour, and then you deal with the weather.
  But man, if tech goes straight into cyberpunk dystopia but without the cool gadgets, maybe it is the better alternative.
  
  
  zxexz 3 days ago
  
  Worth getting to know the ins and outs of forest management now. I don’t think AI will take most tech jobs soon, but they sure as hell are already making them boring.
  
- JimDabell 3 days ago
  
  adCAPTCHA already does this:
  https://adcaptcha.com
  
  
  Tmpod 3 days ago
  
  This is a joke, right? The landing page makes it seem so.
  I tried the captcha in their login page and it made the entire page, including the puzzle piece slider, run at 2 fps.
  My god, we do really live in 2025.
  
  
  Aachen 3 days ago
  
  Holy shit. Opening the demo from the menu, it's like captchas and youtube ads had a baby
  
tptacek 3 days ago

Exactly this.

empath75 3 days ago

I don't think anything will stop AI companies for long. They can do spot AI agentic checks of workflows that stop working for some reason and the AI can usually figure out what the problem is and then update the workflow to get around it.


hedora 3 days ago

This was obviously dumb when it launched:

1) scrapers just run a full browser and wait for the page to stabilize. They did this before this thing launched, so it probably never worked.

2) The AI reading the page needs something like 5 seconds * 1600W to process it. Assuming my phone can even perform that much compute as efficiently as a server class machine, it’d take a large multiple of five seconds to do it, and get stupid hot in the process.

Note that (2) holds even if the AI is doing something smart like batch processing 10-ish articles at once.


pilif 3 days ago

> This was obviously dumb when it launched:
Yes. Obviously dumb but also nearly 100% successful at the current point in time.
And likely going to stay successful as the non-protected internet still provides enough information to dumb crawlers that it’s not financially worth it to even vibe-code a workaround.
Or in other words: Anubis may be dumb, but the average crawler that completely exhausts some sites' resources is even dumber.
And so it all works out.
And so the question remains: how dumb was it exactly, when it works so well and continues to work so well?

- account42 3 days ago
  
  > Yes. Obviously dumb but also nearly 100% successful at the current point in time.
  Only if you don't care about negatively affecting real users.
  
  
  pilif 3 days ago
  
  I understand this as an argument that it’s better to be down for everyone than have a minority of users switch browsers.
  I’m not convinced that makes sense.
  Now ideally you would have the resources to serve all users and all the AI bots without performance degradation, but for some projects that’s not feasible.
  In the end it’s all a compromise.
  
- kldg 3 days ago
  
  does it work well? I run chromium controlled by playwright for scraping and typically make Gemini implement the script for it because it's not worth my time otherwise. but I'm not crawling the Internet generally (which I think there is very little financial incentive to do; it's a very expensive process even ignoring Anubis et al); it's always that I want something specific and am sufficiently annoyed by the lack of an API.
  regarding authentication mentioned elsewhere, passing cookies is no big deal.
  
  
  eaglefield 3 days ago
  
  Anubis is not meant to stop single endpoints from scraping. It's meant to make it harder for massive AI scrapers. The problematic ones evade rate limiting by using many different IP addresses, and make scraping cheaper for themselves by running headless. Anubis is specifically built to make that kind of scraping harder, as I understand it.
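
The rate-limiting point can be shown with a small sketch of a per-IP sliding-window limiter. The class name, the limit of 5 requests per 60 seconds, and the addresses (taken from the reserved documentation ranges) are all hypothetical; the sketch only illustrates why a per-address limit does little against traffic spread across many addresses.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding window: at most `limit` requests per `window` seconds per IP."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        recent = self.hits[ip]
        # Forget accepted requests that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


limiter = PerIpRateLimiter(limit=5, window=60.0)

# 100 requests from one address: only the first 5 get through.
single = sum(limiter.allow("203.0.113.7", now=0.0) for _ in range(100))

# 100 requests spread over 100 addresses: every one gets through.
spread = sum(limiter.allow(f"198.51.100.{i}", now=0.0) for i in range(100))

print(single, spread)  # 5 100
```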
  
- bananalychee 3 days ago
  
  Does it actually? I don't think I've seen a case study with hard numbers.
  
  
  pilif 3 days ago
  
  Here’s one study
  https://dukespace.lib.duke.edu/server/api/core/bitstreams/81...
  And of all the high-profile projects implementing it, like the LKML archives, none have backed down yet, so I’m assuming the initial improvement in numbers must continue or it would have been removed since
  
- snickerdoodle12 3 days ago
  
  the workaround is literally just running a headless browser, and that's pretty much the default nowadays.
  if you want to save some $$$ you can spend like 30 minutes making a cracker like in the article. just make it multi threaded, add a queue and boom, your scraper nodes can go back to their cheap configuration. or since these are AI orgs we're talking about, write a gpu cracker and laugh as it solves challenges far faster than any user could.
  custom solutions aren't worth it for individual sites, but with how widespread anubis is it's become worth it.
  
pama 3 days ago

I agree. Your estimate for (2), about 0.0022 kWh, corresponds to about a sixth of the charge of an iPhone 15 Pro and would take longer than ten minutes on the phone, even at max power draw. It feels about right for the amount of energy/compute of a large modern MoE loading large pages of several 10k tokens. For example, this tech (a couple of months old) could input 52.3k tokens per second to a 672B parameter model, per H100 node instance, which probably burns about 6–8kW while doing it. The new B200s should be about 2x to 3x more energy efficient, but your point still holds within an order of magnitude.
https://lmsys.org/blog/2025-05-05-large-scale-ep/
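
The arithmetic behind the 0.0022 kWh figure is easy to reproduce. The power and duration come from the parent comment; the phone battery capacity of roughly 12.7 Wh is an approximate outside figure assumed here for illustration, not something stated in the thread.

```python
POWER_WATTS = 1600       # from the parent comment
DURATION_SECONDS = 5     # from the parent comment
PHONE_BATTERY_WH = 12.7  # assumption: approximate iPhone 15 Pro capacity

energy_joules = POWER_WATTS * DURATION_SECONDS  # 8000 J
energy_wh = energy_joules / 3600                # 1 Wh = 3600 J
energy_kwh = energy_wh / 1000
battery_fraction = energy_wh / PHONE_BATTERY_WH

print(f"{energy_kwh:.4f} kWh")                   # 0.0022 kWh
print(f"{battery_fraction:.2f} of one charge")   # 0.17 of one charge
```

A fraction of about 0.17 is close to one sixth, which matches the estimate above under the assumed battery capacity.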

rob_c 3 days ago

The argument doesn't quite hold. The mass scraping (for training) is almost never done by a GPU system; it's almost always done by a dedicated system running a full Chrome fork in some automated way (not just the signatures but some bugs give that away).
And frankly, processing a single page of text fits within a single token window, so it likely runs for a blink (ms) before moving on to the next data entry. The kicker is that it's run potentially thousands of times, depending on your training strategy.
At inference there's now a dedicated tool that may perform a "live" request to scrape the site contents. But then this is just pushed into a massive context window to give the next token anyway.

- account42 3 days ago
  
  The point is that scraping is already inherently cost-intensive so a small additional cost from having to solve a challenge is not going to make a dent in the equation. It doesn't matter what server is doing what for that.
  
  
  mistercheph 3 days ago
  
  100 billion web pages * 0.02 USD of PoW/page = 2 billion dollars, the point is not to stop every scraper/crawler, the point is to raise the costs enough to avoid being bombarded by all of them
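
The estimate is simple to check, and it is worth seeing how sensitive it is to the per-page figure. In the sketch below, the page count and the 0.02 USD per page come from the comment; the 10,000x attacker efficiency factor is a hypothetical value, chosen to sit at the low end of the "tens of thousands" asymmetry claimed elsewhere in this thread.

```python
PAGES = 100_000_000_000              # from the comment
POW_COST_PER_PAGE_USD = 0.02         # from the comment
ASSUMED_ATTACKER_ADVANTAGE = 10_000  # hypothetical efficiency factor

# Cost if the scraper pays the same per-page price as an ordinary client.
naive_total = PAGES * POW_COST_PER_PAGE_USD

# Cost if optimized hardware/software makes each solution that much cheaper.
optimized_total = naive_total / ASSUMED_ATTACKER_ADVANTAGE

print(f"${naive_total:,.0f}")      # $2,000,000,000
print(f"${optimized_total:,.0f}")  # $200,000
```

Whether the deterrent holds therefore depends almost entirely on what a solved challenge really costs the scraper, which is the point the two sides of this thread disagree on.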
  

psionides 3 days ago

The problem is that 7 + 2 on a submission form only affects people who want to submit something, Anubis affects every user who wants to read something on your site


account42 3 days ago

The question then is why read-only users are consuming so many resources that serving them big chunks of JS instead reduces load on the server. Maybe improve your rendering and/or caching before employing DRM solutions that are doomed to fail anyway.

- Mateon1 3 days ago
  
  The problem it's originally fixing is bad scrapers accessing dynamic site content that's expensive to produce, like trying to crawl all diffs in a git repo, or all mediawiki oldids. Now it's also used on mostly static content because it is effective vs scrapers that otherwise ignore robots.txt.
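
For contrast, this is roughly what a well-behaved crawler does before fetching a URL, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` name, and the example.org paths are hypothetical, picked to resemble the expensive dynamic endpoints mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a code-hosting site; the paths are illustrative.
ROBOTS_TXT = """\
User-agent: *
Disallow: /diff/
Disallow: /blame/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before every fetch and skips disallowed paths.
print(parser.can_fetch("ExampleBot", "https://example.org/diff/abc123"))  # False
print(parser.can_fetch("ExampleBot", "https://example.org/README"))       # True
```

Nothing enforces this check on the server side; a scraper that never calls `can_fetch` simply requests the expensive pages anyway, which is the gap tools like Anubis try to fill.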
  

monooso 3 days ago

The author makes it very clear that he understands the problem Anubis is attempting to solve. His issue is that the chosen approach doesn't solve that problem; it just inhibits access to humans, particularly those with limited access to compute resources.

That's the opposite of being dismissive. The author has taken the time to deeply understand both the problem and the proposed solution, and has taken the time to construct a well-researched and well-considered argument.