eqvinox 3 days ago

TFA — and most comments here — seem to completely miss what I thought was the main point of Anubis: it counters the crawler's "identity scattering"/sybil'ing/parallel crawling.

Any access will fall into either of the following categories:

- client with JS and cookies. In this case the server now has an identity to apply rate limiting to, from the cookie. Humans should never hit that limit, but crawlers will be slowed down immensely or ejected. Of course the identity can be rotated — at the cost of solving the puzzle again.

- amnesiac (no cookies) clients with JS. Each access is now expensive.

(- no JS - no access.)

The point is to prevent parallel crawling and overloading the server. Crawlers can still start an arbitrary number of parallel crawls, but each one costs to start and needs to stay below some rate limit. Previously, the server would collapse under thousands of crawler requests per second. That is what Anubis is making prohibitively expensive.
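To make that concrete, here's a minimal sketch of the kind of per-identity rate limiting the cookie enables (Express-style; the cookie name and limits are placeholders, not Anubis's actual ones):

  // Minimal sketch: rate limiting keyed on the PoW cookie.
  // "anubis_token" and the limits are made up, not Anubis's actual ones.
  const express = require("express");
  const cookieParser = require("cookie-parser");

  const app = express();
  app.use(cookieParser());

  const WINDOW_MS = 60_000; // 1-minute window
  const MAX_HITS = 30;      // per identity per window
  const hits = new Map();   // cookie value -> { count, windowStart }

  app.use((req, res, next) => {
    const id = req.cookies["anubis_token"];
    if (!id) return res.status(401).send("solve the challenge first");

    const now = Date.now();
    const entry = hits.get(id) ?? { count: 0, windowStart: now };
    if (now - entry.windowStart > WINDOW_MS) {
      entry.count = 0;
      entry.windowStart = now;
    }
    entry.count += 1;
    hits.set(id, entry);

    // Rotating to a fresh identity means solving the puzzle again.
    if (entry.count > MAX_HITS) return res.status(429).send("slow down");
    next();
  });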

  • qwery 3 days ago

    Yes, I think you're right. The commentary about its (presumed, imagined) effectiveness is very much making the assumption that it's designed to be an impenetrable wall[0] -- i.e. prevent bots from accessing the content entirely.

    I think TFA is generally quite good and has something of a good point about the economics of the situation, but finding the math shake out that way should, perhaps, lead one to question their starting point / assumptions[1].

    In other words, who said the websites in question wanted to entirely prevent crawlers from accessing them? The answer is: no one. Web crawlers are and have been fundamental to accessing the web for decades. So why are we talking about trying to do that?

    [0] Mentioning 'impenetrable wall' is probably setting off alarm bells, because of course that would be a bad design.

    [1] (Edited to add:) I should say 'to question their assumptions more' -- like I said, the article is quite good and it does present this as confusing, at least.

    • 1gn15 2 days ago

      > In other words, who said the websites in question wanted to entirely prevent crawlers from accessing them? The answer is: no one. Web crawlers are and have been fundamental to accessing the web for decades. So why are we talking about trying to do that?

      I agree, but the advertising is the whole issue. "Checking to see you're not a bot!" and all that.

      Therefore some people using Anubis expect it to be an impenetrable wall, to "block AI scrapers", especially those that believe it's a way for them to be excluded from training data.

      It's why just a few days ago there was an HN frontpage post of someone complaining that "AI scrapers have learnt to get past Anubis".

      But that is a fight that one will never win (analog hole as the nuclear option).

      If it said something like "Wait 5 seconds, our servers are busy!", I would think that people's expectations would be more accurate.

      As a robot I'm really not that sympathetic to anti-bot language backfiring on humans. I have to look away every time it comes up on my screen. If they changed their language and advertising, I'd be more sympathetic -- it's not as if I disagree that overloading servers for not much benefit is bad!

      • qwery a day ago

        Yeah, I think it's obviously a pretty natural conclusion to draw, that {thing for hinder crawler} ≅≅ {thing for stop all crawler}. Perhaps I should have stated that explicitly in the original comment.

        As for the presentation/advertising, I didn't get into it because I don't hold a particularly strong opinion. Well, I do hold a particularly strong opinion, but not one that really distinguishes Anubis from any of the other things. I'm fully onboard with what you're saying -- I find this sort of software extremely hostile and the fact that so many people don't[0] reminds me that I'm not a people.

        In my experience, this particular jump scare is about the same as any of the other services. The website is telling me that I'm not welcome for whatever arbitrary reason it is now, and everyone involved wants me to feel bad.

        Actually there is one thing I like about the Anubis experience[1] compared to the other ones, it doesn't "Would you like to play a game?" me. As a robot I appreciate the bluntness, I guess.

        (the games being: "click on this. now watch spinny. more. more. aw, you lose! try again?", and "wheel, traffic light, wildcard/indistinguishable"[2]).

        [0] "just ignore it, that's what I do" they say. "Oh, I don't have a problem like that. Sucks to be you."

        [1] yes, I'm talking upsides about the experience of getting **ed by it. I would ask how we got here but it's actually pretty easy to follow.

        [2] GCHQ et al. should provide a meatspace operator verification service where they just dump CCTV clips and you have to "click on the squares that contain: UNATTENDED BAG". Call it "phonebooth, handbag, foreign agent".

        (Apologies for all the weird tangents -- I'm just entertaining myself, I think I might be tired.)

  • thayne 3 days ago

    You don't necessarily need JS, you just need something that can detect if Anubis is used and complete the challenge.

    • eqvinox 3 days ago

      Sure, doesn't change anything though; you still need to spend energy on a bunch of hash calculations.

    • rocqua 3 days ago

      But then you rate limit that challenge.

      You could set up a system for parallelizing the creation of these Anubis PoW cookies independent of the crawling logic. That would probably work, but it's a pretty heavy lift compared to 'just run a browser with JavaScript'.

    • [removed] 3 days ago
      [deleted]
  • rocqua 3 days ago

    This is a good point, presuming the rate limiting is actually applied.

  • IshKebab 3 days ago

    Well maybe, but even then, how many parallel crawls are you going to do per site? 100 maybe? You can still get enough keys to do that for all sites in just a few hours per week.

wraptile 3 days ago

I'm a scraper developer and Anubis would have worked 10-20 years ago, but now all broad scrapers run on real headless browsers with full cookie support, which costs relatively little in compute. I'd be surprised if LLM bots used anything else, given that they already have all of this compute and engineering available.

That being said, one point is very correct here - by far the best effort to resist broad crawlers is a _custom_ anti-bot that could be as simple as "click your mouse 3 times", because handling something custom is very difficult at broad scale. It took the author just a few minutes to solve this, but for someone like Perplexity it would take hours of engineering and maintenance to implement a solution for each custom implementation, which is likely just not worth it.

You can actually see this in real life if you google web scraping services and which targets they claim to bypass - all of them bypass generic anti-bots like Cloudflare, Akamai etc. but struggle with custom and rare stuff like Chinese websites or small forums, because the scraping market is a market like any other and high-value problems are solved first. So becoming a low-value problem is a very easy way to avoid confrontation.

  • jandrese 3 days ago

    > That being said, one point is very correct here - by far the best effort to resist broad crawlers is a _custom_ anti-bot that could be as simple as "click your mouse 3 times", because handling something custom is very difficult at broad scale.

    Isn't this what Microsoft is trying to do with their sliding puzzle piece and choose the closest match type systems?

    Also, if you come in on a mobile browser it could ask you to lay your phone flat and then shake it up and down for a second or something similar that would be a challenge for a datacenter bot pretending to be a phone.

  • DanielHB 3 days ago

    How do you bypass Cloudflare? I do some light scraping for some personal stuff, but I can't figure out how to bypass it. Like do you randomize IPs using several VPNs at the same time?

    I usually just sit there on my phone pressing the "I am not a robot box" when it triggers.

    • wraptile 2 days ago

      It's still pretty hard to bypass it with open source solutions. To bypass CF you need:

      - an automated browser that doesn't leak the fact it's being automated

      - ability to fake the browser fingerprint (e.g. Linux is heavily penalized)

      - residential or mobile proxies (for small scale your home IP is probably good enough)

      - deployment environment that isn't leaked to the browser.

      - realistic scrape pattern and header configuration (header order, referer, prewalk some pages with cookies etc.)

      This is really hard to do at scale, but for small personal scripts you can get reasonable results with flavor-of-the-month Playwright forks on GitHub like nodriver, or dedicated tools like FlareSolverr. Honestly though, I'd just find a web scraping API with a low entry price, drop $15/month, and avoid this chase, because it can be really time consuming.

      If you're really on a budget - most of them offer 1,000 credits for free, which will get you on average ~100 pages a month per service, and you can get 10 of them as they all mostly function the same.
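      If you'd rather DIY it, the small-scale version looks roughly like this with stock Playwright (proxy URL and target site are placeholders, and stock Playwright still leaks automation signals -- hence the stealth forks):

        // Rough sketch with stock Playwright; proxy and target are placeholders.
        const { chromium } = require("playwright");

        (async () => {
          const browser = await chromium.launch({
            headless: true,
            // Residential/mobile proxy; datacenter IPs score badly.
            proxy: { server: "http://res-proxy.example.com:8000" },
          });
          const context = await browser.newContext({
            // Keep the fingerprint plausible and consistent.
            userAgent:
              "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
              "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
            locale: "en-US",
            timezoneId: "America/New_York",
            viewport: { width: 1366, height: 768 },
          });
          const page = await context.newPage();
          // Prewalk an ordinary page so cookies and referer look realistic.
          await page.goto("https://example.com/", { waitUntil: "networkidle" });
          await page.waitForTimeout(2000 + Math.random() * 3000); // human-ish pacing
          await page.goto("https://example.com/target-page");
          console.log(await page.title());
          await browser.close();
        })();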

    • hinach4n 3 days ago

      I believe usually you would bypass by using residential IPs / proxies?

      • DanielHB 3 days ago

        I run it through my home network and I'm still triggering it. I add 2s delays between page loads and it still triggers.

        • jijijijij 2 days ago

          Well, if that's true... I am so sorry to tell you this, it looks like you are in fact a robot.

    • 1gn15 2 days ago

      I use Camoufox for the browser and "playwright-captcha" for the CAPTCHA solving action. It's not fully reliable but it works.

  • miki123211 3 days ago

    This only works if you're a low-value site (which admittedly most sites are).

  • hahn-kev 3 days ago

    Bot blocking through obscurity

    • lbhdc 3 days ago

      That's really the only option available here, right? The goal is to keep sites low friction for end users while stopping bots. Requiring an account with some moderation would stop the majority of bots, but it would add a lot of friction for your human users.

      • brookst 3 days ago

        The other option is proof of work. Make clients use JS to do expensive calculations that aren’t a big deal for single clients, but get expensive at scale. Not ideal, but another tool to potentially use.
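        Something like this, conceptually (a toy sketch, not Anubis's actual scheme): the server hands out a challenge, the client burns CPU to find a nonce, and verification costs the server a single hash:

          // Toy proof-of-work: find a nonce so sha256(challenge + nonce)
          // starts with `difficulty` zero hex digits.
          const { createHash } = require("node:crypto");

          function solve(challenge, difficulty) {
            const prefix = "0".repeat(difficulty);
            for (let nonce = 0; ; nonce++) {
              const digest = createHash("sha256")
                .update(challenge + nonce)
                .digest("hex");
              if (digest.startsWith(prefix)) return { nonce, digest };
            }
          }

          // Each extra zero digit multiplies expected work by 16: one visitor
          // barely notices, but a million requests add up.
          console.log(solve("server-issued-challenge", 4));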

    • tovej 3 days ago

      I like it, make the bot developers play whack-a-mole.

      Of course, you're going to have to verify each custom puzzle, aren't you?

  • sam0x17 3 days ago

    > It took the author just a few minutes to solve this, but for someone like Perplexity it would take hours of engineering and maintenance to implement a solution for each custom implementation, which is likely just not worth it.

    These are trivial for an AI agent to solve though, even with very dumb watered down models.

  • andai 3 days ago

    You can also generate custom solutions at scale with LLMs. So each user could get a different CAPTCHA.

    • josh-sematic 3 days ago

      At that point you’re probably spending more money blocking the scrapers than you would spend just letting them through.

      • lbhdc 3 days ago

        That seems like it would make bot blocking saas (like cloudflare or tollbit) more attractive because it could amortize that effort/cost across many clients.

Arnavion 3 days ago

>This dance to get access is just a minor annoyance for me, but I question how it proves I’m not a bot. These steps can be trivially and cheaply automated.

>I think the end result is just an internet resource I need is a little harder to access, and we have to waste a small amount of energy.

No need to mimic the actual challenge process. Just change your user agent to not have "Mozilla" in it; Anubis only serves you the challenge if your user agent contains that. For myself I just made a sideloaded browser extension to override the UA header for the handful of websites I visit that use Anubis, including those two kernel.org domains.

(Why do I do it? For most of them I don't enable JS or cookies, so the challenge wouldn't pass anyway. For the ones that I do enable JS or cookies for, various self-hosted gitlab instances, I don't consent to my electricity being used for this any more than if it was mining Monero or something.)

  • johnecheck 3 days ago

    Sadly, touching the user-agent header more or less instantly makes you uniquely identifiable.

    Browser fingerprinting works best against people with unique headers. There's probably millions of people using an untouched safari on iPhone. Once you touch your user-agent header, you're likely the only person in the world with that fingerprint.

    • sillywabbit 3 days ago

      If someone's out to uniquely identify your activity on the internet, your User-Agent string is going to be the least of your problems.

      • _def 3 days ago

        Not sure what you mean, as exactly this is happening currently on 99% of the web. Brought to you by: ads

    • Arnavion 3 days ago

      UA fingerprinting isn't a problem for me. As I said I only modify the UA for the handful of sites that use Anubis that I visit. I trust those sites enough that them fingerprinting me is unlikely, and won't be a problem even if they did.

    • NoMoreNicksLeft 3 days ago

      I'll set mine to "null" if the rest of you will set yours...

      • gabeio 3 days ago

        The string “null” or actually null? I have recently seen a huge amount of bot traffic that has no UA at all, and I just outright block it. It’s almost entirely (Microsoft cloud) Azure script attacks.

    • codedokode 3 days ago

      If your headers are new every time then it is very difficult to figure out who is who.

      • spoaceman7777 3 days ago

        yes, but it puts you in the incredibly small bucket of "users that have weird headers that don't mesh well", and makes using the rest of the (many) other fingerprinting techniques all the more accurate.

      • kelseydh 3 days ago

        It is very easy unless the IP address is also switching up.

      • heavyset_go 3 days ago

        It's very easy to train a model to identify anomalies like that.

        • johnecheck 2 days ago

          While it's definitely possible to train a model for that, 'very easy' is nonsense.

          Unless you've got some superintelligence hidden somewhere, you'd choose a neural net. To train, you need a large supply of LABELED data. Seems like a challenge to build that dataset; after all, we have no scalable method for classifying as of yet.

    • andrewmcwatters 3 days ago

      Yes, but you can take the bet, and win more often than not, that your adversary is most likely not tracking visitor probabilities if you can detect that they aren't using a major fingerprinting provider.

    • [removed] 3 days ago
      [deleted]
    • jagged-chisel 3 days ago

      I wouldn’t think the intention is to s/Mozilla// but to select another well-known UA string.

      • Arnavion 3 days ago

        The string I use in my extension is "anubis is crap". I took it from a different FF extension that had been posted in a /g/ thread about Anubis, which is where I got the idea from in the first place. I don't use other people's extensions if I can help it (because of the obvious risk), but I figured I'd use the same string in my own extension so as to be combined with users of that extension for the sake of user-agent statistics.

      • soulofmischief 3 days ago

        The UA will be compared to other data points such as screen resolution, fonts, plugins, etc. which means that you are definitely more identifiable if you change just the UA vs changing your entire browser or operating system.

      • throwawayffffas 3 days ago

        I don't think there are any.

        Because servers would serve different content based on user agent, virtually all browsers' user agent strings start with Mozilla/5.0...

    • [removed] 3 days ago
      [deleted]
  • Animats 3 days ago

    > (Why do I do it? For most of them I don't enable JS so the challenge wouldn't pass anyway. For the ones that I do enable JS for, various self-hosted gitlab instances, I don't consent to my electricity being used for this any more than if it was mining Monero or something.)

    Hm. If your site is "sticky", can it mine Monero or something in the background?

    We need a browser warning: "This site is using your computer heavily in a background task. Do you want to stop that?"

    • mikestew 3 days ago

      > We need a browser warning: "This site is using your computer heavily in a background task. Do you want to stop that?"

      Doesn't Safari sort of already do that? "This tab is using significant power", or summat? I know I've seen that message, I just don't have a good repro.

      • qualeed 3 days ago

        Edge does, as well. It drops a warning in the middle of the screen, displays the resource-hogging tab, and asks whether you want to force-close the tab or wait.

  • zahlman 3 days ago

    > Just change your user agent to not have "Mozilla" in it. Anubis only serves you the challenge if your user agent contains that.

    Won't that break many other things? My understanding was that basically everyone's user-agent string nowadays is packed with a full suite of standard lies.

    • Arnavion 3 days ago

      It doesn't break the two kernel.org domains that the article is about, nor any of the others I use. At least not in a way that I noticed.

    • throwawayffffas 3 days ago

      In 2025 I think most of the web has moved on from checking user strings. Your bank might still do it but they won't be running Anubis.

      • Aachen 3 days ago

        Nope, they're on cloudflare so that all my banking traffic can be intercepted by a foreign company I have no relation to. The web is really headed in a great direction :)

      • account42 3 days ago

        The web as a whole definitely has not moved on from that.

  • msephton 3 days ago

    I'm interested in your extension. I'm wondering if I could do something similar to force text encoding of pages into Japanese.

    • Arnavion 2 days ago

      If your Firefox supports sideloading extensions then making extensions that modify request or response headers is easy.

      All the API is documented in https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web... . My Anubis extension modifies request headers using `browser.webRequest.onBeforeSendHeaders.addListener()` . Your case sounds like modifying response headers which is `browser.webRequest.onHeadersReceived.addListener()` . Either way the API is all documented there, as is the `manifest.json` that you'll need to write to register this JS code as a background script and whatever permissions you need.

      Then zip the manifest and the script together, rename the zip file to "<id_in_manifest>.xpi", place it in the sideloaded extensions directory (depends on distro, eg /usr/lib/firefox/browser/extensions), restart firefox and it should show up. If you need to debug it, you can use the about:debugging#/runtime/this-firefox page to launch a devtools window connected to the background script.
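      The whole thing is only a couple dozen lines. Roughly, as a sketch (placeholder domain and extension ID, manifest v2 -- not my actual extension):

        // manifest.json (placeholder id and domain)
        {
          "manifest_version": 2,
          "name": "ua-override",
          "version": "1.0",
          "browser_specific_settings": { "gecko": { "id": "ua-override@example.com" } },
          "permissions": ["webRequest", "webRequestBlocking", "https://git.example.org/*"],
          "background": { "scripts": ["background.js"] }
        }

        // background.js: rewrite the UA header, only for the Anubis sites
        browser.webRequest.onBeforeSendHeaders.addListener(
          (details) => {
            for (const header of details.requestHeaders) {
              if (header.name.toLowerCase() === "user-agent") {
                header.value = "not-mozilla"; // anything without "Mozilla"
              }
            }
            return { requestHeaders: details.requestHeaders };
          },
          { urls: ["https://git.example.org/*"] },
          ["blocking", "requestHeaders"]
        );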

      • msephton 2 days ago

        Cheers! I'm in Safari so I'll see if there's a match.

  • semiquaver 3 days ago

    Doesn’t that just mean the AI bots can do the same? So what’s the point?

  • danieltanfh95 3 days ago

    wtf? how is this then better than a captcha or something similar?!

  • throw84a747b4 3 days ago

    [flagged]

    • gruez 3 days ago

      >Not only is Anubis a poorly thought out solution from an AI sympathizer [...]

      But the project description describes it as a project to stop AI crawlers?

      > Weighs the soul of incoming HTTP requests to stop AI crawlers

      • throw84a747b4 3 days ago

        Why would a company that wants to stop AI crawlers give talks on LLMs and diffusion models at AI conferences?

        Why would they use AI art for the first Anubis mascot until GitHub users called out the hypocrisy on the issue tracker?

        Why would they use Stable Diffusion art in their blogposts until Mastodon and Bluesky users called them out on it?

      • account42 3 days ago

        AI companies are just as interested in stopping competing crawlers as anyone else.

    • [removed] 3 days ago
      [deleted]
ksymph 4 days ago

This is neither here nor there but the character isn't a cat. It's in the name, Anubis, who is an Egyptian deity typically depicted as a jackal or generic canine, and the gatekeeper of the afterlife who weighs the souls of the dead (hence the tagline). So more of a dog-girl, or jackal-girl if you want to be technical.

  • esperent 3 days ago

    Every representation I've ever seen of Anubis - including remarkably well preserved statues from antiquity - is either a male human body with a canine head, or fully canine.

    This anime girl is not Anubis. It's a modern cartoon character that simply borrows the name because it sounds cool, without caring anything about the history or meaning behind it.

    Anime culture does this all the time, drawing on inspiration from all cultures but nearly always only paying the barest lip service to the original meaning.

    I don't have an issue with that, personally. All cultures and religions should be fair game as inspiration for any kind of art. But I do have an issue with claiming that the newly inspired creation is equivalent in any way to the original source just because they share a name and some other very superficial characteristics.

    • account42 3 days ago

      It's also that the anime style already makes all heads shaped vaguely like felines. Add upwards pointing furry ears and it's not wrong to call it a cat girl.

    • ksymph 3 days ago

      > they share a name and some other very superficial characteristics.

      I wasn't implying anything more than that, although now I see the confusing wording in my original comment. All I meant to say was that between the name and appearance it's clear the mascot is canid rather than feline. Not that the anime girl with dog ears is an accurate representation of the Egyptian deity haha.

    • SnuffBox 2 days ago

      It's refreshing to see a reply as thought out as this in today's day and age of "move fast and post garbage".

    • qwery 3 days ago

      I think you're taking it a bit too seriously. In turn, I am, of course, also taking it too seriously.

      > I do have an issue with claiming that the newly inspired creation is equivalent in any way to the original source

      Nobody is claiming that the drawing is Anubis or even a depiction of Anubis, like the statues etc. you are interested in. It's a mascot. "Mascot design by CELPHASE" -- it says, in the screenshot.

      Generally speaking -- I can't say that this is what happened with this project -- you would commission someone to draw or otherwise create a mascot character for something after the primary ideation phase of the something. This Anubis-inspired mascot is, presumably, Anubis-inspired because the project is called Anubis, which is a name with fairly obvious connections to and an understanding of "the original source".

      > Anime culture does this all the time, ...

      I don't know what bone you're picking here. This seems like a weird thing to say. I mean, what anime culture? It's a drawing on a website. Yes, I can see the manga/anime influence -- it's a very popular, mainstream artform around the world.

      • esperent 3 days ago

        I like to talk seriously about art, representation, and culture. What's wrong with that? It's at least as interesting as discussing databases or web frameworks.

        In case you feel it needs linking to the purpose of this forum, the art in question here is being forcefully shown to people in a situation that makes them do a massive context switch. I want to look at the linux or ffmpeg source code but my browser failed a security check and now I'm staring at a random anime girl instead. What's the meaning here, what's the purpose behind this? I feel that there's none, except for the library author's preference, and therefore this context switch wasted my time and energy.

        Maybe I'm being unfair and the code author is so wrapped up in liking anime girls that they think it would be soothing to people who end up on that page. In which case, massive failure of understanding the target audience.

        Maybe they could allow changing the art or turning it off?

        >> Anime culture does this all the time

        > I don't know what bone you're picking here

        I'm not picking any bone there. I love anime, and I love the way it feels so free in borrowing from other cultures. That said, the anime I tend to like is more Miyazaki or Satoshi Kon and less kawaii girls.

  • ChrisRR 3 days ago

    I'm assuming the aversion is more about why young anime girls are popping up, not about what animal it is

    • armada651 3 days ago

      Why is there an aversion though? Is it about the image itself or because of the subculture people are associating with the image?

      • ChrisRR 3 days ago

        Both. I don't want any random pictures of young girls popping up while I'm browsing the web, and why would adults insert pictures of young girls into their project in the first place?

      • octo888 3 days ago

        It's an aversion to the sexualised depiction of girls barely the age of puberty or under the age of consent.

        I'd ask why you /don't/ have an aversion to that?

        (yes, "not all anime" etc...)

  • pak9rabid 3 days ago

    Well, thank you for that. That's a great weight off me mind.

  • JdeBP 3 days ago

    ... but entirely lacking the primary visual feature that Anubis had.

rootsudo 3 days ago

When I read it, I instantly knew it was Anubis. I hope the anime catgirls never disappear from that project :)

  • hdndiebf 3 days ago

    This anime thing is the one thing about computer culture that I just don't seem to get. I did not get it as a child, when suddenly half of children's cartoons became anime and I just disliked the aesthetic. I didn't get it in school, when people started reading manga. I'll probably never get it. Therefore I sincerely hope they do go away from Anubis, so I can further dwell in my ignorance.

    • timcambrant 3 days ago

      I feel the same. It's a distinct part of nerd culture.

      In the '70s, if you were into computers you were most likely also a fan of Star Trek. I remember an anecdote from the 1990s when an entire dial-up ISP was troubleshooting its modem pools because there were zero people connected and they assumed there was an outage. The outage happened to occur exactly while that week's episode of X-Files was airing in their time zone. Just as the credits rolled, all modems suddenly lit up as people connected to IRC and Usenet to chat about the episode. In ~1994 close to 100% of residential internet users also happened to follow X-Files on linear television. There was essentially a 1:1 overlap between computer nerds and sci-fi nerds.

      Today's analog seems to be that almost all nerds love anime and Andy Weir books and some of us feel a bit alienated by that.

      • SnuffBox 2 days ago

        > Today's analog seems to be that almost all nerds love anime and Andy Weir books and some of us feel a bit alienated by that.

        Especially because (from my observation) modern "nerds" who enjoy anime seem to relish bringing it (and various sex-related things) up at inappropriate times and are generally emotionally immature.

        It's quite refreshing seeing that other people have similar lines of thinking and that I'm not alone in feeling somewhat alienated.

      • cdrini 3 days ago

        I think I'd push back and say that nerd culture is no longer really a single thing. Back in the star trek days, the nerd "community" was small enough that star trek could be a defining quality shared by the majority. Now the nerd community has grown, and there are too many people to have defining parts of the culture that are loved by the majority.

        Eg if the nerd community had $x$ people in the star trek days, now there are more than $x$ nerds who like anime and more than $x$ nerds who dislike it. And the total size is much bigger than both.

    • armada651 3 days ago

      But what if they choose a different image that you don't get? What if they used an abstract modern art piece that no one gets? Oh the horror!

    • Aachen 3 days ago

      You don't have to get it to be able to accept that others like it. Why not let them have their fun?

      This sounds more as though you actively dislike anime than merely not seeing the appeal or being "ignorant". If you were to ignore it, there wouldn't be an issue...

      • account42 3 days ago

        They can have their fun on their personal websites. Subjecting others to your "fun" when you know it annoys them is not cool.

    • balamatom 3 days ago

      Might've caught on because the animes had plots, instead of considering viewers to have the attention spans of idiots like Western kids' shows (and, in the 21st century, software) tend to do.

      • timcambrant 3 days ago

        I don't think it's relevant to debate whether anime or other forms of media are objectively better. But as someone who has never understood anime, I view mainstream western TV series as filled with hours of cleverly written dialogue and long story arcs, whereas the little anime I've watched seems to mostly be overly dramatic colorful action scenes with intense screamed dialogue and strange bodily noises. Should we maybe assume that we are both a bit ignorant of the preferences of others?

        • balamatom 3 days ago

          Let's rather assume that you're the kind of person who debates a thing by first saying that it's not relevant to debate, then putting forward a pretty out-of-context comparison, and finally concluding that I should feel bad about myself. That kind of story arc does seem to correlate with finding mainstream Western TV worthwhile; there's something structurally similar to the funny way your thought went.

  • bawolff 3 days ago

    Its nice to see there is still some whimsy on the internet.

    Everything got so corporate and sterile.

    • account42 3 days ago

      Everyone copying the same Japanese cartoon style isn't any better than everyone copying corporate memphis.

      • [removed] 3 days ago
        [deleted]
      • lordhumphrey 3 days ago

        I think it definitely would be better. Perhaps only a small improvement, but still.

  • ghssds 3 days ago

    As Anubis the Egyptian god is represented as a dog-headed human, I thought the drawing was of a dog-girl.

    • nemomarx 3 days ago

      Perhaps a jackal girl? I guess "cat girl" gets used very broadly to mean kemomimi (pardon the spelling) though

  • Der_Einzige 3 days ago

    It's not the only project with an anime girl as its mascot.

    ComfyUI has what I think is a foxgirl as its official mascot, and that's the de-facto primary UI for generating Stable Diffusion or related content.

    • SnuffBox 2 days ago

      I've noticed the word "comfy" used more than usual recently and often by the anime-obsessed, is there cultural relevance I'm not understanding?

      • AlexeyBelov 16 hours ago

        OK, you've been all over this thread being negative and angry. On a new account, which makes it even more sus. Take a break from social media.

  • bakugo 3 days ago

    It's more likely that the project itself will disappear into irrelevance as soon as AI scrapers bother implementing the PoW (which is trivial for them, as the post explains) or figure out that they can simply remove "Mozilla" from their user-agent to bypass it entirely.

    • debugnik 3 days ago

      > as AI scrapers bother implementing the PoW

      That's what it's for, isn't it? Make crawling slower and more expensive. Shitty crawlers not being able to run the PoW efficiently or at all is just a plus. Although:

      > which is trivial for them, as the post explains

      Sadly the site's being hugged to death right now so I can't really tell if I'm missing part of your argument here.

      > figure out that they can simply remove "Mozilla" from their user-agent

      And flag themselves in the logs to get separately blocked or rate limited. Servers win if malicious bots identify themselves again, and forcing them to change the user agent does that.

      • throwawayffffas 3 days ago

        > That's what it's for, isn't it? Make crawling slower and more expensive.

        The default settings produce a computational cost of milliseconds for a week of access. For this to be relevant it would have to be significantly more expensive to the point it would interfere with human access.

      • shkkmo 3 days ago

        The explanation of how the estimate is made is more detailed, but here is the referenced conclusion:

        >> So (11508 websites * 2^16 sha256 operations) / 2^21, that’s about 6 minutes to mine enough tokens for every single Anubis deployment in the world. That means the cost of unrestricted crawler access to the internet for a week is approximately $0.

        >> In fact, I don’t think we reach a single cent per month in compute costs until several million sites have deployed Anubis.

    • skydhash 3 days ago

      It's more about the (intentional?) DDoS from AI scrapers than preventing them from accessing the content. Bandwidth is not cheap.

    • unclad5968 3 days ago

      I'm not on Firefox or any Firefox derivative and I still get anime cat girls making sure I'm not a bot.

      • nemomarx 3 days ago

        Mozilla is used in the user agent string of all major browsers for historical reasons, but not necessarily headless ones or so on.

    • [removed] 3 days ago
      [deleted]
    • dingnuts 3 days ago

      [flagged]

      • verteu 3 days ago

        > PoW increases the cost for the bots which is great. Trivial to implement, sure, but that added cost will add up quickly.

        No, the article estimates it would cost less than a single penny to scrape all pages of 1,000,000 distinct Anubis-guarded websites for an entire month.

      • userbinator 3 days ago

        I thought HN was anti-copyright and anti-imaginary-property, or at least the bulk of its users were. Yet all of a sudden, "but AI!!!!1"?

        > a federal crime

        The rest of the world doesn't care.

        • klabb3 3 days ago

          > I thought HN was anti-copyright

          Maybe. But what’s happening is ”copyright for thee not for me”, not a universal relaxation of copyright. This loophole exploitation by behemoths doesn’t advance any ideological goals, it only inflames the situation because now you have an adversarial topology. You can see this clearly in practice – more and more resources are going into defense and protection of data than ever before. Fingerprinting, captchas, paywalls, login walls, etc etc.

      • altairprime 3 days ago

        Don’t forget signed attestations from “user probably has skin in the game” cloud providers like iCloud (already live in Safari and accepted by Cloudflare, iirc?) — not because they identify you but because abusive behavior will trigger attestation provider rate limiting and termination of services (which, in Apple’s case, includes potentially a console kill for the associated hardware). It’s not very popular to discuss at HN but I bet Anubis could add support for it regardless :)

        https://datatracker.ietf.org/wg/privacypass/about/

        https://www.w3.org/TR/vc-overview/

      • shkkmo 3 days ago

        > PoW increases the cost for the bots which is great.

        But not by any meaningful amount, as explained in the article. All it actually does is rely on its obscurity while interfering with legitimate use.

      • nialv7 3 days ago

        > Fuck AI scrapers, and fuck all this copyright infringement at scale.

        Yes, fuck them. Problem is, Anubis here is not doing the job. As the article already explains, currently Anubis is not adding a single cent to the AI scrapers' costs. For Anubis to become effective against scrapers, it will necessarily have to become quite annoying for legitimate users.

  • guappa 3 days ago

    We all know it's doomed

    • balamatom 3 days ago

      That's called a self-fulfilling prophecy and is not in fact mandatory to participate in.

      • guappa 3 days ago

        I'm not making any git commits to remove it…

        • balamatom 3 days ago

          Probably talking about different doomed things then, sorry.

bawolff 3 days ago

> This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity. It feels like this solution has the problem backwards, effectively only limiting access to those without resources or trying to conserve them.

Counterpoint - it seems to work. People use Anubis because it's the best of bad options.

If theory and reality disagree, it means either you are missing something or your theory is wrong.

  • semiquaver 3 days ago

    Counter-counter point: it only stopped them for a few weeks and now it doesn’t work: https://news.ycombinator.com/item?id=44914773

    • jeroenhd 3 days ago

      Geoblocking China and Singapore solves that problem, it seems, at least the non-residential IPs (though I also see a lot of aggressive bots coming from residential IP space from China).

      I wish the old trick of sending CCP-unfriendly content to get the great firewall to kill the connection for you still worked, but in the days of TLS everywhere that doesn't seem to work anymore.

    • Aachen 3 days ago

      Only Huawei so far, no? That could be easy to block on a network level for the time being

      Of course we knew from the beginning that this first stage of "bots don't even try to solve it, no matter the difficulty" isn't a forever solution

      • jeroenhd 3 days ago

        AliCloud also seems to send a more capable scraper army, but so far they're not using botnets ("residential proxies") to hide their bad practices.

sidewndr46 3 days ago

> The CAPTCHA forces visitors to solve a problem designed to be very difficult for computers but trivial for humans

I'm unsure if this is deadpan humor or if the author has never tried to solve a CAPTCHA that is something like "select the squares with an orthodox rabbi present"

  • classichasclass 3 days ago

    The problem with that CAPTCHA is you're not allowed to solve it on Saturdays.

  • windward 3 days ago

    I wonder if it's an intentional quirk that you can only pass some CAPTCHAs if you're a human who knows what an American fire hydrant or school bus looks like?

    • lproven 2 days ago

      > an American fire hydrant or school bus

      So much this. The first time one asked me to click on "crosswalks", I genuinely had to think for a while as I struggled to remember WTF a "crosswalk" was in AmEng. I am a native English speaker, writer, editor and professionally qualified teacher, but my form of English does not have the word "crosswalk" or any word that is a synonym for it. (It has phrases instead.)

      Our schoolbuses are ordinary buses with a special number on the front. They are no specific colour.

      There are other examples which aren't coming immediately to mind, but it is vexing when the designer of a CAPTCHA isn't testing if I am human but if I am American.

    • latexr 3 days ago

      I doubt it’s intentional. Google (owner of reCAPTCHA) is a US company, so it’s more likely they either haven’t considered what they see every day is far from universal; don’t care about other countries; or specifically just care about training for the US.

    • jeroenhd 3 days ago

      Google demanding I flag yellow cars when asked to flag taxis is the silliest Americanism I've seen. At least the school bus has SCHOOL BUS written all over it and fire hydrants aren't exactly an American exclusive thing.

      On some Russian and Asian site I ran into trouble signing up for a forum using translation software because the CAPTCHA requires me to enter characters I couldn't read or reproduce. It doesn't happen as often as the Google thing, but the problem certainly isn't restricted to American sites!

  • wingworks 3 days ago

    There are also services out there that will solve any CAPTCHA for you at a very small cost to you. And an AI company will get steep discounts with the volumes of traffic they do.

    There are some browser extensions for it too, like NopeCHA; it works 99% of the time and saves me the hassle of doing them.

    Any site using CAPTCHAs today is really only hurting their real customers and low-hanging fruit.

    Of course this assumes they can't solve the CAPTCHA themselves with AI, which often they can.

    • petesergeant 3 days ago

      Yes, but not at a rate that enables them to be a risk to your hosting bill. My understanding is that the goal here isn't to prevent crawlers, it's to prevent overly aggressive ones.

  • bawolff 3 days ago

    Well the problem is that computers got good at basically everything.

    Early 2000s captchas really were like that.

    • ok123456 3 days ago

      The original reCAPTCHA was doing distributed book OCR. It was sold as an altruistic project to help transcribe old books.

      • guappa 3 days ago

        And now they're using us to train car driving AI :(

pkal 3 days ago

Superficial comment regarding the catgirl: I don't get why some people are so adamant and enthusiastic for others to see it, but if you, like me, find it distasteful and annoying, consider copying these uBlock rules: https://sdf.org/~pkal/src+etc/anubis-ublock.txt. Brings me joy to know what I am not seeing whenever I get stopped by this page :)

  • squigz 3 days ago

    I don't get why so many people find it "distasteful and annoying"

    • pkal 2 days ago

      Can you clarify if you mean that you do not understand the reasons that people dislike these images, or do you find the very idea of disliking it hard to relate to?

      I cannot claim that I understand it well, but my best guess is that these are images that represent a kind of culture that I have encountered both in real-life and online that I never felt comfortable around. It doesn't seem unreasonable that this uneasiness around people with identity-constituting interests in anime, Furries, MLP, medieval LARP, etc. transfers back onto their imagery. And to be clear, it is not like I inherently hate anime as a medium or the idea of anthropomorphism in art. There is some kind of social ineptitude around propagating these _kinds_ of interests that bugs me.

      I cannot claim that I am satisfied with this explanation. I know that the dislike I feel for this is very similar to what I feel when visiting a hacker space where I don't know anyone. But I hope that I could at least give a feeling for why some people don't like seeing catgirls every time I open a repository, and that it doesn't necessarily have anything to do with advocating for a "corporate soulless web".

    • account42 3 days ago

      You could respect it without "getting" it though.

    • IshKebab 3 days ago

      I can't really explain it but it definitely feels extremely cringeworthy. Maybe it's the neckbeard sexuality or the weird furry aspect. I don't like it.

sugarpimpdorsey 3 days ago

Every time I see one of these I think it's a malicious redirect to some pervert-dwelling imageboard.

On that note, is kernel.org really using this for free and not the paid version without the anime? Is the Linux Foundation really that desperate for cash after they gas up all the BMWs?

  • qualeed 3 days ago

    It's crazy (especially considering anime is more popular now than ever; Netflix alone is making billions a year on anime) that people see a completely innocent little anime picture and immediately think "pervert-dwelling imageboard".

    • magicalhippo 3 days ago

      > people see a completely innocent little anime picture and immediately think "pervert-dwelling imageboard"

      Think you can thank the furries for that.

      Every furry I've happened to come across was very pervy in some way, so that's what immediately comes to mind when I see furry-like pictures like the one shown in the article.

      YMMV

      • voidUpdate 3 days ago

        Out of interest, how many furries have you met? I've been to several fur meets, and have met approximately three furries who I would not want to know anymore for one reason or another

    • Seattle3503 3 days ago

      To be fair, that's the sort of place where I spend most of my free time.

    • gruez 3 days ago

      "Anime pfp" stereotype is alive and well.

    • ants_everywhere 3 days ago

      they've seized the moment to move the anime cat girls off the Arch Linux desktop wallpapers and onto lore.kernel.org.

    • account42 3 days ago

      It's not crazy at all that anyone who has been online for more than a day has that association.

    • turtletontine 3 days ago

      Even if the images aren’t the kind of sexualized (or downright pornographic) content this implies… having cutesy anime girls pop up when a user loads your site is, at best, wildly unprofessional. (Dare I say “cringe”?) For something as serious and legit as kernel.org to have this, I do think it’s frankly shocking and unacceptable.

    • mvdtnz 3 days ago

      [flagged]

      • qualeed 3 days ago

        >If you don't get pedophile vibes from that picture it's on you.

        Wow, what an absolutely wild statement. I hate to break it to you, but I'm not the one sexualizing the cartoon picture.

  • Dilettante_ 3 days ago

    For me it's the flipside: It makes me think "Ahh, my people!"

  • creatonez 3 days ago

    Huh, why would they need the unbranded version? The branded version works just fine. It's usually easier to deploy ordinary open source software than software that needs to be licensed, because you don't need special download pages or license keys.

    If it makes sense for an organization to donate to a project they rely on, then they should just donate. No need to debrand if it's not strictly required, all that would do is give the upstream project less exposure. For design reasons maybe? But LKML isn't "designed" at all, it has always exposed the raw ugly interface of mailing list software.

    Also, this brand does have trust. Sure, I'm annoyed by these PoW captcha pages, but I'm a lot more likely to enable Javascript if it's the Anubis character, than if it is debranded. If it is debranded, it could be any of the privacy-invasive captcha vendors, but if it's Anubis, I know exactly what code is going to run.

    • rustystump 3 days ago

      If I saw an anime pic show up, that'd be a flag. I only know of Anubis' existence and use of anime from HN.

      It is only trusted by a small subset of people who are in the know. It is not about "anime bad" but that a large chunk of the population isn't into it for whatever reason.

      I love anime but it can also be cringe. I find this cringe as it seems many others do too.

  • Lammy 3 days ago

    [flagged]

    • sugarpimpdorsey 3 days ago

      > Anubis is a clone of Kiwiflare, not an original work, so you're actually sort of half-right:

      Interesting. That itself appears to be a clone of haproxy-protection. I know there has also been an nginx module that does the same for some time. Either way, proof-of-work is by this point not novel.

      Everyone seems to have overlooked the more substantive point of my comment which is that it appears kernel.org cheaped out and is using the free version of Anubis, instead of paying up to support the developer for his work. You know they have the money to do it.

      In 2024 the Linux Foundation reported $299.7M in expenses, with $22.7M of that going toward project infrastructure and $15.2M on "event services" (I guess making sure the cotton candy machines and sno-cone makers were working at conferences).

      My point is, cough up a few bucks for a license you chiselers.

      • prmoustache 3 days ago

        > Everyone seems to have overlooked the more substantive point of my comment which is that it appears kernel.org cheaped out and is using the free version of Anubis, instead of paying up to support the developer for his work. You know they have the money to do it.

        > In 2024 the Linux Foundation reported $299.7M in expenses, with $22.7M of that going toward project infrastructure and $15.2M on "event services" (I guess making sure the cotton candy machines and sno-cone makers were working at conferences).

        > My point is, cough up a few bucks for a license you chiselers.

        Several points:

        - there is no license to pay. This is free (as in open source and as in beer) software. There is commercial support if you feel you need it and sponsoring options however. Sponsoring is not paying a license.

        - Sometimes it takes so long to get approval for a sponsor that large org member give up.

        - Obviously kernel.org is using an old release of Anubis, so they likely observed a huge spike in bandwidth used at some point and deployed Anubis, solving the problem immediately. I don't remember Anubis proposing a paid license at the time of the early releases. I may be wrong, but it may be that kernel.org admins have never heard of the possibility of sponsoring nor are they interested in support.

        - you don't have to pay anything to change/remove the image, and the people who implemented this clearly do not care, as they didn't do it.

        - do we have evidence that the anubis developer ever donated directly or indirectly to Linus Torvalds and the thousands of developers who worked on the kernel?

    • creatonez 3 days ago

      Anubis has nothing to do with Kiwiflare, there's no connection at all. It's not the same codebase, and the inspiration for Anubis comes from Hashcash (1997) and numerous other examples of web PoW that predate Kiwiflare, which perhaps tens of thousands of websites were already using as an established technique. What makes you think it is a clone of it?

    • efilife 3 days ago

      Can somebody please explain why was this comment flagged to death? I seem to be missing something

      • ufo 3 days ago

        Possibly because it links to kiwifarms (nasty website to say the least)

      • creatonez 3 days ago

        Well, it's both complete misinformation and attempts to tie a reputable open source project to an unrelated harassment and stalking website.

    • fortran77 3 days ago

      I saw the description and thought "Wow! That works just like the DDoS retarding of KiwiFlare." I didn't know it was a proper fork of it.

bogwog 3 days ago

I wonder if the best solution is still just to create link mazes with garbage text like this: https://blog.cloudflare.com/ai-labyrinth/

It won't stop the crawlers immediately, but it might lead to an overhyped and underwhelming LLM release from a big name company, and force them to reassess their crawling strategy going forward?

  • ronsor 3 days ago

    That won't work, because garbage data is filtered after the full dataset is collected anyway. Every LLM trainer these days knows that curation is key.

    • bogwog 2 days ago

      If the "garbage data" is AI generated, it'll be hard or impossible to filter.

  • creatonez 3 days ago

    Crawlers already know how to stop crawling recursive or otherwise excessive/suspicious content. They've dealt with this problem long before LLM-related crawling.

ok123456 3 days ago

Why is kernel.org doing this for essentially static content? Cache-Control headers and ETags should solve this. Also, the Linux kernel has solved the C10K problem.
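For illustration, the kind of exchange I mean (example values): the server tags each response with a validator, and a returning client or cache revalidates for the cost of a header instead of a full body:

  HTTP/1.1 200 OK
  Cache-Control: public, max-age=3600, stale-while-revalidate=86400
  ETag: "a1b2c3"

  GET /lore/thread.html HTTP/1.1        (later, same client or a cache)
  If-None-Match: "a1b2c3"

  HTTP/1.1 304 Not Modified             (no body re-sent if unchanged)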

  • mixologic 3 days ago

    Because it's static content that is almost never cached, because it's infrequently accessed. Thus, almost every hit goes to the origin.

    • ok123456 2 days ago

      The contents in question are statically generated, 1-3 KB HTML files. Hosting a single image would be the equivalent of cold serving 100s of requests.

      Putting up a scraper shield seems like it's more of a political statement than a solution to a real technical problem. It's also antithetical to open collaboration and an open internet of which Linux is a product.

  • whatevaa 3 days ago

    Bots don't respect that.

    • 1gn15 3 days ago

      Use a CDN.

      • trenchpilgrim 3 days ago

        A great option for most people, and indeed Anubis' README recommends using Cloudflare if possible. However, not everyone can use a paid CDN. Some people can't pay because their payment methods aren't accepted. Some people need to serve content to countries that a major CDN can't for legal and compliance reasons. Some organizations need their own independent infrastructure to serve their organizational mission.

      • Aachen 2 days ago

        So that someone else pays for your bandwidth while seeing who is interested in this content? Idk about that solution

ChocolateGod 3 days ago

I have an S24 (a 2024 flagship) and Anubis often takes 10-20 seconds to complete; that time is going to add up if more and more sites adopt it, leading to a worse browsing experience and wasted battery life.

Meanwhile AI farms will just run their own nuclear reactors eventually and be unaffected.

I really don't understand why someone thought this was a good idea, even if well intentioned.

  • prmoustache 3 days ago

    Something must be wrong on your flagship smartphone because I have an entry level one that doesn't take that long.

    It seems there are a large number of operations crawling the web to build models that aren't directly using infrastructure hosted on AI farms BUT botnets running on commodity hardware and residential networks to keep their IP ranges from being blacklisted. Anubis's point is to block those.

  • Aachen 2 days ago

    Which browser and which difficulty setting is that?

    Because I've got the same model line but about 3 or 4 years older, and it usually just flashes by in the browser Lightning from F-Droid, which is an OS webview wrapper. On occasion it takes a second or maybe two; I assume that's either bad luck in finding a solution or a site with a higher difficulty setting. Not sure if I've seen it in Fennec (Firefox mobile) yet but, if so, it's the same there.

    I've been surprised that this low threshold stops bots but I'm reading in this thread that it's rather that bot operators mostly just haven't bothered implementing the necessary features yet. It's going to get worse... We've not even won the battle let alone the war. Idk if this is going to be sustainable, we'll see where the web ends up...

  • jeroenhd 3 days ago

    Either your phone is on some extreme power saving mode, your ad blocker is breaking Javascript, or something is wrong with your phone.

    I've certainly seen Anubis take a few seconds (three or four maybe) but that was on a very old phone that barely loaded any website more complex than HN.

  • vova_hn 3 days ago

    I have a Pixel 7 (released in 2022) and it usually takes less than a second...

  • TZubiri 3 days ago

    I remember that Litecoin briefly had this idea: to be easy on consumer hardware but hard on GPUs. The ASICs didn't take long to obliterate the idea though.

    Maybe there's going to be some form of pay per browse system? even if it's some negligible cost on the order of 1$ per month (and packaged with other costs), I think economies of scale would allow servers to perform a lifetime of S24 captchas in a couple of seconds.

  • whatevaa 3 days ago

    Something is wrong with your flagship if it takes that long.

    • ChocolateGod 3 days ago

      Samsung's UI has this feature where it turns on power saving mode when it detects light use.

    • prmoustache 3 days ago

      I guess his flagship IS compromised and part of an AI crawling botnet ;-)

    • Lammy 2 days ago

      You're looking at it wrong.

WesolyKubeczek 4 days ago

I disagree with the post author in their premise that things like Anubis are easy to bypass if you craft your bot well enough and throw the compute at it.

Thing is, the actual lived experience of webmasters tells us that the bots that scrape the internets for LLMs are nothing like crafted software. They are more like your neighborhood shit-for-brains meth junkies competing with one another over who makes more robberies in a day, no matter the profit.

Those bots are extremely stupid. They are worse than script kiddies’ exploit searching software. They keep banging the pages without regard to how often, if ever, they change. If they were 1/10th like many scraping companies’ software, they wouldn’t be a problem in the first place.

Since these bots are so dumb, anything that is going to slow them down or stop them in their tracks is a good thing. Short of drone strikes on data centers or accidents involving owners of those companies that provide networks of botware and residential proxies for LLM companies, it seems fairly effective, doesn’t it?

  • int_19h 3 days ago

    It is the way it is because there are easy pickings to be made even with this low effort, but the more sites adopt such measures, the less stupid your average bot will be.

  • busterarm 3 days ago

    Those are just the ones that you've managed to ID as bots.

    Ask me how I know.