Why are anime catgirls blocking my access to the Linux kernel?
(lock.cmpxchg8b.com)
815 points by taviso 4 days ago
Yes, I think you're right. The commentary about its (presumed, imagined) effectiveness is very much making the assumption that it's designed to be an impenetrable wall[0] -- i.e. prevent bots from accessing the content entirely.
I think TFA is generally quite good and has something of a good point about the economics of the situation, but finding that the math shakes out that way should, perhaps, lead one to question one's starting point / assumptions[1].
In other words, who said the websites in question wanted to entirely prevent crawlers from accessing them? The answer is: no one. Web crawlers are and have been fundamental to accessing the web for decades. So why are we talking about trying to do that?
[0] Mentioning 'impenetrable wall' is probably setting off alarm bells, because of course that would be a bad design.
[1] (Edited to add:) I should say 'to question their assumptions more' -- like I said, the article is quite good and it does present this as confusing, at least.
> In other words, who said the websites in question wanted to entirely prevent crawlers from accessing them? The answer is: no one. Web crawlers are and have been fundamental to accessing the web for decades. So why are we talking about trying to do that?
I agree, but the advertising is the whole issue. "Checking to see you're not a bot!" and all that.
Therefore some people using Anubis expect it to be an impenetrable wall, to "block AI scrapers", especially those that believe it's a way for them to be excluded from training data.
It's why just a few days ago there was an HN frontpage post of someone complaining that "AI scrapers have learnt to get past Anubis".
But that is a fight that one will never win (analog hole as the nuclear option).
If it said something like "Wait 5 seconds, our servers are busy!", I think people's expectations would be more accurate.
As a robot I'm really not that sympathetic to anti-bot language backfiring on humans. I have to look away every time it comes up on my screen. If they changed their language and advertising, I'd be more sympathetic -- it's not as if I disagree that overloading servers for not much benefit is bad!
Yeah, I think it's obviously a pretty natural conclusion to draw, that {thing for hinder crawler} ≅≅ {thing for stop all crawler}. Perhaps I should have stated that explicitly in the original comment.
As for the presentation/advertising, I didn't get into it because I don't hold a particularly strong opinion. Well, I do hold a particularly strong opinion, but not one that really distinguishes Anubis from any of the other things. I'm fully onboard with what you're saying -- I find this sort of software extremely hostile and the fact that so many people don't[0] reminds me that I'm not a people.
In my experience, this particular jump scare is about the same as any of the other services. The website is telling me that I'm not welcome for whatever arbitrary reason it is now, and everyone involved wants me to feel bad.
Actually there is one thing I like about the Anubis experience[1] compared to the other ones, it doesn't "Would you like to play a game?" me. As a robot I appreciate the bluntness, I guess.
(the games being: "click on this. now watch spinny. more. more. aw, you lose! try again?", and "wheel, traffic light, wildcard/indistinguishable"[2]).
[0] "just ignore it, that's what I do" they say. "Oh, I don't have a problem like that. Sucks to be you."
[1] yes, I'm talking upsides about the experience of getting **ed by it. I would ask how we got here but it's actually pretty easy to follow.
[2] GCHQ et al. should provide a meatspace operator verification service where they just dump CCTV clips and you have to "click on the squares that contain: UNATTENDED BAG". Call it "phonebooth, handbag, foreign agent".
(Apologies for all the weird tangents -- I'm just entertaining myself, I think I might be tired.)
But then you rate limit that challenge.
You could set up a system for parallelizing the creation of these Anubis PoW cookies independent of the crawling logic. That would probably work, but it's a pretty heavy lift compared to 'just run a browser with JavaScript'.
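For a sense of scale, here's what that minting step could look like -- a minimal Node sketch assuming a hashcash-style rule (sha256(challenge + nonce) starting with N zero hex digits; the real Anubis scheme may differ in its details):

    // Sketch: brute-force a hashcash-style PoW challenge.
    // Assumes sha256(challenge + nonce) must start with `difficulty`
    // zero hex digits -- check the deployed Anubis version for the
    // exact rule. Difficulty 4 is ~2^16 hashes on average.
    const { createHash } = require("node:crypto");

    function solve(challenge, difficulty) {
      const prefix = "0".repeat(difficulty);
      for (let nonce = 0; ; nonce++) {
        const hash = createHash("sha256")
          .update(challenge + nonce)
          .digest("hex");
        if (hash.startsWith(prefix)) return { nonce, hash };
      }
    }

    // A worker pool could mint tokens like this in parallel,
    // decoupled from the crawler that later spends them.
    console.log(solve("example-challenge-string", 4));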
I'm a scraper developer and Anubis would have worked 10-20 years ago, but now all broad scrapers run on a real headless browser with full cookie support, which costs next to nothing in compute. I'd be surprised if LLM bots used anything else, given that they have all of this compute and these engineers already available.
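(To give a sense of how low that bar is, a minimal Puppeteer sketch; the URL is hypothetical. The browser runs the challenge JS on its own and keeps the resulting cookie for the rest of the session:)

    // Headless fetch where any JS challenge just runs by itself.
    const puppeteer = require("puppeteer");

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto("https://example.org/guarded-page");
      // If an interstitial challenge was served, wait for the
      // redirect it triggers once the PoW completes.
      await page.waitForNavigation({ timeout: 30000 }).catch(() => {});
      console.log(await page.content()); // the real page, post-challenge
      await browser.close();
    })();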
That being said, one point is very correct here - by far the best effort to resist broad crawlers is a _custom_ anti-bot that could be as simple as "click your mouse 3 times", because handling something custom is very difficult at broad scale. It took the author just a few minutes to solve this, but for someone like Perplexity it would take hours of engineering and maintenance to implement a solution for each custom implementation, which is likely just not worth it.
You can actually see this in real life if you google web scraping services and which targets they claim to bypass - all of them bypass generic anti-bots like Cloudflare, Akamai etc. but struggle with custom and rare stuff like Chinese websites or small forums, because the scraping market is a market like any other and high-value problems are solved first. So becoming a low-value problem is a very easy way to avoid confrontation.
> That being said, one point is very correct here - by far the best effort to resist broad crawlers is a _custom_ anti-bot that could be as simple as "click your mouse 3 times", because handling something custom is very difficult at broad scale.
Isn't this what Microsoft is trying to do with their sliding-puzzle-piece and choose-the-closest-match type systems?
Also, if you come in on a mobile browser it could ask you to lay your phone flat and then shake it up and down for a second or something similar that would be a challenge for a datacenter bot pretending to be a phone.
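(Sketching that idea: the DeviceMotion API exposes roughly what such a check would need; the thresholds and counts below are invented:)

    // Toy "shake your phone" check. A datacenter bot gets no (or
    // perfectly flat) accelerometer data. Note iOS additionally
    // requires DeviceMotionEvent.requestPermission() before
    // events fire.
    let spikes = 0;
    window.addEventListener("devicemotion", (event) => {
      const a = event.accelerationIncludingGravity;
      if (a && Math.abs(a.y) > 25) spikes += 1; // crude shake detection
      if (spikes >= 5) console.log("plausibly a real phone being shaken");
    });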
How do you bypass Cloudflare? I do some light scraping for some personal stuff, but I can't figure out how to bypass it. Like, do you randomize IPs using several VPNs at the same time?
I usually just sit there on my phone pressing the "I am not a robot box" when it triggers.
It's still pretty hard to bypass it with open source solutions. To bypass CF you need:
- an automated browser that doesn't leak the fact it's being automated
- ability to fake the browser fingerprint (e.g. Linux is heavily penalized)
- residential or mobile proxies (for small scale your home IP is probably good enough)
- deployment environment that isn't leaked to the browser.
- realistic scrape pattern and header configuration (header order, referer, prewalk some pages with cookies etc.)
This is really hard to do at scale, but for small personal scripts you can get reasonable results with flavor-of-the-month browser-automation projects on GitHub like nodriver, or dedicated tools like FlareSolverr. But I'd just find a web scraping API with a low entry price, drop $15/month, and avoid this chase, because it can be really time consuming.
If you're really on a budget - most of them offer 1,000 credits for free, which will get you an average of 100 pages a month per service, and you can get 10 of them since they all mostly function the same.
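(For the first two bullets above, the usual JS-ecosystem starting point is puppeteer-extra with its stealth plugin, which patches the most common automation tells; a sketch, with proxies and realistic request patterns still left to you:)

    // puppeteer-extra + stealth plugin: hides navigator.webdriver
    // and other headless giveaways. Everything else (proxies,
    // fingerprints, pacing) still needs separate handling.
    const puppeteer = require("puppeteer-extra");
    const StealthPlugin = require("puppeteer-extra-plugin-stealth");
    puppeteer.use(StealthPlugin());

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto("https://example.org/");
      console.log(await page.title());
      await browser.close();
    })();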
This only works if you're a low-value site (which admittedly most sites are).
> It took the author just a few minutes to solve this, but for someone like Perplexity it would take hours of engineering and maintenance to implement a solution for each custom implementation, which is likely just not worth it.
These are trivial for an AI agent to solve though, even with very dumb watered down models.
At that point you’re probably spending more money blocking the scrapers than you would spend just letting them through.
>This dance to get access is just a minor annoyance for me, but I question how it proves I’m not a bot. These steps can be trivially and cheaply automated.
>I think the end result is just an internet resource I need is a little harder to access, and we have to waste a small amount of energy.
No need to mimic the actual challenge process. Just change your user agent to not have "Mozilla" in it; Anubis only serves you the challenge if your user agent has that. For myself I just made a sideloaded browser extension to override the UA header for the handful of websites I visit that use Anubis, including those two kernel.org domains.
(Why do I do it? For most of them I don't enable JS or cookies, so the challenge wouldn't pass anyway. For the ones that I do enable JS or cookies for, various self-hosted GitLab instances, I don't consent to my electricity being used for this any more than if it was mining Monero or something.)
Sadly, touching the user-agent header more or less instantly makes you uniquely identifiable.
Browser fingerprinting works best against people with unique headers. There's probably millions of people using an untouched safari on iPhone. Once you touch your user-agent header, you're likely the only person in the world with that fingerprint.
If someone's out to uniquely identify your activity on the internet, your User-Agent string is going to be the least of your problems.
I'll set mine to "null" if the rest of you will set yours...
If your headers are new every time then it is very difficult to figure out who is who.
yes, but it puts you in the incredibly small bucket of "users that have weird headers that don't mesh well", and makes the rest of the (many) other fingerprinting techniques all the more accurate.
> If your headers are new every time then it is very difficult to figure out who is who.
It's very easy to train a model to identify anomalies like that.
While it's definitely possible to train a model for that, 'very easy' is nonsense.
Unless you've got some superintelligence hidden somewhere, you'd choose a neural net. To train, you need a large supply of LABELED data. Seems like a challenge to build that dataset; after all, we have no scalable method for classifying as of yet.
Yes, but if you can detect that they aren't using a major fingerprinting provider, you can take the bet - and win more often than not - that your adversary is most likely not tracking visitor probabilities.
I wouldn’t think the intention is to s/Mozilla// but to select another well-known UA string.
The string I use in my extension is "anubis is crap". I took it from a different FF extension that had been posted in a /g/ thread about Anubis, which is where I got the idea from in the first place. I don't use other people's extensions if I can help it (because of the obvious risk), but I figured I'd use the same string in my own extension so as to be combined with users of that extension for the sake of user-agent statistics.
The UA will be compared to other data points such as screen resolution, fonts, plugins, etc. which means that you are definitely more identifiable if you change just the UA vs changing your entire browser or operating system.
I don't think there are any.
Because servers would serve different content based on user agent, virtually all browsers' user-agent strings start with Mozilla/5.0...
> (Why do I do it? For most of them I don't enable JS so the challenge wouldn't pass anyway. For the ones that I do enable JS for, various self-hosted gitlab instances, I don't consent to my electricity being used for this any more than if it was mining Monero or something.)
Hm. If your site is "sticky", can it mine Monero or something in the background?
We need a browser warning: "This site is using your computer heavily in a background task. Do you want to stop that?"
> We need a browser warning: "This site is using your computer heavily in a background task. Do you want to stop that?"
Doesn't Safari sort of already do that? "This tab is using significant power", or summat? I know I've seen that message, I just don't have a good repro.
> Just change your user agent to not have "Mozilla" in it; Anubis only serves you the challenge if your user agent has that.
Won't that break many other things? My understanding was that basically everyone's user-agent string nowadays is packed with a full suite of standard lies.
In 2025 I think most of the web has moved on from checking user-agent strings. Your bank might still do it, but they won't be running Anubis.
If your Firefox supports sideloading extensions then making extensions that modify request or response headers is easy.
All the API is documented in https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web... . My Anubis extension modifies request headers using `browser.webRequest.onBeforeSendHeaders.addListener()` . Your case sounds like modifying response headers which is `browser.webRequest.onHeadersReceived.addListener()` . Either way the API is all documented there, as is the `manifest.json` that you'll need to write to register this JS code as a background script and whatever permissions you need.
Then zip the manifest and the script together, rename the zip file to "<id_in_manifest>.xpi", place it in the sideloaded extensions directory (depends on distro, eg /usr/lib/firefox/browser/extensions), restart firefox and it should show up. If you need to debug it, you can use the about:debugging#/runtime/this-firefox page to launch a devtools window connected to the background script.
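(For reference, the whole thing can be as small as this; the extension id and matched URLs are placeholders:)

    // manifest.json (abridged):
    // {
    //   "manifest_version": 2,
    //   "name": "ua-override", "version": "1.0",
    //   "browser_specific_settings": { "gecko": { "id": "ua-override@example.org" } },
    //   "permissions": ["webRequest", "webRequestBlocking",
    //                   "https://git.kernel.org/*", "https://lore.kernel.org/*"],
    //   "background": { "scripts": ["background.js"] }
    // }

    // background.js: replace the User-Agent on matched sites with
    // a string that doesn't contain "Mozilla".
    browser.webRequest.onBeforeSendHeaders.addListener(
      (details) => {
        for (const header of details.requestHeaders) {
          if (header.name.toLowerCase() === "user-agent") {
            header.value = "anubis is crap";
          }
        }
        return { requestHeaders: details.requestHeaders };
      },
      { urls: ["https://git.kernel.org/*", "https://lore.kernel.org/*"] },
      ["blocking", "requestHeaders"]
    );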
Doesn’t that just mean the AI bots can do the same? So what’s the point?
wtf? how is this then better than a captcha or something similar?!
>Not only is Anubis a poorly thought out solution from an AI sympathizer [...]
But the project description describes it as a project to stop AI crawlers?
> Weighs the soul of incoming HTTP requests to stop AI crawlers
Why would a company that wants to stop AI crawlers give talks on LLMs and diffusion models at AI conferences?
Why would they use AI art for the first Anubis mascot until GitHub users called out the hypocrisy on the issue tracker?
Why would they use Stable Diffusion art in their blogposts until Mastodon and Bluesky users called them out on it?
This is neither here nor there but the character isn't a cat. It's in the name, Anubis, who is an Egyptian deity typically depicted as a jackal or generic canine, and the gatekeeper of the afterlife who weighs the souls of the dead (hence the tagline). So more of a dog-girl, or jackal-girl if you want to be technical.
Every representation I've ever seen of Anubis - including remarkably well preserved statues from antiquity - is either a male human body with a canine head, or fully canine.
This anime girl is not Anubis. It's a modern cartoon character that simply borrows the name because it sounds cool, without caring anything about the history or meaning behind it.
Anime culture does this all the time, drawing on inspiration from all cultures but nearly always only paying the barest lip service to the original meaning.
I don't have an issue with that, personally. All cultures and religions should be fair game as inspiration for any kind of art. But I do have an issue with claiming that the newly inspired creation is equivalent in any way to the original source just because they share a name and some other very superficial characteristics.
> they share a name and some other very superficial characteristics.
I wasn't implying anything more than that, although now I see the confusing wording in my original comment. All I meant to say was that between the name and appearance it's clear the mascot is canid rather than feline. Not that the anime girl with dog ears is an accurate representation of the Egyptian deity haha.
I think you're taking it a bit too seriously. In turn, I am, of course, also taking it too seriously.
> I do have an issue with claiming that the newly inspired creation is equivalent in any way to the original source
Nobody is claiming that the drawing is Anubis or even a depiction of Anubis, like the statues etc. you are interested in. It's a mascot. "Mascot design by CELPHASE" -- it says, in the screenshot.
Generally speaking -- I can't say that this is what happened with this project -- you would commission someone to draw or otherwise create a mascot character for something after the primary ideation phase of the something. This Anubis-inspired mascot is, presumably, Anubis-inspired because the project is called Anubis, which is a name with fairly obvious connections to and an understanding of "the original source".
> Anime culture does this all the time, ...
I don't know what bone you're picking here. This seems like a weird thing to say. I mean, what anime culture? It's a drawing on a website. Yes, I can see the manga/anime influence -- it's a very popular, mainstream artform around the world.
I like to talk seriously about art, representation, and culture. What's wrong with that? It's at least as interesting as discussing databases or web frameworks.
In case you feel it needs linking to the purpose of this forum, the art in question here is being forcibly shown to people in a situation that makes them do a massive context switch. I want to look at the Linux or FFmpeg source code, but my browser failed a security check and now I'm staring at a random anime girl instead. What's the meaning here, what's the purpose behind this? I feel that there's none, except for the library author's preference, and therefore this context switch wasted my time and energy.
Maybe I'm being unfair and the code author is so wrapped up in liking anime girls that they think it would be soothing to people who end up on that page. In which case, massive failure of understanding the target audience.
Maybe they could allow changing the art or turning it off?
> > Anime culture does this all the time
> I don't know what bone you're picking here
I'm not picking any bone there. I love anime, and I love the way it feels so free in borrowing from other cultures. That said, the anime I tend to like is more Miyazaki or Satoshi Kon and less kawaii girls.
I'm assuming the aversion is more about why young anime girls are popping up, not about what animal it is
Why is there an aversion though? Is it about the image itself or because of the subculture people are associating with the image?
When I saw it, I instantly knew it was Anubis. I hope the anime catgirls never disappear from that project :)
This anime thing is the one thing about computer culture that I just don't seem to get. I did not get it as a child, when suddenly half of children's cartoons became anime and I just disliked the aesthetic. I didn't get it in school, when people started reading manga. I'll probably never get it. Therefore I sincerely hope they do go away from Anubis, so I can dwell further in my ignorance.
I feel the same. It's a distinct part of nerd culture.
In the '70s, if you were into computers you were most likely also a fan of Star Trek. I remember an anecdote from the 1990s when an entire dial-up ISP was troubleshooting its modem pools because there were zero people connected and they assumed there was an outage. The outage happened to occur exactly while that week's episode of X-Files was airing in their time zone. Just as the credits rolled, all modems suddenly lit up as people connected to IRC and Usenet to chat about the episode. In ~1994 close to 100% of residential internet users also happened to follow X-Files on linear television. There was essentially a 1:1 overlap between computer nerds and sci-fi nerds.
Today's analog seems to be that almost all nerds love anime and Andy Weir books and some of us feel a bit alienated by that.
> Today's analog seems to be that almost all nerds love anime and Andy Weir books and some of us feel a bit alienated by that.
Especially because (from my observation) modern "nerds" who enjoy anime seem to relish bringing it (and various sex-related things) up at inappropriate times and are generally emotionally immature.
It's quite refreshing seeing that other people have similar lines of thinking and that I'm not alone in feeling somewhat alienated.
I think I'd push back and say that nerd culture is no longer really a single thing. Back in the Star Trek days, the nerd "community" was small enough that Star Trek could be a defining quality shared by the majority. Now the nerd community has grown, and there are too many people to have defining parts of the culture that are loved by the majority.
Eg if the nerd community had x people in the Star Trek days, now there are more than x nerds who like anime and more than x nerds who dislike it. And the total size is much bigger than both.
You don't have to get it to be able to accept that others like it. Why not let them have their fun?
This sounds more as though you actively dislike anime than merely not seeing the appeal or being "ignorant". If you were to ignore it, there wouldn't be an issue...
I don't think it's relevant to debate whether anime or other forms of media are objectively better. But as someone who has never understood anime, I view mainstream western TV series as filled with hours of cleverly written dialogue and long story arcs, whereas the little anime I've watched seems to mostly be overly dramatic colorful action scenes with intense screamed dialogue and strange bodily noises. Should we maybe assume that we are both a bit ignorant of the preferences of others?
Let's rather assume that you're the kind of person who debates a thing by first saying that it's not relevant to debate, then putting forward a pretty out-of-context comparison, and finally concluding that I should feel bad about myself. That kind of story arc does seem to correlate with finding mainstream Western TV worthwhile; there's something structurally similar to the funny way your thought went.
It's not the only project with an anime girl as its mascot.
ComfyUI has what I think is a foxgirl as its official mascot, and that's the de-facto primary UI for generating Stable Diffusion or related content.
OK, you've been all over this thread being negative and angry. On a new account, which makes it even more sus. Take a break from social media.
It's more likely that the project itself will disappear into irrelevance as soon as AI scrapers bother implementing the PoW (which is trivial for them, as the post explains) or figure out that they can simply remove "Mozilla" from their user-agent to bypass it entirely.
> as AI scrapers bother implementing the PoW
That's what it's for, isn't it? Make crawling slower and more expensive. Shitty crawlers not being able to run the PoW efficiently or at all is just a plus. Although:
> which is trivial for them, as the post explains
Sadly the site's being hugged to death right now so I can't really tell if I'm missing part of your argument here.
> figure out that they can simply remove "Mozilla" from their user-agent
And flag themselves in the logs to get separately blocked or rate limited. Servers win if malicious bots identify themselves again, and forcing them to change the user agent does that.
> That's what it's for, isn't it? Make crawling slower and more expensive.
The default settings produce a computational cost of milliseconds for a week of access. For this to be relevant it would have to be significantly more expensive to the point it would interfere with human access.
The explanation of how the estimate is made is more detailed, but here is the referenced conclusion:
>> So (11508 websites * 2^16 sha256 operations) / 2^21, that’s about 6 minutes to mine enough tokens for every single Anubis deployment in the world. That means the cost of unrestricted crawler access to the internet for a week is approximately $0.
>> In fact, I don’t think we reach a single cent per month in compute costs until several million sites have deployed Anubis.
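(The arithmetic checks out: 11508 × 2^16 ≈ 7.5 × 10^8 hashes, and at the assumed 2^21 ≈ 2.1 × 10^6 hashes/second that's about 360 seconds, i.e. roughly 6 minutes.)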
> Sadly the site's being hugged to death right now
Luckily someone had already captured an archive snapshot: https://archive.ph/BSh1l
I'm not on Firefox or any Firefox derivative and I still get anime cat girls making sure I'm not a bot.
> PoW increases the cost for the bots which is great. Trivial to implement, sure, but that added cost will add up quickly.
No, the article estimates it would cost less than a single penny to scrape all pages of 1,000,000 distinct Anubis-guarded websites for an entire month.
I thought HN was anti-copyright and anti-imaginary-property, or at least the bulk of its users were. Yet all of a sudden, "but AI!!!!1"?
a federal crime
The rest of the world doesn't care.
> I thought HN was anti-copyright
Maybe. But what’s happening is ”copyright for thee not for me”, not a universal relaxation of copyright. This loophole exploitation by behemoths doesn’t advance any ideological goals, it only inflames the situation because now you have an adversarial topology. You can see this clearly in practice – more and more resources are going into defense and protection of data than ever before. Fingerprinting, captchas, paywalls, login walls, etc etc.
Don’t forget signed attestations from “user probably has skin in the game” cloud providers like iCloud (already live in Safari and accepted by Cloudflare, iirc?) — not because they identify you but because abusive behavior will trigger attestation provider rate limiting and termination of services (which, in Apple’s case, includes potentially a console kill for the associated hardware). It’s not very popular to discuss at HN but I bet Anubis could add support for it regardless :)
> Fuck AI scrapers, and fuck all this copyright infringement at scale.
Yes, fuck them. Problem is, Anubis here is not doing the job. As the article already explains, currently Anubis is not adding a single cent to the AI scrapers' costs. For Anubis to become effective against scrapers, it will necessarily have to become quite annoying for legitimate users.
> This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity. It feels like this solution has the problem backwards, effectively only limiting access to those without resources or trying to conserve them.
Counterpoint - it seems to work. People use Anubis because it's the best of bad options.
If theory and reality disagree, it means either you are missing something or your theory is wrong.
Counter-counter point: it only stopped them for a few weeks and now it doesn’t work: https://news.ycombinator.com/item?id=44914773
Geoblocking China and Singapore solves that problem, it seems, at least the non-residential IPs (though I also see a lot of aggressive bots coming from residential IP space from China).
I wish the old trick of sending CCP-unfriendly content to get the great firewall to kill the connection for you still worked, but in the days of TLS everywhere that doesn't seem to work anymore.
> The CAPTCHA forces vistors to solve a problem designed to be very difficult for computers but trivial for humans
I'm unsure if this is deadpan humor or if the author has never tried to solve a CAPTCHA that is something like "select the squares with an orthodox rabbi present"
I enjoyed the furor around the 2008 RapidShare captcha lol
- https://www.htmlcenter.com/blog/now-thats-an-annoying-captch...
- https://depressedprogrammer.wordpress.com/2008/04/20/worst-c...
- https://medium.com/xato-security/a-captcha-nightmare-f6176fa...
The problem with that CAPTCHA is you're not allowed to solve it on Saturdays.
> an American fire hydrant or school bus
So much this. The first time one asked me to click on "crosswalks", I genuinely had to think for a while as I struggled to remember WTF a "crosswalk" was in AmEng. I am a native English speaker, writer, editor and professionally qualified teacher, but my form of English does not have the word "crosswalk" or any word that is a synonym for it. (It has phrases instead.)
Our schoolbuses are ordinary buses with a special number on the front. They are no specific colour.
There are other examples which aren't coming immediately to mind, but it is vexing when the designer of a CAPTCHA isn't testing if I am human but if I am American.
Google demanding I flag yellow cars when asked to flag taxis is the silliest Americanism I've seen. At least the school bus has SCHOOL BUS written all over it and fire hydrants aren't exactly an American exclusive thing.
On some Russian and Asian site I ran into trouble signing up for a forum using translation software because the CAPTCHA requires me to enter characters I couldn't read or reproduce. It doesn't happen as often as the Google thing, but the problem certainly isn't restricted to American sites!
There are also services out there that will solve any CAPTCHA for you at a very small cost to you. And an AI company will get steep discounts with the volumes of traffic they do.
There are some browser extensions for it too, like NopeCHA, it works 99% of the time and saves me the hassle of doing them.
Any site using CAPTCHAs today is really only hurting their real customers and the low-hanging fruit.
Of course this assumes they can't solve the captcha themselves, with AI, which often they can.
Yes, but not at a rate that enables them to be a risk to your hosting bill. My understanding is that the goal here isn't to prevent crawlers, it's to prevent overly aggressive ones.
Superficial comment regarding the catgirl: I don't get why some people are so adamant and enthusiastic for others to see it, but if you, like me, find it distasteful and annoying, consider copying these uBlock rules: https://sdf.org/~pkal/src+etc/anubis-ublock.txt. Brings me joy to know what I am not seeing whenever I get stopped by this page :)
Can you clarify if you mean that you do not understand the reasons that people dislike these images, or that you find the very idea of disliking them hard to relate to?
I cannot claim that I understand it well, but my best guess is that these are images that represent a kind of culture that I have encountered both in real-life and online that I never felt comfortable around. It doesn't seem unreasonable that this uneasiness around people with identity-constituting interests in anime, Furries, MLP, medieval LARP, etc. transfers back onto their imagery. And to be clear, it is not like I inherently hate anime as a medium or the idea of anthropomorphism in art. There is some kind of social ineptitude around propagating these _kinds_ of interests that bugs me.
I cannot claim that I am satisfied with this explanation. I know that the dislike I feel for this is very similar to what I feel when visiting a hacker space where I don't know anyone. But I hope that I could at least give a feeling for why some people don't like seeing catgirls every time they open a repository, and that it doesn't necessarily have anything to do with advocating for a "corporate soulless web".
Every time I see one of these I think it's a malicious redirect to some pervert-dwelling imageboard.
On that note, is kernel.org really using this for free and not the paid version without the anime? Linux Foundation really that desperate for cash after they gas up all the BMW's?
It's crazy (especially considering anime is more popular now than ever; Netflix alone is making billions a year on anime) that people see a completely innocent little anime picture and immediately think "pervert-dwelling imageboard".
> people see a completely innocent little anime picture and immediately think "pervert-dwelling imageboard"
Think you can thank the furries for that.
Every furry I've happened to come across was very pervy in some way, and so that's what immediately comes to mind when I see furry-like pictures like the one shown in the article.
YMMV
Out of interest, how many furries have you met? I've been to several fur meets, and have met approximately three furries who I would not want to know anymore for one reason or another
Admittedly just a handful. But I met them in entirely non-furry settings, for example as a user of a regular open source program I was a contributor to (which wasn't Rust based[1]).
None of them were very pervy at first, only after I got to know them.
[1]: https://www.reddit.com/r/rust/comments/vyelva/why_are_there_...
To be fair, that's the sort of place where I spend most of my free time.
they've seized the moment to move the anime cat girls off the Arch Linux desktop wallpapers and onto lore.kernel.org.
Even if the images aren’t the kind of sexualized (or downright pornographic) content this implies… having cutesy anime girls pop up when a user loads your site is, at best, wildly unprofessional. (Dare I say “cringe”?) For something as serious and legit as kernel.org to have this, I do think it’s frankly shocking and unacceptable.
never forget the Ponies CV of an ML guy https://www.huffingtonpost.co.uk/2013/09/03/my-little-pony-r...
Noted, I will now add anime girls to my website, so I'm not at risk of being misconstrued as "professional"
You'd think it's the opposite; look at Joseph Redmon's resume.
For me it's the flipside: It makes me think "Ahh, my people!"
Huh, why would they need the unbranded version? The branded version works just fine. It's usually easier to deploy ordinary open source software than it is for software that needs to be licensed, because you don't need special download pages or license keys.
If it makes sense for an organization to donate to a project they rely on, then they should just donate. No need to debrand if it's not strictly required, all that would do is give the upstream project less exposure. For design reasons maybe? But LKML isn't "designed" at all, it has always exposed the raw ugly interface of mailing list software.
Also, this brand does have trust. Sure, I'm annoyed by these PoW captcha pages, but I'm a lot more likely to enable Javascript if it's the Anubis character, than if it is debranded. If it is debranded, it could be any of the privacy-invasive captcha vendors, but if it's Anubis, I know exactly what code is going to run.
If I saw an anime pic show up, that'd be a flag. I only know of Anubis' existence and use of anime from HN.
It is only trusted by a small subset of people who are in the know. It is not about "anime bad" but that a large chunk of the population isn't into it, for whatever reason.
I love anime but it can also be cringe. I find this cringe as it seems many others do too.
> Anubis is a clone of Kiwiflare, not an original work, so you're actually sort of half-right:
Interesting. That itself appears to be a clone of haproxy-protection. I know there has also been an nginx module that does the same for some time. Either way, proof-of-work is by this point not novel.
Everyone seems to have overlooked the more substantive point of my comment which is that it appears kernel.org cheaped out and is using the free version of Anubis, instead of paying up to support the developer for his work. You know they have the money to do it.
In 2024 the Linux Foundation reported $299.7M in expenses, with $22.7M of that going toward project infrastructure and $15.2M on "event services" (I guess making sure the cotton candy machines and sno-cone makers were working at conferences).
My point is, cough up a few bucks for a license you chiselers.
> My point is, cough up a few bucks for a license you chiselers.
You mean this one? https://github.com/TecharoHQ/anubis/blob/main/LICENSE
No I mean this one:
> Everyone seems to have overlooked the more substantive point of my comment which is that it appears kernel.org cheaped out and is using the free version of Anubis, instead of paying up to support the developer for his work. You know they have the money to do it.
>
> In 2024 the Linux Foundation reported $299.7M in expenses, with $22.7M of that going toward project infrastructure and $15.2M on "event services" (I guess making sure the cotton candy machines and sno-cone makers were working at conferences).
>
> My point is, cough up a few bucks for a license you chiselers.
Several points:
- there is no license to pay. This is free (as in open source and as in beer) software. There are commercial support and sponsoring options if you feel you need them; however, sponsoring is not paying a license.
- Sometimes it takes so long to get approval for a sponsorship that members of large orgs give up.
- Obviously kernel.org is using an old release of Anubis, so they likely observed a huge spike in bandwidth used at some point and deployed Anubis, solving the problem immediately. I don't remember Anubis proposing a paid license at the time of the early releases. I may be wrong, but it may be that kernel.org admins have never heard of the possibility of sponsoring, nor are they interested in support.
- you don't have to pay anything to change/remove the image, and the people who implemented this clearly do not care, as they didn't do it.
- do we have evidence that the Anubis developer ever donated directly or indirectly to Linus Torvalds and the thousands of developers who worked on the kernel?
Anubis has nothing to do with Kiwiflare, there's no connection at all. It's not the same codebase, and the inspiration for Anubis comes from Hashcash (1997) and numerous other examples of web PoW that predate Kiwiflare, which perhaps tens of thousands of websites were already using as an established technique. What makes you think it is a clone of it?
I wonder if the best solution is still just to create link mazes with garbage text like this: https://blog.cloudflare.com/ai-labyrinth/
It won't stop the crawlers immediately, but it might lead to an overhyped and underwhelming LLM release from a big name company, and force them to reassess their crawling strategy going forward?
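(The mechanics are simple enough; a toy Node sketch of such a maze, purely illustrative:)

    // Toy link maze: every URL returns random garbage text plus
    // links to more randomly-named pages, so a link-following
    // crawler never runs out of "content".
    const http = require("node:http");
    const { randomBytes, randomUUID } = require("node:crypto");

    http.createServer((req, res) => {
      const words = Array.from({ length: 80 },
        () => randomBytes(4).toString("hex"));
      const links = Array.from({ length: 5 },
        () => `<a href="/maze/${randomUUID()}">continue</a>`);
      res.writeHead(200, { "content-type": "text/html" });
      res.end(`<p>${words.join(" ")}</p>${links.join(" ")}`);
    }).listen(8080);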
The contents in question are statically generated, 1-3 KB HTML files. Hosting a single image would be the equivalent of cold serving 100s of requests.
Putting up a scraper shield seems like it's more of a political statement than a solution to a real technical problem. It's also antithetical to open collaboration and an open internet of which Linux is a product.
A great option for most people, and indeed Anubis' README recommends using Cloudflare if possible. However, not everyone can use a paid CDN. Some people can't pay because their payment methods aren't accepted. Some people need to serve content or to countries which a major CDN can't for legal and compliance reasons. Some organizations need their own independent infrastructure to serve their organizational misson.
I have an S24 (flagship of 2024) and Anubis often takes 10-20 seconds to complete. That time is going to add up if more and more sites adopt it, leading to a worse browsing experience and wasted battery life.
Meanwhile AI farms will just run their own nuclear reactors eventually and be unaffected.
I really don't understand why someone thought this was a good idea, even if well intentioned.
Something must be wrong on your flagship smartphone because I have an entry level one that doesn't take that long.
It seems there are a large number of operations crawling the web to build models that aren't directly using infrastructure hosted on AI farms, BUT botnets running on commodity hardware and residential networks, to keep their IP ranges from being blacklisted. Anubis's point is to block those.
Which browser and which difficulty setting is that?
Because I've got the same model line but about 3 or 4 years older, and it usually just flashes by in the browser Lightning from F-Droid, which is an OS webview wrapper. On occasion it takes a second or maybe two; I assume that's either bad luck in finding a solution or a site with a higher difficulty setting. Not sure if I've seen it in Fennec (Firefox mobile) yet but, if so, it's the same there.
I've been surprised that this low threshold stops bots but I'm reading in this thread that it's rather that bot operators mostly just haven't bothered implementing the necessary features yet. It's going to get worse... We've not even won the battle let alone the war. Idk if this is going to be sustainable, we'll see where the web ends up...
Either your phone is on some extreme power saving mode, your ad blocker is breaking Javascript, or something is wrong with your phone.
I've certainly seen Anubis take a few seconds (three or four maybe) but that was on a very old phone that barely loaded any website more complex than HN.
I remember that Litecoin briefly had this idea: to be easy on consumer hardware but hard on GPUs. The ASICs didn't take long to obliterate the idea though.
Maybe there's going to be some form of pay per browse system? even if it's some negligible cost on the order of 1$ per month (and packaged with other costs), I think economies of scale would allow servers to perform a lifetime of S24 captchas in a couple of seconds.
Samsung's UI has this feature where it turns on power saving mode when it detects light use.
I guess his flagship IS compromised and part of an AI crawling botnet ;-)
I disagree with the post author in their premise that things like Anubis are easy to bypass if you craft your bot well enough and throw the compute at it.
Thing is, the actual lived experience of webmasters tells that the bots that scrape the internets for LLMs are nothing like crafted software. They are more like your neighborhood shit-for-brain meth junkies competing with one another over who can commit more robberies in a day, no matter the profit.
Those bots are extremely stupid. They are worse than script kiddies’ exploit searching software. They keep banging the pages without regard to how often, if ever, they change. If they were 1/10th like many scraping companies’ software, they wouldn’t be a problem in the first place.
Since these bots are so dumb, anything that is going to slow them down or stop them in their tracks is a good thing. Short of drone strikes on data centers or accidents involving owners of those companies that provide networks of botware and residential proxies for LLM companies, it seems fairly effective, doesn’t it?
TFA — and most comments here — seem to completely miss what I thought was the main point of Anubis: it counters the crawler's "identity scattering"/sybil'ing/parallel crawling.
Any access will fall into either of the following categories:
- client with JS and cookies. In this case the server now has an identity to apply rate limiting to, from the cookie. Humans should never hit it, but crawlers will be slowed down immensely or ejected. Of course the identity can be rotated — at the cost of solving the puzzle again.
- amnesiac (no cookies) clients with JS. Each access is now expensive.
(- no JS - no access.)
The point is to prevent parallel crawling and overloading the server. Crawlers can still start an arbitrary number of parallel crawls, but each one costs to start and needs to stay below some rate limit. Previously, the server would collapse under thousands of crawler requests per second. That is what Anubis is making prohibitively expensive.
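(A toy sketch of that server-side idea -- rate limiting keyed on the challenge cookie instead of the IP. Cookie name and limits are invented here, not Anubis's actual internals:)

    // Fixed-window rate limiter keyed on the PoW cookie.
    const express = require("express");
    const cookieParser = require("cookie-parser");

    const app = express();
    app.use(cookieParser());

    const counts = new Map();
    setInterval(() => counts.clear(), 60_000); // reset the window each minute

    app.use((req, res, next) => {
      const id = req.cookies["pow-token"]; // invented cookie name
      if (!id) return res.redirect("/challenge"); // no identity: solve the puzzle first
      const n = (counts.get(id) || 0) + 1;
      counts.set(id, n);
      if (n > 60) return res.sendStatus(429); // rotating the cookie costs a fresh solve
      next();
    });

    app.get("/", (req, res) => res.send("content"));
    app.listen(8080);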