Comment by notpushkin 2 days ago

My favourite thing about Anubis is that (in the default configuration) it skips the actual challenge altogether if the User-Agent header is curl’s.

E.g. if you open this in a browser, you’ll get the challenge: https://code.ffmpeg.org/FFmpeg/FFmpeg/commit/13ce36fef98a3f4...

But if you run this, you get the page content straight away:

  curl https://code.ffmpeg.org/FFmpeg/FFmpeg/commit/13ce36fef98a3f4e6d8360c24d6b8434cbb8869b
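
Conversely, if you force a browser-style User-Agent, the challenge should come back, assuming the default policy keys on the Mozilla substring (as discussed in the replies):

  curl -A "Mozilla/5.0" https://code.ffmpeg.org/FFmpeg/FFmpeg/commit/13ce36fef98a3f4e6d8360c24d6b8434cbb8869b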

I’m pretty sure this gets abused by AI scrapers a lot. If you’re running Anubis, take a moment to configure it properly, or, better, put together something that’s less annoying for your visitors, like the OP did.
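
For example, a rule along these lines in botPolicies.yaml would cover curl-style clients, assuming the policy format from the Anubis repo (the rule name here is made up, and whether blocking curl outright is a good idea is exactly what the replies debate):

  bots:
    - name: curl-default-ua
      user_agent_regex: ^curl/
      action: DENY
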
rezonant 2 days ago

It only challenges user agents with Mozilla in their name, by design: user agents that say anything else are already identifiable. If Anubis makes the bots change their user agents, it has done its job, as that traffic can now be addressed directly.
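
A minimal sketch of that gating rule (not Anubis's actual code; the handler and responses are placeholders):

  package main

  import (
      "net/http"
      "strings"
  )

  func main() {
      http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
          // Browser-shaped traffic advertises "Mozilla/..." in its UA;
          // only that gets sent to the proof-of-work challenge.
          if strings.Contains(r.UserAgent(), "Mozilla") {
              http.Error(w, "challenge page would go here", http.StatusUnauthorized)
              return
          }
          // Everything else (curl, self-identified bots) already tells you
          // what it is and can be rate-limited or blocked directly.
          w.Write([]byte("page content\n"))
      })
      http.ListenAndServe(":8080", nil)
  }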

xena 2 days ago

This was a tactical decision I made in order to avoid breaking well-behaved automation that properly identifies itself. I have been mocked endlessly for it. There is no winning.

  • seba_dos1 2 days ago

    The winning condition does not need to consider people who write before they think.

  • ranger_danger a day ago

    How is a curl user-agent automatically well-behaved automation?

    • fragmede a day ago

      One assumes it is a human, running curl manually, from the command line on a system they're authorized to use. It's not wget -r.

      • ranger_danger 14 hours ago

        Sounds like the perfect opportunity for bots to use the curl user-agent. How do we know they're not already doing this?

        • fragmede 14 hours ago

          We don't, but now that we've talked about it publicly on the Internet, they're gonna start doing that. I'm sure some of them were doing it already, but now we've gone and told the rest.

seba_dos1 2 days ago

> I’m pretty sure this gets abused by AI scrapers a lot.

In practice, it hasn't been an issue for many months now, so I'm not sure why you're so sure. Disabling Anubis takes servers down; allowing curl bypass does not. What makes you assume that aggressive scrapers that don't want to identify themselves as bots will willingly identify themselves as bots in the first place?