Comment by supriyo-biswas 18 hours ago
I run a semi-popular website hosting user-generated content (although it's not a search engine); the attacks on it have surprised me, and I've eventually had to put the same kinds of restrictions in place.
I was initially very hesitant to restrict any kind of traffic, relying on rate-limiting IPs on critical endpoints that needed low friction, and captchas on higher-friction, higher-intent flows such as signup and password reset pages.
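For the low-friction endpoints, the rate limiting was nothing exotic. A minimal sketch of the idea, a per-IP sliding-window counter (the window size, limit, and `allow_request` helper here are illustrative, not my production setup):

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS = 100  # hypothetical per-IP budget for a critical endpoint

_hits = defaultdict(list)  # ip -> timestamps of recent requests

def allow_request(ip: str) -> bool:
    """Return True if this IP is still under its request budget."""
    now = time.monotonic()
    # Drop timestamps that have fallen out of the window.
    recent = [t for t in _hits[ip] if now - t < WINDOW_SECONDS]
    _hits[ip] = recent
    if len(recent) >= MAX_REQUESTS:
        return False
    recent.append(now)
    return True
```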
Other than that, I was very liberal with most traffic and made sure Tor was unblocked. I even ended up migrating off Cloudflare's free tier to a paid CDN because of inexplicable errors users were facing over Tor; these ultimately came down to Cloudflare blocking certain requests over Tor with a 403, even though the MVPs on their community forums would never acknowledge such a thing.
Unfortunately, given that Tor is effectively a free rotating proxy, my website got attacked on one of these critical, compute-heavy endpoints through multiple exit nodes, totaling ~20,000 RPS. I've since reluctantly had to block Tor, along with a few other paid proxy services discovered through my own research.
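Blocking Tor is straightforward because the exit list is published. A rough sketch of how one might consume it (the refresh strategy and the `is_tor_exit` helper are assumptions for illustration, not my actual setup):

```python
import urllib.request

# Tor Project's published bulk exit list.
TOR_EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"

def fetch_tor_exit_ips() -> set[str]:
    """Download the current set of Tor exit node IPs."""
    with urllib.request.urlopen(TOR_EXIT_LIST_URL, timeout=10) as resp:
        return {
            line.strip()
            for line in resp.read().decode().splitlines()
            if line.strip() and not line.startswith("#")
        }

# In practice you'd cache this and refresh it periodically (e.g. hourly).
tor_exits = fetch_tor_exit_ips()

def is_tor_exit(ip: str) -> bool:
    return ip in tor_exits
```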
Another time, a set of human spammers distributed all over the world started sending a large volume of spam towards my website, something like 1,000,000 spam messages every day. (I still suspect this was an attack coordinated by a "competitor" of some sort, especially given that a small percentage of the messages were titled "I want to get paid for posting" or something along those lines.)
There was no meaningful differentiator between the spammers and legitimate users: they used real Gmail accounts to sign up, analysis of their behaviour showed they were real people rather than simple or even browser-based automation, and they posted from the same residential IPs as legitimate users.
Again, I reluctantly had to introduce a spam filter on some common keywords, and although some legitimate users do get caught from time to time, it was the only way I could get a handle on the problem.
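The filter itself is about as simple as it sounds: a compiled regex over a small keyword blocklist, roughly like the sketch below (the keywords shown are placeholders, not the real list):

```python
import re

# Placeholder patterns; the actual blocklist is site-specific.
SPAM_KEYWORDS = [
    r"get paid for posting",
    r"earn \$\d+ per day",
]

_spam_re = re.compile("|".join(SPAM_KEYWORDS), re.IGNORECASE)

def looks_like_spam(message: str) -> bool:
    """Flag a message if it matches any blocklisted keyword."""
    return _spam_re.search(message) is not None
```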
I'm appalled by some of the discussions here. Was I "enshittifying" my website out of unbridled "greed"? I don't think so. But every time I come here, I find these accusations, which makes me think that as a site full of technical users, we can definitely do better.
The problem is accountability. Imagine starting a trade show business in the physical world as an example.
One day you start getting a bunch of people coming in to mess with the place. You can identify them and their organization, then promptly remove them. If they continue, there are legal ramifications.
On the web, these people can be robots that look just like real people until you spend a while studying their behavior. Worse if they’re real people being paid for sabotage.
In the real world, you arrest them and find the source. Online, they can remain anonymous and protected. What recourse do we have beyond splitting the web into a "verified ID" web and a pseudonymous analog? We can't keep treating potential computer engagement the same as human engagement forever. As AI agents inevitably get cheaper and harder to detect, what choice will we have?