Comment by Jubijub
I’m sorry, but your comment shows you never had to fight this problem at scale. The challenge is not small-time crawlers; the challenge is blocking large, dedicated actors. The approach is simple: if there is more than X volume of traffic per <aggregation criterion>, block it. The problem: most aggregation criteria are trivially spoofable or very cheap to change:
- IP: with IPv6, rotating your IP often is not an issue
- UA: changing this is scraping 101
- SSL fingerprint: easy to use the same one as everyone else
- IP stack fingerprint: also easy to use a common one
- request / session tokens: it’s cheap to create a new session
You can force login, but then you have a spam account creation challenge, with the same issues as above, and depending on your infra this can become heavy.
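To make the "more than X per aggregation key" rule concrete, here is a minimal sketch of a sliding-window limiter, plus the same traffic sent once from a fixed key and once from a rotating key. The class name, the helper, the thresholds, and the addresses (taken from the IPv6 documentation range) are all illustrative assumptions, not anything from a real production system.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical limiter: allow at most max_requests per key per window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of allowed requests

    def allow(self, key, now):
        hits = self._hits[key]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the threshold for this key: block
        hits.append(now)
        return True


def count_blocked(limiter, keys, start=0.0, step=0.01):
    """Send one request per key, spaced `step` seconds apart; count the blocks."""
    blocked = 0
    for i, key in enumerate(keys):
        if not limiter.allow(key, start + i * step):
            blocked += 1
    return blocked


# 100 requests from one address versus 100 requests from 100 addresses.
fixed_keys = ["2001:db8::1"] * 100
rotating_keys = [f"2001:db8::{i:x}" for i in range(1, 101)]

blocked_fixed = count_blocked(SlidingWindowLimiter(10, 60.0), fixed_keys)
blocked_rotating = count_blocked(SlidingWindowLimiter(10, 60.0), rotating_keys)
print(blocked_fixed, blocked_rotating)
```

With a limit of 10 per 60 seconds, the fixed sender gets blocked on 90 of its 100 requests, while the rotating sender is never blocked because no single key ever crosses the threshold. Swap the key for UA, fingerprint, or session token and the same thing happens.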
Add to this that the minute you use a signal for detection, you “burn” it: adversaries will avoid using it, and you lose the measurement, and with it the ability to know whether you are fixing the problem at all.
I worked on this kind of problem for a FAANG service. Whoever claims it’s easy clearly never had to deal with motivated adversaries.
Should be easy enough to create a DroneBL for residential proxy services. Since you work on residential proxy detection at a FAANG service, why haven't you done it yet?
If they're doing things the above-board way from their own ASN, block their ASN.
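In practice, blocking an ASN comes down to matching the client address against the prefixes that ASN announces. A rough sketch, assuming you already have that prefix list from somewhere (the list below is made up from the reserved documentation ranges, and `is_blocked` is just an illustrative name):

```python
import ipaddress

# Assumption: prefixes announced by the ASN you want to block.
# These are reserved documentation ranges, used here as placeholders.
BLOCKED_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in ("192.0.2.0/24", "2001:db8::/32")
]


def is_blocked(addr):
    """Return True if addr falls inside any blocked prefix."""
    ip = ipaddress.ip_address(addr)
    return any(
        ip.version == net.version and ip in net
        for net in BLOCKED_PREFIXES
    )


print(is_blocked("192.0.2.55"))      # inside 192.0.2.0/24
print(is_blocked("198.51.100.1"))    # outside every listed prefix
print(is_blocked("2001:db8::beef"))  # inside 2001:db8::/32
```

A linear scan is fine for a handful of prefixes; a large list would want a prefix tree or similar.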
If they're doing things the above-board way from third-party hosting providers, send abuse reports. Late last year there was a commotion because someone was sending single spoofed SSH SYN packets, from the addresses of Tor nodes, to organizations with extremely sensitive security policies. Many people with Tor nodes got threats of being banned from their hosting provider, over a single packet they didn't even send. They're definitely going to ban people who are doing actual DDoSes from their servers.
DDoS is also a federal crime, so if you and they are in the USA, you might consider trying to get them put in prison.