Comment by zzzeek 8 hours ago

I just had to purchase a Cloudflare account to protect two of my sites used for CI that run Jenkins and Gerrit servers. These are resource-hungry Java VMs, which I run on a minimally powered server because they are meant to be accessed by only a few people. Yet crawlers located in Eastern Europe and Asia eventually found them and would regularly drive my CPU up to 500%, making the server unavailable. (It should go without saying that I have always had a robots.txt on these sites that prohibits all crawling. Such files are a quaint relic of a simpler time.) For a couple of years I blocked the various offending IPs, but this past month the crawling resumed, this time deliberately spread across hundreds of IP addresses so that I could not easily block them.

Within minutes, Cloudflare showed me that all of those IP addresses came from a single ASN owned by a very large and well-known Chinese company, and I blocked the entire ASN. I could figure out these ASNs manually and build blocklists to add to my Apache config, but Cloudflare makes it super easy by showing you the whole thing happening in real time. You can even tailor the 403 response to send them a custom message; in my case: "ALL of the data you are crawling is on github! Get off these servers and go get it there!" (Again, sure, I could write out httpd config for all of that, but who wants to bother?) They are definitely providing a really critical service.
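For anyone who would rather write the httpd config by hand, here is a minimal sketch. It assumes Apache 2.4 with mod_authz_core and mod_authz_host, and that you have already collected the list of prefixes announced by the offending ASN. The helper renders a `Require not ip` deny list plus a custom `ErrorDocument 403` message. The function name `render_apache_blocklist` and the example prefixes (IETF documentation ranges, not the real ASN) are hypothetical placeholders.

```python
import ipaddress


def render_apache_blocklist(prefixes, message, location="/"):
    """Render an Apache 2.4 <Location> block that denies the given CIDR prefixes.

    Hypothetical helper: `prefixes` is assumed to be the list of networks
    announced by the offending ASN, gathered separately.
    """
    # Validate each prefix (strict=True rejects host bits, e.g. 192.0.2.1/24)
    # and drop duplicates while keeping the original order.
    networks = dict.fromkeys(
        str(ipaddress.ip_network(p, strict=True)) for p in prefixes
    )

    # Avoid embedding double quotes inside the quoted ErrorDocument string.
    safe_message = message.replace('"', "'")

    lines = [
        f'<Location "{location}">',
        "    <RequireAll>",
        "        Require all granted",
    ]
    for net in networks:
        lines.append(f"        Require not ip {net}")
    lines.append("    </RequireAll>")
    # A quoted string argument makes Apache send that text as the 403 body.
    lines.append(f'    ErrorDocument 403 "{safe_message}"')
    lines.append("</Location>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Documentation-only ranges standing in for the real ASN's prefixes.
    print(render_apache_blocklist(
        ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32", "192.0.2.0/24"],
        "ALL of the data you are crawling is on github! "
        "Get off these servers and go get it there!",
    ))
```

Note that `Require not ip` only sees the real client address when Apache is not sitting behind a proxy or CDN; behind one, you would need something like mod_remoteip first.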

SoftTalker 6 hours ago

> intended to be accessed only by a few people

So why are they open to the entire world?

  • zzzeek 6 hours ago

    They're open to people who contribute PRs, so they can see why their tests failed. Also, htdigest/htpasswd access is complicated or impossible (depending on the use case) to configure alongside the way Jenkins/Gerrit authentication itself works, particularly with internal scripts and hooks that need to communicate with them.

cm2187 8 hours ago

Particularly if your users are keen on solving recaptchas over and over.

  • cobbzilla 7 hours ago

    How many users do you think are on the poster’s Jenkins/CI system? Sounded like a personal thing or maybe a small team, I didn’t get the impression it was supposed to be public.

    • cm2187 7 hours ago

      The poster ends with a general comment on the usefulness of cloudflare.

    • zzzeek 3 hours ago

      It's an open source project. It's public.

  • zzzeek 3 hours ago

    I don't even have the captchas turned on. When I get an email that the CPU has been churning for three hours, Cloudflare gives me a quick way to see where the traffic is coming from, and I can just block it. It's always crawlers, which is the point of this discussion: "Are there actually crawlers?" Yes.