Comment by 0x1ceb00da
Comment by 0x1ceb00da a day ago
What kind of issues do you usually face?
Comment by 0x1ceb00da a day ago
What kind of issues do you usually face?
AKA the real world, a place where you have older appliances, legacy servers, contractual constraints and better things to do than watch a nasty yearly ritual become a nasty monthly ritual. I need to make sure SSL is working in a bunch of very heterogeneous stuff but not in a position to replace it and/or pick an authority with better automation. I just suck it up and dread when a "cert day" looms closer.
Sometimes these kind of decisions seem to come from bodies that think the Internet exists solely for doing the thing they do.
Happens to me with the QA people at our org. They behave as if anything happens just for the purpose of having them measure it, creating a Heisenberg situation where their incessant narrow-minded meddling makes actually doing anything nearly imposible.
The same happens with manual processes done once a year - you just aren't aware of it until renewal.
Consider the inevitable need for immediate renewal due to an incident. Would you rather have this renewal happen via a fast, automated and well-tested process, or a silently broken slow and manual one?
The manual process was annoying but it wasn't complicated.
You knew exactly when it was going to fail and you could put it on your calendar to schedule the work, which consisted of an email validation process and running a command to issue the certificate request from your generated key.
The only moving part was the issued certificate, which you copied and pasted over and reloaded the server. There are a lot less things to go wrong on this process, which at one point I could do once every two years, than in a really complicated automated background task that has to happen within 15 days.
I love short duration automated free certs, but I think we really need to have a conversation about how short we can make them before we make it so humans no longer have the time required to fix problems anymore.
There are also alternatives to Cloudflare and AWS, that didn't stop their outages from taking down pretty much the entire internet. I'm not sure what your point is, pretty much everybody is using let's encrypt and it will very much be a huge outage event for the web if something were to go seriously wrong with it.
One key difference: A cert is a “pickled” thing, it’s stored and kept until it is successfully renewed. So if you attempt to renew at day 30 and LE is down, then you still have nearly more than two weeks to retrieve a new cert. Hopefully LE will get on their feet again within that time. Otherwise you have Google, ZeroSSL, etc where you can fetch a replacement cert.
Without getting into specific stuff I've run into, automated stuff just, breaks.
This is a living organism with moving parts and a time limit - you update nginx with a change that breaks .well-known by accident, or upgrade to a new version of Ubuntu and suddenly some dependency isn't loading correctly, or that UUID generator you depended on to generate the name for the challenge doesn't get loaded, or certbot becomes obsolete because of some API change and you can't upgrade to the latest because the OS is older and you installed it from the package manager.
You eventually see it in your exception monitoring or when an ssl monitor detects the cert is about to expire. Then you have to drop that other urgent thing you needed to get done, come in and debug it, fix it, and re-issue all the certs at the rate limit allowed. That's assuming you have that monitoring - most sites probably don't.
If you detect that issue with 1/3 of the cert left, you will now have 15 days to figure that out instead of 30. If you can't finish it in time, or you don't learn about it in time, the site(s) hard fail on every web browser that visits and you've effectively got a full site outage until you repair it.
So you discover it's because of certbot not working with a new API change, and you can't upgrade with the package manager. Now you need to figure out how to compile it from source, but it doesn't like the python that is currently installed and now you need to install that from source, but that version of python breaks your python web app so you have to figure out how to migrate your app to that version of python before you can do that, and the programmer that can do that is on a week long whitewater rafting trip in Idaho.
Aside from all that, what happens if a hacker manages to wreck the let's encrypt infra so badly they need 2 weeks to get it back online? The internet archive was offline for weeks after a ddos attack. The cloudflare outage took one site of mine down for less than 10 minutes, it's not hard to imagine a much worse outage for the web here.