Comment by simonw 5 days ago

70 replies

S3: "Block Public Access is now enabled by default on new buckets."

On the one hand, this is obviously the right decision. The number of giant data breaches caused by incorrectly configured S3 buckets is enormous.

But... every year or so I find myself wanting to create an S3 bucket with public read access so I can serve files out of it. And every time I need to do that I find something has changed, my old recipe doesn't work any more, and I have to figure it out again from scratch!
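For the record, the recipe that currently works (and is subject to exactly this churn, so treat it as a snapshot, not gospel) is to turn off Block Public Access on the bucket and then attach a public-read bucket policy like the one below. The bucket name is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-public-bucket/*"
    }
  ]
}
```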

sylens 5 days ago

The thing to keep in mind with the "Block Public Access" setting is that it is a redundant safeguard, built in to save people from making really big mistakes.

Even if you have a terrible, overly permissive bucket policy or ACLs (legacy, but still around) configured for the S3 bucket, if you have Block Public Access turned on, it won't matter. The bucket still won't allow public access to the objects within.

If you turn it off but you have a well-scoped and ironclad bucket policy - you're still good! The bucket policy will dictate who, if anyone, has access. Of course, you then have to make sure nobody inadvertently modifies that bucket policy over time, or adds an IAM role with access, or modifies the trust policy for an existing IAM role that has access, and so on.
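The interaction described above can be sketched as a toy model. This is a deliberate simplification, not AWS's actual evaluation logic (which also involves ACLs, IAM policies, and explicit denies): Block Public Access is an outer gate, and the bucket policy only matters once that gate is open.

```python
def public_read_allowed(block_public_access: bool, policy_allows_public: bool) -> bool:
    """Toy model: Block Public Access is an outer gate; the bucket
    policy only decides anything once that gate is open."""
    if block_public_access:
        return False  # BPA wins regardless of how permissive the policy is
    return policy_allows_public  # otherwise the bucket policy decides

# A terrible, permissive policy is harmless while BPA is on...
assert public_read_allowed(block_public_access=True, policy_allows_public=True) is False
# ...and turning BPA off hands control back to the bucket policy.
assert public_read_allowed(block_public_access=False, policy_allows_public=True) is True
assert public_read_allowed(block_public_access=False, policy_allows_public=False) is False
```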

  • simonw 5 days ago

    I think this is the crux of why I find it confusing: I need a very clear diagram showing which rules override which other rules.

    • saghm 5 days ago

      My understanding is that there isn't actually any "overriding" in the sense of two rules conflicting and one of them having to "win" and take effect. I think it's more that an enabled rule is always in effect, but it might overlap with another rule, in which case removing one of them still won't remove the restrictions on the area of overlap. It's possible I'm reading too much into your choice of words, but it does sound like the confusion may stem from an incorrect assumption about how the various permissions interact.

      That being said, there's certainly a lot more that could go into making a system like this easier for developers. One thing that springs to mind is tooling that can describe which rules are currently in effect that limit (or grant, depending on the model) permissions for something. That would make it clearer when overlapping rules affect the permissions of something, which in turn would make it much clearer why something is still not accessible from a given context despite one of the rules being removed.

      • jagged-chisel 4 days ago

        If one rule explicitly restricts access and another explicitly grants access, which one is in effect? Do restrictions override grants? Does a grant to GroupOne override a restriction on GroupAlpha when the authenticated user is in both groups? Do rules set by GodAdmin override rules set by AngelAdmin?

        • saghm 4 days ago

          It's possible I'm making the exact mistake that the article describes and relying on outdated information, but my understanding is that pretty much all of the rules are actually permissions rather than restrictions. "Block public access" is an unfortunate exception to this, and I suspect that it's probably just a poorly named inversion of an "allow public access" permission. You're 100% right that modeling permissions like this requires having everything in the same "direction", i.e. either all permissions or all restrictions.

          After thinking about this sort of thing a lot when designing a system for something sort of similar (at a much smaller scale, but with the intent to define it in a way that could be extended to new types of rules for a given set of resources), I feel pretty strongly that security, ease of implementation, and intuitiveness for users are all best served by requiring every rule to be explicitly defined as a permission rather than representing any of them as restrictions (both in how they're presented to the user and how they're modeled under the hood).

          With this model, verifying whether an action is allowed can be implemented by mapping the action to the set of accesses (or mutations, as the case may be) it would perform, and then checking that each of them has a rule present that allows it. This makes it much easier to figure out whether something is allowed or not, and there's plenty of room for quality-of-life features to help users understand the system (e.g. being able to show a user which rules pertain to a given resource with essentially the same lookup you'd do when verifying an action). My sense is that this is actually not far from how AWS permissions are implemented under the hood, but AWS completely fails at the user-facing side by making it much harder than it needs to be to discover where to define the rules for something (and by extension, where to find the rules currently in effect for it).
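The allow-only model described above can be sketched in a few lines. All names here are illustrative, not any real AWS API: each action maps to the accesses it performs, and it is permitted only if every one of those accesses has a granting rule.

```python
# Toy allow-only permission model: every rule grants access, nothing
# restricts it. Names are illustrative, not a real AWS API.

# Map each action to the set of (resource, access) pairs it performs.
ACTION_ACCESSES = {
    "read_object": {("objects", "read")},
    "replace_object": {("objects", "read"), ("objects", "write")},
}

def is_allowed(action: str, granted: set) -> bool:
    """Allow an action iff every access it performs has a granting rule."""
    return all(access in granted for access in ACTION_ACCESSES[action])

granted = {("objects", "read")}
assert is_allowed("read_object", granted)
assert not is_allowed("replace_object", granted)  # no rule grants writes
```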

    • luluthefirst 4 days ago

      They don't really override each other; they act like stacked barriers. Think of a locked garage door in front of a car: whether the car itself is locked or unlocked, you can't reach it while the door is down. Access is granted only if every relevant layer allows it.

andrewmcwatters 5 days ago

This sort of thing drives me nuts in interviews, when people are like, are you familiar with such-and-such technology?

Yeah, what month?

  • tester756 5 days ago

    If you're aware of changes, then explain that there were changes over time, that's it

    • andrewmcwatters 5 days ago

      You seem to be lacking the experience of what actually happens in interviews.

    • reactordev 5 days ago

      You say this, someone challenges you, now you're on the defensive during an interview and everyone has a bad taste in their mouth. Yeah, that's how it goes.

      • pas 4 days ago

        That's just the taste of iron from the blood after the duel. But this is completely normal after a formal challenge! Companies want real cyberwarriors, and the old (lame) rockstar ninjas that they hired 10 years ago are very prone to issuing these.

crinkly 5 days ago

I just stick CloudFront in front of those buckets. You don't need to expose the bucket at all then and can point it at a canonical hostname in your DNS.

  • hnlmorg 5 days ago

    That’s definitely the “correct” way of doing things if you’re writing infra professionally. But I do also get that more casual users might prefer not to incur the additional costs nor complexity of having CloudFront in front. Though at that point, one could reasonably ask if S3 is the right choice for casual users.

    • gchamonlive 5 days ago

      S3 + CloudFront is also incredibly popular, so you can find recipes for automating it in any technology you want: Terraform, Ansible, plain bash scripts, CloudFormation (god forbid)...

      • gigatexal 5 days ago

        Yeah holy crap why is cloud formation so terrible?

    • damieng 5 days ago

      I'd argue putting CloudFront on top of S3 is less complex than getting the permissions and static sharing setup right on S3 itself.

      • hnlmorg 4 days ago

        I do get where you're coming from, but I don't agree. With the CF+S3 combo you now need to choose which sharing mode to use with S3 (there are several different ways you can link CF to S3). Then you have the wider configuration of CF to manage too. And that's before you account for any caching issues you might run into when debugging your site.

        If you know what you're doing, as it sounds like you and I do, then all of this is very easy to get set up (but then aren't most things easy when you already know how? hehe). However we are talking about people who aren't comfortable with vanilla S3, so throwing another service into the mix isn't going to make things easier for them.

    • crinkly 5 days ago

      It's actually incredibly cheap. I think our software distribution costs, in the account I run, are around $2.00 a month. That's pushing out several thousand MSI packages a day.

      • hnlmorg 4 days ago

        S3 is actually quite expensive compared to the competition for both storage and egress. At a previous start-up, we had terabytes of data on S3 and it was our second largest cost (after GPUs), and by some margin.

        For small-scale stuff, S3's storage and egress charges are unlikely to be impactful. But that doesn't mean they're cheap relative to the competition.

        There are also ways you can reduce S3 costs, but then you're trading the charges from AWS against the cost of hiring competent DevOps. Either way, you pay.

    • tayo42 5 days ago

      >S3 is the right choice for casual users.

      It's so simple for storing and serving a static website.

      Are there good and cheap alternatives?

      • MaKey 5 days ago

        Yeah, your classic web host. Just today I uploaded a static website to one via FTP.

  • herpderperator 5 days ago

    For the sake of understanding, can you explain why putting CloudFront in front of the buckets helps?

    • bhattisatish 5 days ago

      CloudFront allows you to front your S3 bucket with both:

      - signed URLs, in case you want session-based file downloads

      - public files by default, e.g. for a static site.

      You can also map a domain (or sub-domain) to CloudFront with a CNAME record and serve the files via your own domain.

      CloudFront distributions are also CDN-backed, so files are served from a location close to the user, which speeds up your site.

      For low to mid-range traffic, CloudFront with S3 is cheaper because CloudFront's network costs are lower. For large amounts of traffic, CloudFront costs can balloon very fast - but in those scenarios S3 costs are prohibitive too!

  • dcminter 4 days ago

    Not always that simple - for example, if you want to automatically load /foo/index.html when the browser requests /foo/, you'll need to either use the static website hosting feature of S3 (the bucket can't be private) or set up Lambda@Edge / CloudFront Functions or similar fiddly shenanigans.
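The edge-function workaround is a few lines of URI rewriting. CloudFront Functions are written in JavaScript; the logic is sketched here in Python, with the usual heuristic of treating extensionless paths as directories:

```python
def rewrite_uri(uri: str) -> str:
    """Map directory-style requests onto their index.html object,
    the rewrite a private S3 origin won't do for you."""
    if uri.endswith("/"):
        return uri + "index.html"           # /foo/  -> /foo/index.html
    if "." not in uri.rsplit("/", 1)[-1]:
        return uri + "/index.html"          # /foo   -> /foo/index.html
    return uri                              # /foo/style.css is untouched

assert rewrite_uri("/foo/") == "/foo/index.html"
assert rewrite_uri("/foo") == "/foo/index.html"
assert rewrite_uri("/foo/style.css") == "/foo/style.css"
```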

cedws 4 days ago

I’m getting deja vu, didn’t they already do this like 10 years ago because people kept leaving their buckets wide open?

awongh 5 days ago

This is exactly what I use LLMs for. To just read the docs for me and pull out the base level demo code that's buried in all the AWS documentation.

Once I have that I can also ask it for the custom tweaks I need.

  • jiggawatts 4 days ago

    Back when GPT4 was the new hotness, I dumped the markdown text from the Azure documentation GitHub repo into a vector index and wrapped a chatbot around it. That way, I got answers based on the latest documentation instead of a year-old LLM model's fuzzy memory.

    I now have the daunting challenge of deploying an Azure Kubernetes cluster with... shudder... Windows Server containers on top. There's a mile-long list of deprecations and missing features that were fixed just "last week" (or whatever). That is just too much work to keep up with for mere humans.

    I'm thinking of doing the same kind of customised chatbot but with a scheduled daily script that pulls the latest doco commits, and the Azure blogs, and the open GitHub issue tickets in the relevant projects and dumps all of that directly into the chat context.

    I'm going to roll up my sleeves next week and actually do that.

    Then, then, I'm going to ask the wizard in the machine how to make this madness work.

    Pray for me.
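The retrieval step of a docs chatbot like the one described above can be sketched with naive bag-of-words overlap standing in for a real vector index (chunking and embeddings omitted; the sample docs are invented):

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Bag-of-words counts standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_docs(query: str, docs: list, k: int = 1) -> list:
    """Return the k docs most similar to the query; a real system would
    prepend these to the chat context as grounding material."""
    ranked = sorted(docs, key=lambda d: cosine(vectorize(query), vectorize(d)), reverse=True)
    return ranked[:k]

docs = [
    "AKS Windows Server node pools require a supported VM size.",
    "Blob storage lifecycle rules move objects to the cool tier.",
]
assert top_docs("windows server node pools on aks", docs) == [docs[0]]
```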

    • elcritch 4 days ago

      I just want a service that does this. Pulls in the latest docs into a vector db with a chat or front-end. Not the windows containers bit.

  • dcminter 4 days ago

    This could not possibly go wrong...

    You're braver than me if you're willing to trust the LLM here - fine if you're ready to properly review all the relevant docs once you have code in hand, but there are some very expensive risks otherwise.

    • awongh 4 days ago

      This is LLM-as-semantic-search, so it's way, way easier to start from the basic example code and google to confirm that it's correct than it is to read the docs from scratch and piece together the basic example yourself. Especially for things like configuration and permissions.

      • dcminter 4 days ago

        Sure, if you do that second part of verifying it. If you just get the LLM to spit it out then yolo it into production it is going to make you sad at some point.

    • simianwords 4 days ago

      There’s nothing brave in this. It generally works the way it should and even if it doesn’t - you just go back to see what went wrong.

      I take code from stack overflow all the time and there’s like a 90% chance it can work. What’s the difference here?

      • jcattle 4 days ago

        However, on AWS the difference between "generally working the way it should" and not can be a $30,000 cloud bill racked up in a few hours, with EC2 instances going full speed ahead mining bitcoin.

        • simianwords 4 days ago

          For those high stakes cases maybe you can be more careful. You can still use an LLM to search and get references to the appropriate place and do your own verification.

          But for low-stakes work an LLM does just fine - not everything is going to blow up into a $30,000 bill.

          In fact I'll take the complete opposite stance - verifying your design with an LLM will help you _save_ money more often than not. It knows things you don't and has awareness of concepts that you might have not even read about.

      • dcminter 4 days ago

        Well, the "accidentally making the S3 bucket public" scenario would be a good one. If you review carefully with full understanding of what e.g. all your policies are doing then great, no problem.

        If you don't do that will you necessarily notice that you accidentally leaked customer data to the world?

        The problem isn't the LLM, it's assuming its output is correct - just the same as assuming Stack Overflow answers are correct without verifying/understanding them.

reactordev 5 days ago

They'll teach you how for $250 and a certification test...

SOLAR_FIELDS 5 days ago

I honestly don't mind that you have to jump through hurdles to make your bucket publicly available, and that it's annoying. That to me seems like a feature, not a bug.

  • dghlsakjg 5 days ago

    I think the OP's objection is not that hurdles exist, but that they get moved every time you try to run the track.

  • simonw 5 days ago

    Sure... but last time I needed to jump through those hurdles I lost nearly an hour to them!

    I'm still not sure I know how to do it if I need to again.