Comment by neofrommatrix 4 days ago

It's not worth working there as an L5/L6 level engineer. The money is absolutely not worth it, unless your team is working on an absolutely new product. The only engineers, IMO, that like it there are those adept at finding new bootstrapped teams and designing and writing the product from scratch and releasing the MVP. They then hand over the crappy MVP to other engineers to support and move on to other new products. On-call is absolutely brutal because of exactly that.

hughesjj 4 days ago

Worked there for 7 years (left in 2021) and this is an accurate summary of my experience there.

Adding on thoughts:

One of my biggest gripes was that "make a good marketing opportunity at Re:Invent" seemed to become more important than "release beloved software that makes the lives of our customers easier" by the time I left (not that I was working on anything for reinvent in my final years there).

I will add that I learned a TON from AWS, and got to practice much of it too. It's the best boot camp one could ask for regarding general skill development imo (not particular frameworks etc but like, the theory and practice). There's also some things I miss like the weekly ops review and the general engineering culture, especially when it came to explicitly listing service limits, API specs, and cost up front in your design. Oh, and I honestly miss the docs culture. Quip wasn't as good as Google docs but the actual docs themselves and process of authoring them were SUPER valuable.

Coding wise, CDK was so much better than Terraform (once we moved to CDK from lpt+cfn, which was way worse imo). Smithy and OpenAPI are neato too (re Smithy: externally everyone uses Thrift, it seems, but the overlap of functionality/use cases isn't identical).

Probably the biggest thing I miss was bones (kind of successor to octane), which is kind of like yeoman or create react app but would include so so much of the excellent internal tooling of ci/CD approval actions. I don't know of a real external equivalent, but would love to have one. If you ever see a Breland Miley or Ian Mosher apply to your company, HIRE THEM IMMEDIATELY. (There was another really solid guy on that team but their name escapes me at the moment, and here's hoping I got the spelling right)

Oh, also isengard is still easier to use than okta or AWS organizations to manage accounts imo.

  • hughesjj 4 days ago

    Commenting to myself:

    This looks interesting and relevant:

    - https://github.com/projen/projen

    - https://aws.amazon.com/blogs/devops/getting-started-with-pro...

    - https://projen.io/

    Looks Amazon official. Okay, I'm hype, this will be fun to play with.

    • getpokedagain 4 days ago

      We've used projen where I work for the past year or so for new projects. It’s pretty good and the devs are pretty active in terms of responding to bugs and not being shit at documentation.

  • bobnamob 3 days ago

    Ian finished last week :(

    Pipelines, BT and Isengard are absolutely what I'll miss the most as well (I handed in my resignation notice last week, prior to all this RTO2.0 kerfuffle)

  • darby_nine 4 days ago

    > One of my biggest gripes was that "make a good marketing opportunity at Re:Invent" seemed to become more important than "release beloved software that makes the lives of our customers easier" by the time I left (not that I was working on anything for reinvent in my final years there).

    Was this something you knew was coming or did this behavior surprise you? I realize enshittification really ramped up over the 2010s but I have a hard time remembering when I last expected a company to aim for customer satisfaction over squeezing more revenue. Maybe tiktok? (Which has since enshittified in many ways.)

    The rest hurts a lot, though. It's not fun to watch the culture of a company you once had pride in sour and rot.

    • hughesjj 3 days ago

      I might have just drunk too much of the Kool-Aid and believed in the mythos of lowflyinghawk + customer obsession.

      What I meant by this is, in my personal opinion, there were a bunch of half baked products they should have just not mentioned at reinvent because said products never really materialized or had significant usage concerns for a long, long time after the announcement.

      The pressure to announce more and more at reinvent while the quality of what was being announced dropped was the specific feeling I'm talking about.

      Sorry kind of on a caffeine high and brain isn't working too well right now. I'm also reluctant to throw shade on the products/teams I'm thinking of because I didn't work on them and I don't want to give them any heat, but I'll say it was in the 2017-2019 era I felt it start to change.

      I think it contrasts with the really cool launches like Lambda, Aurora, API Gateway, Sagemaker, etc that had just come out before then.

      • darby_nine 3 days ago

        > The pressure to announce more and more at reinvent while the quality of what was being announced dropped was the specific feeling I'm talking about.

        Yea, I can certainly see this making one feel claustrophobic.

  • trallnag 4 days ago

    When you talk about docs at AWS, do you mean internal documentation or the public one?

    • strivingtobe 4 days ago

      Neither, they're talking about the culture of writing documents as a form of sharing ideas. Where other companies might use powerpoint presentations or unstructured meetings to brainstorm on ideas, Amazon instead encourages people to write a document summarizing their thoughts, and then there is a meeting where people silently read and comment on the document, and then afterwards discuss it.

      • JonChesterfield 4 days ago

        That's an extremely sensible idea in multiple dimensions. It prioritises clarity of thought over rambling discussion in conference calls. I wonder if there's a feasible path to gradually steer an existing organisational structure in that direction.

      • hughesjj 4 days ago

        ^ exactly, thanks for answering perfectly.

        It's basically panel 2 from this:

        https://xkcd.com/568/

        Beyond the initial publication of the doc, the peer review process is much more sane than trying to review a bunch of power point slides. Similarly, it's much much easier to refer to a well written document when it comes time to implement or reevaluate an idea than going over some power point slides and maybe an associated recording, to say nothing about searchability, discoverability, and maintainability of an actual written document vs PowerPoint slides.

        Also, idle side speculation: I wonder how much (if any) one of the underappreciated early employees @ Amazon had a hand in proselytizing this, given she (MacKenzie) is an author.

      • nullorempty 3 days ago

        In my experience these documents were actually never good. I’ve never seen anyone ask for estimates either. Surely it’s better than some other companies but if it were good they wouldn’t need this absolutely horrific oncall

      • dbtablesorrows 3 days ago

        I believe most tech focused companies do it and it's called design docs / RFCs.

jp57 4 days ago

My experience there (15 years ago) was that on-call was terrible because line management was unable or unwilling to invest in fixing root causes of operational issues.

When I started I lucked into a situation where I was one engineer on a "team" of two. We didn't have a manager and were reporting to the director of our department. He only had about an hour a week to meet with us. We spent a lot of time fixing broken stuff that we'd inherited (a task that I actually found kind of fun), and soon our ops load started going down. We eventually got another engineer and a manager who was willing to prioritize fixing the root causes of our on-call tickets.

During black-friday-week of my second year there we had essentially no operational issues and spent our time brainstorming future work while we kept an eye on our performance dashboards. We got semi-scolded by a senior engineer from a neighboring team because we didn't "seem very busy". Our manager called that a win.

Even back then Amazon had the reputation for being a brutal place to work and for burning out engineers, but I rather liked it. I ultimately left because my wife hated living in Seattle.

  • hypeatei 4 days ago

    > We got semi-scolded by a senior engineer from a neighboring team because we didn't "seem very busy"

    What the hell? Hope you told him off, not his job or his business. Weird.

    • jp57 4 days ago

      Well, he rolled up to the same director and was the most senior engineer under that director, so it was a little bit his business.

      And, like I said it was only semi scolded: he came to me and quietly said something like, "you guys don't seem very busy", which I took to mean "why are you loudly brainstorming future work when the guys in the next row of cubes over haven't slept in 36 hours?"

      My answer was, "All our stuff is working."

      • ethbr1 4 days ago

        I heard it expressed years ago, when the role was still called sysops, that it was the one job where the better you were, the less work you did.

        It was attached to a similar anecdote about someone being yelled at for crafting a well-oiled system.

    • goostavos 4 days ago

      Stuff like that can almost always be traced back to that senior being told to "be visible." Show up! Have opinions on things (loudly)! "Scale yourself!" Other mumbo jumbo. It often leads to these weird misguided drive-bys where everyone is left confused.

      • kevinventullo 3 days ago

        Yet here we are talking about the person… they sound pretty influential to me. (I’m only half-joking)

  • Twirrim 4 days ago

    > line management was unable or unwilling to invest in fixing root causes of operational issues.

    Sorry for an obligatory: there is no such thing as a root cause.

    That said, that matches my general experience too (I left about 9 years ago). Unless the S-team specifically calls them out for any particular metric, it's not going to get touched.

    Even then they'll try and game the metric. Sev2 rate is too high, let's find some alarms that are behind lots of false positives, and just make them sev3 instead, rather than investigate why. No way it can backfire... wait what do you mean I had an outage and didn't know, because the alarm used to fire legitimately too?
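
    A minimal sketch of how that backfires, in Python. Everything here is a hypothetical illustration: the alarm names, the event counts, and the assumption that only sev1/sev2 page a human while sev3 just cuts a ticket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmEvent:
    alarm: str          # hypothetical alarm name
    real_outage: bool   # True if this firing reflected a genuine problem


# Assumption: only sev1 and sev2 page the on-call; sev3 is ticket-only.
PAGING_SEVERITIES = {1, 2}


def summarize(events, severity_by_alarm):
    """Return (pages_sent, real_outages_nobody_was_paged_for)."""
    pages = 0
    missed = 0
    for event in events:
        if severity_by_alarm[event.alarm] in PAGING_SEVERITIES:
            pages += 1
        elif event.real_outage:
            missed += 1
    return pages, missed


# Nine false positives and one legitimate firing from the same noisy alarm,
# plus one legitimate firing from a well-behaved alarm.
events = (
    [AlarmEvent("disk_latency", False)] * 9
    + [AlarmEvent("disk_latency", True), AlarmEvent("api_5xx", True)]
)

before = summarize(events, {"disk_latency": 2, "api_5xx": 2})
after = summarize(events, {"disk_latency": 3, "api_5xx": 2})

print("before downgrade (pages, missed outages):", before)
print("after downgrade  (pages, missed outages):", after)
```

    The page count drops sharply after the downgrade, so the sev2 metric looks better, but the one legitimate firing of the noisy alarm no longer reaches anyone.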

    That major S3 collapse several years ago was caused by a component that engineers had warned leadership about for at least 4-5 years when I was there. They'd carefully gathered data, written reports, written up remediation plans that weren't particularly painful. Engineers knew it was in an increasingly fragile state. It took the outage for leadership to recognise that maybe, just maybe, it was time to follow the plan laid out by engineering. I can't talk about the actual what/why of that component, but if I did it'd have you face palming, because it was painfully obvious before the incident that an incident was inevitable.

    Unfortunately, it seems like an unwillingness to invest in operations just pervades the tech industry. So many folks I speak to across a wide variety of tech companies are constantly having to fight to get operations considered any kind of a priority. No one gets promoted for performing miracles keeping stuff running.

    • akulbe 4 days ago

      I'm curious why you say "there is no such thing as a root cause". Is this because that's what you genuinely believe, or was this just Amazon culture?

      • donavanm 4 days ago

        In addition to other comments see https://en.m.wikipedia.org/wiki/Ishikawa_diagram or the “new school” of safety engineering that's trying to get away from ideas like “RCA.” Sidney Dekker publishes an older version of his seminal work https://www.humanfactors.lth.se/fileadmin/lusa/Sidney_Dekker...

        I'm ex AWS and the internal tools _try_ to get away from the myth of human error and root cause. But it's difficult when humans read and understand in a linear post hoc fashion.

      • lazyant 4 days ago

        https://en.wikipedia.org/wiki/Swiss_cheese_model (kind of), a pretty standard way nowadays in SRE to talk about root causes (plural) because usually it's more than one specific thing.
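
        A rough numerical sketch of the idea, in Python. It assumes each defensive layer fails independently with a known probability, which is a simplification for illustration and not a claim of the model itself; the layer names and numbers are made up.

```python
import math


def incident_probability(layer_failure_probs):
    """Probability that the holes in every layer line up at once.

    Assumes layers fail independently (a simplifying assumption).
    """
    return math.prod(layer_failure_probs)


# Hypothetical layers: code review, automated tests, canary deploy, alarms.
layers = {"review": 0.2, "tests": 0.1, "canary": 0.1, "alarms": 0.05}

baseline = incident_probability(layers.values())
print(f"baseline: {baseline:.6f}")

# Improving any single layer lowers the overall risk, so there are several
# places to intervene rather than one "root" cause to fix.
for name in layers:
    improved = dict(layers)
    improved[name] = layers[name] / 2
    print(f"halve {name}: {incident_probability(improved.values()):.6f}")
```

        Under these assumptions, halving the failure rate of any one layer halves the chance of an incident, which is one way to see why people talk about contributing factors in the plural.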

        • Twirrim 3 days ago

          There's some serious academic critique of the Swiss Cheese model, too.

          That said, I rather like it. It's straightforward to explain to people, they very quickly "get" it, and it gets them thinking along the right lines.

      • Twirrim 3 days ago

        Amazon culture was still really rather root cause oriented when I left. The COE process they followed to do post-incident analysis was flawed and based on discredited approaches like the "5 whys". I don't know if it has changed since I left, I rather hope it has.

        I genuinely believe there is no such thing as a root cause. The reason I believe that is grounded both in personal experience and in the 40+ years of academic research that demonstrates in far greater detail that no failure comes about as a result of a single thing. Failure is always complex, and approaches to incident analysis and remediation that are grounded in a single root cause are largely ineffectual. You have to consider all contributing factors if you want to make actual progress. The tech industry tends to trail really far behind on this subject.

        A couple of quick up-front reading suggestions, if I may:

        * https://how.complexsystems.fail/ - An excellent, and short, read. It's not really written from the perspective of tech, but every single point is easily applicable to technological systems. I encourage engineers to think about the systems that they're responsible for at work, their code, incidents they've been involved in, as they read it. Everything we deal with is fundamentally complex. Even a basic application that runs on a single server is complex, because of all the layers of the operating system below it, let alone the wider environment of the network and beyond.

        * If you're willing to go deeper: "Behind Human Error" ISBN-13: 9780754678342. It's a very approachable book, written by Dr Woods and several other prominent names on the academic research side into failures.

        My favourite example to use when talking about why there's no such thing as a root cause has been the 737-MAX situation. Which has only become more apt over time, as we've learned just how bad that plane was.

        With the original 737-MAX situation, we had two planes crash in the space of 5 months.

        If we follow a root cause analysis approach, you'll end up looking at the fact that the planes were reliant on a single sensor to tell them their angle of attack (AoA). So replace the single sensor with multiple, something that was already possible to do, and the planes stop crashing. Job done. Resilience achieved!

        That doesn't really address a lot of important things, like, how did we end up in a situation where the plane was ever sold with a single AoA sensor (especially one with a track record of inaccuracy)?

        Why did the Maneuvering Characteristics Augmentation System (MCAS) keep overriding pilot input on the controls, which it didn't do on previous implementations of MCAS systems? Why was the existence of the MCAS system not in the manuals? Why could MCAS repeatedly activate?

        If the MCAS hadn't behaved in such a bizarre fashion, where it kept overriding input, it's arguable that the pilots would have been able to correct its erroneous dropping of the nose. If you're sticking with a "root cause" model, there's a pretty strong argument that the MCAS system was the actual root cause of the incident.

        Given the fairly fundamental changes to the controls, and the introduction of the MCAS, why were pilots not required to be trained on the new model of plane? Are Boeing's attempts to avoid pilots needing retraining the actual root cause?

        Why was an MCAS system necessary in the first place? It's because of the engine changes they'd made, and how they'd had to position them in relation to the wings, which tended to result in the plane's nose wanting to go up and induce a stall. The MCAS system was introduced to help with counteracting that new behaviour. Was the new engine choice the root cause of failure?

        and so on and so forth..

        If you look at the NTSB accident report for the 737-MAX flights, it goes way deeper and broader into the incident. It takes a holistic view of the situation, because even the answers to those questions I pose above are multi-faceted. As a result it had a long list of recommendations for Boeing, the FAA, and all involved parties.

        Those 2 crashes were the emergent property of a complex system. It's a symptom. A side effect. It required failures in the way things were communicated within Boeing, failures in FAA certification, failures in engineering, failures in management decision making, the works.

        No one at Boeing deliberately set out to make a plane that crashes. A lot of the decisions about how they do things are reasonable, and make some sense, in isolation. But they all contributed together to make for a system to which it became a question of "when" and not "if" a major disaster would occur.

        If resilience and incident analysis just focus on a singular root cause, the systems that produced the bad outcome will continue, and will inevitably make more bad decisions. The only thing that would improve is that they would probably never make a decision to sell a plane with a single AoA sensor.

        As we've seen over time, the whole broken system has resulted in a lot of issues with the MAX, beyond that sensor situation.

    • 33MHz-i486 3 days ago

      the really dumb thing about working at AWS is they pay so much lip service to Ops, literally you can spend a third of a week in meetings talking about Ops Issues, but not a single long term project to improve the deeper architectural problems that cause bad Ops ever gets funded.

    • jp57 4 days ago

      > Sorry for an obligatory: there is no such thing as a root cause.

      While I get what you mean, I think most people who've been in the situation know what I'm talking about. The same alarms are going off constantly and you keep doing the expedient thing to make them stop going off without investing any effort into stopping them from going off again in the same situation in the future.

      Of course there is a chain of causes, and maybe you need to refactor a module, or maybe you need to redesign an interface, or maybe you need to throw the whole thing away and start over -- we did all those things in different situations while I was there -- but there's a point at which looking at deeper causes loses value because those causes are not in our power to fix and we're left to defend against those failures: a system we rely on is unreliable; machines and networks go down unexpectedly; a lot of people have poor reading comprehension so even good docs are sometimes useless; we are all sinners whose pride and sloth sometimes lead us to make crappy software and systems; etc.

wubrr 4 days ago

> They then hand over the crappy MVP to other engineers to support and move on to other new products. On-call is absolutely brutal because of exactly that.

So fucking true. They also treat their employees like shit generally, and prefer to hire externally for higher level positions - causing existing engineers who are closely familiar with the systems to quit and replacing them with higher-paid new hires, who have no context or familiarity with the service/product in question. I worked there for a few years on some fairly important, foundational services, and it was incredible that they had almost no-one around who initially built these services... 50% of the job was oncall, 40% was reading and trying to understand huge amounts of undocumented code that no one was familiar with... I felt like I was back to working on legacy banking systems.

zzzbra 4 days ago

This sounds exactly like the team/culture that launched Marcus at Goldman Sachs. A lot of people went to Amazon from that team and seemed to indicate it was very much the same type of deal.

karmasimida 4 days ago

Right, this is accurate.

You can't have a mentality of working on something forever in AWS, unless it is S3/RDS/EC2, those forever systems. People are fighting to create new codenames for new products, PRFAQ all the time, etc.

Does this approach work? Maybe, but definitely at a cost. It creates many half-assed products that are one acknowledgement away from having their life support turned off. And many grifters and land grabbing attempts to create some glue services just to piggyback on the hot new trends. Yes, I am talking about the AI stuff. It is embarrassing how little Amazon has to show for it, while spending billions, all because the infighting and internal sabotaging kills its chance before it can see the light of day. Epic level failure if you ask me.

MuffinFlavored 4 days ago

I've never worked at a company where total compensation for engineers was more than $250k

To see "it's not worth it to make $420k as an L6 Amazon engineer" is super interesting

https://www.levels.fyi/companies/amazon/salaries/software-en...

  • NBJack 4 days ago

    It's the toil. The soul-crushing expectations. The "I'm surrounded by people and yet I've never felt so alone" kinda experience, where your co-worker may be nice, but there's not enough level appropriate work to go around.

    Then you learn how long it takes for that "420k" comp to manifest (typically about 2-3 years from hire if all goes well, longer if the market is down). At least your annual increase in time off is looking good by then!

    Well, assuming you make it that far. Whoops, did you forget to document how awesome you are and ensure your manager sees it too? Or just make a 'blameless' mistake during an oncall rotation that made everything in the UK available at a steep discount? Sorry, ______, guess it's PIP time. We hope you succeed! Just don't look too hard at the success rate.

    And then, your average successful tenure of 3-5 years is up, and you get to look back at the intense stress, distrust of your boss/coworkers, impact to your relationships, and the toll on your family. Suddenly, the offers pouring in are looking better and better, even if the comp isn't as great.

    FWIW, the first 3-6 months tend to be great though!

  • ipaddr 4 days ago

    The base is: $284.1K. If you can make it 4 years where the average employment length is a year you can make that $420k. But it will require 16 hour days, luck and some high degree political skills.

    It's like big brother where someone on your team will be pipped each quarter and you need to make sure it's not you. When a teammate asks for help find creative ways to make them look bad.

    • ctvo 3 days ago

      > The base is: $284.1K. If you can make it 4 years where the average employment length is a year you can make that $420k. But it will require 16 hour days, luck and some high degree political skills.

      This is untrue for Amazon, at least in the US. Your total compensation is guaranteed for the first 4 years. If your total compensation in your offer was 500k, you'll get ~500k per year.

      The first two years you're paid in cash so it'll match the offer exactly, the final two years you're paid in RSUs (stock based) with the stock price calculated at the time of the offer. Your salary may vary due to stock price. Historically it's gone up, not down, and this is how people have made 600k+, for example, as a senior engineer there.

      Starting from year 2-3, you'll receive new stock grants that will vest in the years after 4 if you stay that long.

      Source: I've received offer letters.
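
      A back-of-the-envelope sketch of that structure, in Python. The 500k target comes from the example above; the 200k base salary and the share-price ratios are hypothetical, and the sketch ignores refresher grants, taxes, and raises.

```python
def yearly_comp(target_total, base_salary, price_ratio_by_year):
    """Realized compensation for each of the first four years.

    price_ratio_by_year maps a year (3 or 4) to vest price / offer price;
    a missing year is treated as an unchanged share price.
    """
    realized = []
    for year in (1, 2, 3, 4):
        if year <= 2:
            # Years 1-2: paid in cash, so it matches the offer exactly.
            realized.append(float(target_total))
        else:
            # Years 3-4: the non-salary part is stock sized at the offer
            # price, so its value moves with the share price.
            stock_at_offer = target_total - base_salary
            ratio = price_ratio_by_year.get(year, 1.0)
            realized.append(base_salary + stock_at_offer * ratio)
    return realized


flat = yearly_comp(500_000, 200_000, {})
up = yearly_comp(500_000, 200_000, {3: 1.5, 4: 1.5})
down = yearly_comp(500_000, 200_000, {3: 0.5, 4: 0.5})

print("flat price:", flat)
print("price +50%:", up)
print("price -50%:", down)
```

      With a flat share price every year comes out at the offer number; with the price up 50% the stock-heavy years land well above it, and with the price down they land below it.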

      • pfannkuchen 3 days ago

        Presumably your comp changes if you get promoted in that time though?

    • runamuck 4 days ago

      "When a teammate asks for help find creative ways to make them look bad." Thanks for this gem, this captures the culture in a nutshell.

      • scarface_74 3 days ago

        When I was there, they weren’t creative. They outright said that doing certain things didn’t help them get promoted.

    • dheera 4 days ago

      > When a teammate asks for help find creative ways to make them look bad.

      Yep, lots of idea-snatching, not crediting each other, teaming up to exclude a particular team member to try to ensure that team member is the one that gets the PIP

    • neofrommatrix 3 days ago

      Regarding your last point, I’ve witnessed two different engineers cry on two different calls because they weren’t getting any support from senior engineers and were at risk of being pipped. It’s that ridiculous. And these were not incompetent engineers. One went to Meta and another went to Cloudflare after Amazon.

  • sakopov 4 days ago

    It's not interesting, it's preposterous to think that you can make that money and just kick your feet up every day and twiddle your thumbs. Yeah, there is going to be fucking stress. That's why the pay is so high. My non-FAANG job is 100% constant stress day-in and day-out and I don't make even half of these comps.

    • pb7 3 days ago

      Well then you also have a crappy job. Good money doesn't need to equal high stress. You can do a good job, create positive value, and live a low stress life. Step 1: Don't work at Amazon.

    • underlipton 4 days ago

      People burn out and have nervous breakdowns dealing with the stress of working retail (violent customers, sick customers, terrible hours, short breaks, coffin-like break rooms, benefits like "50 cents off a bag of pistachios") for $15 an hour. I wager most would go back in a heartbeat for $400k/yr.

      Perspective.

      • sirsinsalot 3 days ago

        Relative hardship comparisons generally suck.

        "hah you think retail is hard, imagine working for a drug gang in Africa! You'd go back to retail in a snap!"

        Stress is stress. It makes you unwell. The remuneration isn't the point.

      • hmmm-i-wonder 3 days ago

        Shit to Salary ratio.

        It's almost universal that we will deal with more crap for more money. Some people won't, but most will.

        There is a line for everyone though where the shit is too much for any salary.

  • hughesjj 4 days ago

    From the same website, other FAANG offers more. For quite some time while I was there, my peers with the same industry experience were earning 50-100% more than myself at Google and Meta.

    Also keep in mind Amazon is headquartered in Seattle, which is far from a cheap area, and of the 5 submitters to levels.fyi for sde3 Seattle new hires in the last 6 months, the range is 250k-425k.

    Take into account that an L6 who started from L4 normally has the scope and competence of a Staff engineer at Google, it makes sense to me.

    If all you wanted was money, you could do even better by going into finance or OpenAI and work your life away until you can't anymore. It's just not sustainable for most people long term, no matter what the pay is, which itself is less than many contemporaries in the same "class".

  • __turbobrew__ 3 days ago

    After you get woken up by pages enough times you really start to question the monetary value of sleep. You will also miss life events such as birthdays, helping friends move, your child’s sporting events, etc.

    Being on-call at these companies is equivalent to making work your first priority in life every few weeks. That is a big sacrifice.

HeyLaughingBoy 4 days ago

NGL, that sounds like my ideal work environment. Except for the in-office part.

leetcrew 4 days ago

eh, big company, many different opinions. working on stuff that's already built can be pretty chill. you spend a lot of time being hard blocked on approvals from external teams. no amount of extra hours can change that, and management generally understands. the downside is that promos are harder to find and there's a greater risk of some VP figuring out that your org doesn't really do anything useful. then it's time for the next round of musical chairs.

if you're more ambitious and/or genuinely enjoy building things, new product teams are the place to be. you don't have to deal with approval hell so much, but the dates are more aggressive and managers will do anything to hit them. this is where you learn what "building the plane while flying it" means.

I find people exaggerate how bad it is, but you definitely need to be good at reading the room to stick around.

  • neofrommatrix 3 days ago

    It’s no exaggeration when you witness two new engineers cry on two different calls because they are not getting any support from the more senior engineers and are at risk of being pipped.

amw-zero 4 days ago

Levels.fyi puts L6 at Amazon at over $400k. That’s not worth it?

  • fshbbdssbbgdd 3 days ago

    From what I could tell, the equivalent level (in terms of scope/responsibilities) at the other FANG companies pays more. So even if you are just after the money, it doesn’t seem worth it if the others are also willing to hire you at the comparable level. Of course, there are exceptions - like there’s some managers I would follow anywhere, or some projects are just that exciting.

  • notinmykernel 4 days ago

    No

    • amw-zero 4 days ago

      And why is that?

      • TheKarateKid 3 days ago

        For software engineering, I have heard stories from all around about the PIP culture due to an 8-12% mandatory URA quota. This results in gaslighting and forced firings of engineers that have a history of performing well.

        I personally know an SDM looking to leave now because he said he was basically told to fire one of his reports who was performing fine, or he would be the one fired. I've read stories of "hire to fire" where they hire ICs with the intention of letting them go in 6-12 months so another member of their team doesn't have to.

        Culture is self-serving, hunger games style. Management is cut-throat and not empathetic at all.

        The NYTimes did a big expose about the toxic culture of working at Amazon, and Bezos' public response was literally along the lines of "we move fast and know that our competitive culture is not for everyone. Our employees are up for the challenge and we are proud of what we accomplish."

        In fact, during the hiring boom in 2020-2022 their reputation among SWE ICs was so bad that they were having trouble hiring. Many (myself included) would receive 1-2 emails from different recruiters desperately trying to get us to apply. We all knew to stay away.

        The sad part is that many of the toxic managers and even ICs have infiltrated much more of FAANG (and big tech) and with the economic recession especially in tech, Amazon's cut-throat management styles like "performance culture" and PIP quotas have been happening in many other places now.

      • notinmykernel 4 days ago

        For many, many people, (majority of programmers, really) morals don't have a price tag. That number could be anything. When you make a programmer choose between some number and customer damage, it becomes increasingly harder to damage the end-customer. This is known as "prioritizing customer obsession" over business function (which is what we're supposed to do).

        Please, everyone, if there has to be a choice, save us (civilization) not them (the CEOs). Please. Think critically.