Comment by ZeroCool2u

Comment by ZeroCool2u 16 hours ago

22 replies

I've had to repeatedly tell our AWS account reps that we're not even a little interested in the Trainium or Inferentia instances unless they have a provably reliable track record of working with the standard libraries we have to use like Transformers and PyTorch.

I know they claim they work, but that's only on their happy path with their very specific AMIs and the nightmare that is the Neuron SDK. You try to do any real work with them and use your own dependencies and things tend to fall apart immediately.

It was just in the past couple of years that it really became worthwhile to use TPUs if you're on GCP, and that's only with the huge investment on Google's part into software support. I'm not going to sink hours and hours into beta testing AWS's software just to use their chips.

ecshafer 16 hours ago

IMO AWS, once you get off the core services, is full of beta services. S3, Dynamo, Lambda, ECS, etc. are all solid. But a lot of their other services have some big rough patches.

  • jeffparsons 13 hours ago

    RDS, Route53, and Elasticache are decent, too. But yes, I've also been bitten badly in the distant past by attempting to rely on their higher-level services. I guess some things don't change.

    I wonder if the difference is stuff they dogfood versus stuff they don't?

    • phantasmish 10 hours ago

      I once used one of their services (I forget which, but I think it was their serverless product) that “supported” Java.

      … but the official command line tools had show-stopper bugs if you were deploying Java to this service, that’d been known for months, and some features couldn’t be used in Java, and the docs were only like 20% complete.

      But this work-in-progress alpha (not even beta quality because it couldn’t plausibly be considered feature complete) counted as “supported” alongside other languages that were actually supported.

      (This was a few years ago and this particular thing might be a lot better now, but it shows how little you can trust their marketing pages and GUI AWS dashboards)

      • nunez 7 hours ago

        I'm assuming you're talking about Lambda. I don't mess with their default images. Write a Dockerfile and use containerized Lambdas. Saves so many headaches. Still have to deal with RIE though, which is annoying.
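
        A minimal sketch of the handler side of a containerized Lambda, assuming a Python function. The file name `app.py` and the `name` event field are hypothetical, chosen only for illustration; in a container image the Dockerfile's CMD would typically point at this function as `app.handler`.

```python
import json


def handler(event, context):
    """Entry point that Lambda invokes once per event."""
    # "name" is a hypothetical event field used only for illustration.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }


if __name__ == "__main__":
    # Calling the handler directly gives a quick local check of the logic
    # without going through the Runtime Interface Emulator.
    print(handler({"name": "local"}, None))
```

        Keeping the handler a plain function like this means the logic can be exercised outside the container, though it does not replace testing the built image itself.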

    • ozten 13 hours ago

      A big problem is when three AWS teams launch the same thing. Lowers confidence in dogfooding the “right” one.

      • smallmancontrov 9 hours ago

        Or when your AWS account rep is schmoozing your boss trying to persuade them to use something that is officially deprecated, lol.

    • nunez 7 hours ago

      My understanding is that AWS productizes lots of one-offs for customers (like Snowball), so that makes sense

    • raw_anon_1111 11 hours ago

      Amazon Connect is a solid higher level offering. But only because it is a productized version of Amazon Retail’s call center

  • kentm 15 hours ago

    I'd add SQS to the solid category.

    But yes, the less of a core building block a specific service is (or the less widely it's used internally at Amazon), the more likely you are to run into significant issues.

  • weird-eye-issue 8 hours ago

    True with Cloudflare too. Just stick with Workers, R2, Durable Objects, etc...

    • plantain 7 hours ago

      Not even sure about R2 with its unpredictable latencies.

      • weird-eye-issue 6 hours ago

        Hmm, is it actually that bad? Keep in mind R2 is only stored in one region, which is chosen when the bucket is first created, so that might be what you're seeing.

        But I've never really looked too closely because I just use it for non-latency critical blob storage

  • belter 13 hours ago

    >But there are a lot of services they have that have some big rough patches.

    Enlighten us...

mountainriver 12 hours ago

Agree, Google put a ton of work into making TPUs usable with the ecosystem. Given Amazon’s track record I can’t imagine they would ever do that.

  • klysm 12 hours ago

    There might be enough market pressure right now to make them think about it, but the stock price went up enough from just announcing it so whatever

htrp 13 hours ago

spoiler alert, they don't work without a lot of custom code