Comment by jp57 5 days ago

19 replies

> Glacier restores are also no longer painfully slow.

I had a theory (based on no evidence I'm aware of except knowing how Amazon operates) that the original Glacier service operated out of an Amazon fulfillment center somewhere. When you put in a request for your data, a picker would go to a shelf, pick up some removable media, take it back, and slot it into a drive in a rack.

This, BTW, is how tape backups on timesharing machines used to work once upon a time. You'd put in a request for a tape and the operator in the machine room would have to go get it from a shelf and mount it on the tape drive.

danudey 5 days ago

The most likely explanation is that they used a tape robot, such as the one seen here:

https://www.reddit.com/r/DataHoarder/comments/12um0ga/the_ro...

Which is basically exactly what you described but the picker is a robot.

Data requests go into a queue; when your request comes up, the robot looks up the data you requested, finds the tape and the offset, fetches the tape and inserts it into the drive, fast-forwards it to the offset, reads the file to temporary storage, rewinds the tape, ejects it, and puts it back. The latency of offline storage is in fetching/replacing the cassette and in forwarding/rewinding the tape, plus waiting for an available drive.

Realistically, the systems probably fetch the next request from the queue, look up the tape it's on, and then process every request from that tape so they're not swapping the same tape in and out twenty times for twenty requests.
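To make that batching idea concrete, here's a minimal sketch of how a scheduler might group a request queue into one mount per tape. This is purely illustrative and not how Glacier or any real tape library is known to work; the `Request` fields, the tape labels, and `plan_mounts` are all made-up names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    request_id: str
    tape_id: str  # hypothetical tape label
    offset: int   # hypothetical position of the file on the tape


def plan_mounts(queue):
    """Group queued requests into a single mount per tape.

    Tapes are visited in the order of their oldest pending request,
    and the reads on each tape are sorted by offset so the drive can
    wind forward once instead of seeking back and forth.
    """
    by_tape = {}  # dicts preserve insertion order (Python 3.7+)
    for req in queue:
        by_tape.setdefault(req.tape_id, []).append(req)
    return [
        (tape_id, sorted(reqs, key=lambda r: r.offset))
        for tape_id, reqs in by_tape.items()
    ]


if __name__ == "__main__":
    queue = [
        Request("a", "T1", 500),
        Request("b", "T2", 10),
        Request("c", "T1", 100),
    ]
    for tape_id, reqs in plan_mounts(queue):
        print(tape_id, [r.request_id for r in reqs])
```

With three queued requests across two tapes, this plans two mounts instead of three. The trade-off is that a later request can jump ahead of an earlier one if it happens to share a tape with something near the front of the queue.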

  • philistine 5 days ago

    I've read very definitive discussions on here that Glacier never used tape. It has always been powered off hard disks.

    • UltraSane 5 days ago

      For truly write-once, read-never data, tape is the optimal storage method. It is exactly what the LTO standard was designed to do and it does it very well. You can be confident that you will be able to read every bit of data from a 30 year old tape, probably even 50 years old. It has the lowest bit error rate of any technology I am aware of. LTO-9 is better than 1 uncorrectable bit error in 10^20 user bits, which is 1 bit error in 12.5 exabytes. There is also the substantial advantage that tapes on a shelf are completely immune to ransomware. As a sysadmin I get that warm fuzzy feeling when critical data is backed up on a good LTO tape library.

      • dabiged 5 days ago

        As someone who does tape recovery on very very old tape I largely concur with this with a couple of caveats.

        1. Do not encrypt your tapes if you want the data back in 30/50 years. We have had so many companies lose encryption keys and turn their tapes into paperweights because the company they bought out 17 years ago had poor key management.

        2. The typical failure case on tape is physical damage not bit errors. This can be via blunt force trauma (i.e. dropping, or sometimes crushing) or via poor storage (i.e. mould/mildew).

        3. Not all tape formats are created equal. I have seen far higher failure rates on tape formats that are repeatedly accessed, updated, ejected, than your old style write once, read none pattern.

      • count 5 days ago

        Call it bad luck, but I’ve never had a fully successful restore. Drives eat tapes, drives are damaged and write bad data, robot arms die or malfunction. Tapes have NEVER worked for me. SANs and remote disk though, rock solid.

        That said, I don’t miss any of that stuff, gimme S3 any day :)

        • UltraSane 4 days ago

          You do realize that that isn't normal at all? LTO tape is still used by thousands of companies to back up many exabytes of data. I know it once saved Google from permanent loss of Gmail data from a bug. You should really get a refund for your tape drives.

      • meepmorp 4 days ago

        Aren't LTO formats only backward compatible with the immediate prior version?

    • danudey 3 days ago

      That's... interesting. I wonder what the wear-and-tear on an HDD is to spin it up/power it back down again.

Twirrim 5 days ago

I can't talk about it, but I've yet to see an accurate guess at how Glacier was originally designed. I think I'm in safe territory to say Glacier operated out of the same data centers as every other AWS service.

It's been a long time, and features launched since I left make clear some changes have happened, but I'll still tread a little carefully (though no one probably cares there anymore):

One of the most crucial things to do in all walks of engineering and product management is to learn how to manage the customer expectations. If you say customers can only upload 10 images, and then allow them to upload 12, they will come to expect that you will always let them upload 12. Sometimes it's really valuable to manage expectations so that you give yourself space for future changes that you may want to make. It's a lot easier to go from supporting 10 images to 20, than the reverse.

  • donavanm 4 days ago

    I'm like 90% sure I've seen folks (unofficially) disclose the original storage and API decisions over the years, in roughly accurate terms. Personally I think the multi-dimensional striping/erasure code ideas are way more interesting than the “it's just a tape library” speculation/arguments. That and the real lessons learned around product differentiation as supporting technologies converge.

  • kelnos 4 days ago

    > I can't talk about it, but I've yet to see an accurate guess at how Glacier was originally designed.

    It feels odd that this is some sort of secret. Why can't you talk about it?

    • Twirrim 4 days ago

      I signed NDAs. I wish Glacier was more open about their history, because it's honestly interesting, and they have a number of notable innovations in how they approach things.

      • Dylan16807 4 days ago

        Well assuming your NDA is a reasonable length I hope you talk about it later.

        (And if Amazon is making unreasonable length NDAs I hope they lose a lot of money over it.)

  • mh- 5 days ago

    ..oh. That's clever. Thanks for posting this.

jp57 4 days ago

I think folks have missed what I think would have been clever about the implementation I (apparently) dreamt up. It's not that "it's just a tape library", it's that it would have used the existing FC and picker infrastructure that Amazon had already built, with some racks containing drives for removable media. I was thinking that it would not have been some special facility purely for Glacier, but rather one or more regular FCs would just have had some shelves with Glacier media (not necessarily tapes).

Then the existing pickers would get special instructions on their handhelds: Go get item number NNNN from Row/shelf/bin X/Y/Z and take it to [machine-M] and slot it in, etc.

christina97 5 days ago

They would definitely be using robots given how uniform hard drives are. The only reason warehouses still have humans is heterogeneity (different sizes, different textures, different squishiness, etc).