Comment by shaggie76
Comment by shaggie76 6 days ago
We had a similar CDN problem with releasing major Warframe updates: our CDN partner would inadvertently DDoS our origin servers when we launched an update because thousands of cold edges would call home simultaneously when all players players relogged at the same time.
One CDN vendor didn't even offer a tiered distribution system so every edge called home at the same time, another vendor did have a tiered distribution system designed to avoid this problem but it was overwhelmed by the absurd number of files we'd serve multiplied by the large user count and so we'd still end up with too much traffic hitting the origin.
The interesting thing was that no vendor we evaluated offered a robust preheating solution if they offered one at all. One vendor even went so far as to say that they wouldn't allow it because it would let customers unfairly dominate the shared storage cache at the edge (which sort of felt like airlines overbooking seats on a flight to me).
These days we run an army of VMs that fetch all assets from every point of presence we can cover right before launching an update.
Another thing we've had to deal with mentioned in the article is overloading back-end nodes; our solution is somewhat ham-fisted but works quite well for us: we cap the connection counts to the back end and return 503s when we saturate. The trick, however, is getting your load-balancer to leave the client connection open when this happens -- by default multiple LBs we've used would slam the connection closed so that when you're serving up 50K 503s a second the firewall would buckle under the runaway connection pool lingering in TIME_WAIT. Good times.
Something I've been wondering for a while is if BitTorrent or other P2P protocols are ever a consideration for pushing game updates? Naively, it seems like an ideal fit since a large swarm of leechers quickly turns into a large swarm of (partial) seeders mostly chattering amongst themselves. I recall Facebook and Twitter used to internally torrent their updates in the 2010s and BT scales just fine to thousands of peers and tens of GB files at least, but I think I've only ever played one game whose updater was a torrent client so I'm guessing it's a nonstarter for one reason or another. Are game publishers just allergic to it due to the piracy association? Are end-user upload speeds too slow to meaningfully make a difference? Are swarms of ~100k just too large to manage?
Edit: Silly me for posting while sleep deprived. It's not the update itself that you're saying is causing thundering herd issues, but the log-ins all being synced up afterwards much like in TFA, duh. My curiosity wrt the apparent lack of P2P game updaters still stands though.