When imperfect systems are good: Bluesky's lossy timelines

785 points by cyndunlop 5 months ago

pornel 5 months ago

I wonder why timelines aren't implemented as a hybrid gather-scatter choosing strategy depending on account popularity (a combination of fan-out to followers and a lazy fetch of popular followed accounts when follower's timeline is served).

When you have a celebrity account, instead of fanning out every message to millions of followers' timelines, it would be cheaper to do nothing when the celebrity posts, and later when serving each follower's timeline, fetch the celebrity's posts and merge them into the timeline. When millions of followers do that, it will be cheap read-only fetch from a hot cache.

Reply View 41 replies

ericvolp12 5 months ago

This is probably what we'll end up with in the long-run. Things have been fast enough without it (aside from this issue) but there's a lot of low-hanging fruit for Timelines architecture updates. We're spread pretty thin from a engineering-hours standpoint atm so there's a lot of intense prioritization going on.

Reply View | 23 replies
- Xunjin 5 months ago
  
  Just to be clear, you are a Bluesky engineer, right?
  off-topic: how has been dealing with the influx of new users after X political/legals problems aftermath? Did you see an increase in toxicity around the network? And how has you (Bluesky moderation) dealing with it.
  
  Reply View | 20 replies
  
  ToucanLoucan 5 months ago
  
  [flagged]
  
  Reply View | 19 replies
- petra 5 months ago
  
  Maybe this would be helpful:http://daslab.seas.harvard.edu/datacalculator/
  
  Reply View | 0 replies
- curious_cat_163 5 months ago
  
  That's insightful. Keep up the good work!
  
  Reply View | 0 replies
VWWHFSfQ 5 months ago

At some point they'll end up just doing the Bieber rack [1]. It's when a shard becomes so hot that it just has to be its own thing entirely.
[1] - https://www.themarysue.com/twitter-justin-bieber-servers/
@bluesky devs, don't feel ashamed for doing this. It's exactly how to scale these kinds of extreme cases.

Reply View | 5 replies
- genewitch 5 months ago
  
  I've stood up machines for this before I did not know they had a name, and I worked at the mouse company and my parking spot was two over from a J. Beibe'rs spot.
  So now we have Slashdot effect, HN hug, and its not Clarkson its... Stephen Fry effect? Maybe can be Cross-Discipline - there's a term for when lots of UK turns their kettles on at the same time.
  I should make a blog post to record all the ones I can remember.
  
  Reply View | 1 reply
  
  k1t 5 months ago
  
  TV Pickup aka the Half Time Kettle Effect.
  https://en.wikipedia.org/wiki/TV_pickup
  
  Reply View | 0 replies
- bitbckt 5 months ago
  
  We never actually had a literal “Bieber Box”, but the joke took off.
  Hot shards were definitely an issue, though.
  
  Reply View | 0 replies
- stavros 5 months ago
  
  Given that BlueSky is funded by Twitter, I'm assuming they know a lot more than us on how Twitter architects systems.
  
  Reply View | 0 replies
- Imustaskforhelp 5 months ago
  
  Its so crazy.
  Thanks a lot for sharing this link.
  
  Reply View | 0 replies
rubslopes 5 months ago

This problem is discussed in the beginning of the Designing Data-Intensive Applications book. It's worth a read!

Reply View | 2 replies
- Brystephor 5 months ago
  
  Do you know the name of the problem or strategy used for solving the problem? I'd be interested in looking it up!
  I own DDIA but after a few chapters of how database work behind the scenes, I begin to fall asleep. I have trouble understanding how to apply the knowledge to my work but this seems like a useful thing with a more clear application.
  
  Reply View | 1 reply
  
  bitbckt 5 months ago
  
  Yes, we used the Yahoo! “Feeding Frenzy” paper as the basis for the design of Haplocheirus (the timeline service).
  
  Reply View | 0 replies
rsynnott 5 months ago

> and later when serving each follower's timeline, fetch the celebrity's posts and merge them into the timeline
I think then you still have the 'weird user who follows hundreds of thousands of people' problem, just at read time instead of write time. It's unclear that this is _better_, though, yeah, caching might help. But if you follow every celeb on Bluesky (and I guarantee you this user exists) you'd be looking at fetching and merging _thousands_ of timelines (again, I suppose you could just throw up your hands and say "not doing that", and just skip most or all of the celebs for problem users).
Given the nature of the service, making read predictably cheap and writes potentially expensive (which seems to be the way they've gone) seems like a defensible practice.

Reply View | 2 replies
- fc417fc802 5 months ago
  
  > I suppose you could just throw up your hands and say "not doing that", and just skip most or all of the celebs for problem users
  Random sampling? It's not as though the user needs thousands of posts returned for a single fetch. Scrolling down and seeing some stuff that's not in chronological order seems like an acceptable tradeoff.
  
  Reply View | 0 replies
- christkv 5 months ago
  
  You might mix the approaches based on some cut off point
  
  Reply View | 0 replies
locusofself 5 months ago

Why do they "insert" even non-celebrity posts into each follower's timeline? That is not intuitive to me.

Reply View | 4 replies
- giovannibonetti 5 months ago
  
  To serve a user timeline in single-digit milliseconds, it is not practical for a data store to load each item in a different place. Even with an index, the index itself can be contiguous in disk, but the payload is scattered all over the place if you keep it in a single large table.
  Instead, you can drastically speed up performance if you are able to store data for each timeline somewhat contiguously on disk.
  
  Reply View | 0 replies
- wlonkly 5 months ago
  
  Think of it as pre-rendering. Of pre-rendering and JIT collecting, pre-rendering means more work but it's async, and it means the timeline is ready whenever a user requests it, to give a fast user experience.
  (Although I don't understand the "non-celebrity" part of your comment -- the timeline contains (pointers to) posts from whoever someone follows, and doesn't care who those people are.)
  
  Reply View | 2 replies
  
  locusofself 5 months ago
  
  Perhaps I misunderstanding, I thought the actual content of each tweet was being duplicated to every single timeline who followed the author, which sounded extremely wasteful, especially in the case of someone who has 200 million followers.
  
  Reply View | 1 reply
  
  TimK65 5 months ago
  
  From the linked article: "Additionally, a reference to your post is 'fanned out' to your followers so they can see it in their Timelines."
  So not the content, just a sort of link to it.
  
  Reply View | 0 replies

ChuckMcM 5 months ago

As a systems enthusiast I enjoy articles like this. It is really easy to get into the mindset of "this must be perfect".

In the Blekko search engine back end we built an index that was 'eventually consistent' which allowed updates to the index to be propagated to the user facing index more quickly, at the expense that two users doing the exact same query would get slightly different results. If they kept doing those same queries they would eventually get the exact same results.

Systems like this bring in a lot of control systems theory because they have the potential to oscillate if there is positive feedback (and in search engines that positive feedback comes from the ranker which is looking at which link you clicked and giving it a higher weight) and it is important that they not go crazy. Some of the most interesting, and most subtle, algorithm work was done keeping that system "critically damped" so that it would converge quickly.

Reading this description of how user's timelines are sharded and the same sorts of feedback loops (in this case 'likes' or 'reposts') sounds like a pretty interesting problem space to explore.

Reply View 23 replies

snailmailman 5 months ago

I guess I hadn’t considered that search engines could be reranking pages on the fly as I click them. I’ve been seeing my DuckDuckGo results shuffle around for a while now thinking it’s an awful bug.
Like I click one page, don’t find what I want, and go back thinking “no, I want that other result that was below” and it’s an entirely different page with shuffled results, missing the one that I think might have been good.

Reply View | 6 replies
- PaulHoule 5 months ago
  
  That's connected with a basic usability complaint about current web interfaces, that ads and recommended content aren't stable. You very well might want to engage with an ad after you are done engaging what you wanted to engage with but you might never see it again. Similarly, you might see two or three videos that you want to click on on the side of a YouTube video you're watching but you can only click on one (though if you are thinking ahead you can open these in another tab.)
  On top of that immediate frustration, the YouTube style interface here
  https://marvelpresentssalo.com/wp-content/uploads/2015/09/id...
  collects terrible data for recommendations because, even though it gives them information that you liked the thumbnail for a video, they can't come to any conclusion about whether or not you liked any of the other videos. TikTok, by focusing on one video at a time, collects much better information.
  
  Reply View | 1 reply
  
  4ggr0 5 months ago
  
  > though if you are thinking ahead you can open these in another tab
  or add it to the "Watch Later" playlist :) so you can watch it...later.
  
  Reply View | 0 replies
- cgriswald 5 months ago
  
  I don't use DDG, but in my (very limited, just now) testing it doesn't seem to shuffle results unless you reload the page in some way. Is it possible you're browser is reloading the page when you go back? If so, setting DDG to open links in new tabs might fix this problem.
  
  Reply View | 1 reply
  
  snailmailman 5 months ago
  
  Interesting. Maybe something in my configuration is affecting it. I’ll have to look into it
  
  Reply View | 0 replies
- numeri 5 months ago
  
  This behavior started happening for me in the last few months. If I click on a result, then go back, I have different search results.
  I've found a workaround, though – click back into the DDG search box at the top of the page and hit enter. This then returns the original search results.
  
  Reply View | 0 replies
- gtfiorentino 5 months ago
  
  Hi - I work on search at DuckDuckGo. Do you mind sharing a bit more detail about this issue? What steps would allow us to reproduce what you're seeing?
  
  Reply View | 0 replies
gopher_space 5 months ago

> Some of the most interesting, and most subtle, algorithm work was done keeping that system "critically damped" so that it would converge quickly.
Looking back at my early work with microservices I'm wondering how much time I would have saved by just manually setting a tongue weight.

Reply View | 0 replies
dwedge 5 months ago

Similar to how Google images loads lower quality blurred thumbnails towards the bottom of the window at first so that the user thinks they loaded faster

Reply View | 0 replies
aqueueaqueue 5 months ago

This is less a question of perfection and one of trade off's. Laws of physics put a limit on how efficiently you can keep data in NYC and London in perfect sync, so you choose CAP-style trade-offs. There are also $/SLO trade-offs. Each 9 costs more money.
I like your example it is very interesting. If I get to work on (or even hear someone in my team is working on) such interesting problems and I can hear about it, I get happy.
Interesting problems are rare because like a house you might talk about brick vs. Timber frame once, but you'll talk about cleaning the house every week!

Reply View | 0 replies
gregw134 5 months ago

Would you be willing to share more about how you guys did click ranking at Blekko? It's an interesting problem.

Reply View | 0 replies
culi 5 months ago

What became of Blekko?

Reply View | 10 replies
- an_ko 5 months ago
  
  > It was acquired by IBM in March 2015, and the service was discontinued.
  — https://en.wikipedia.org/wiki/Blekko
  Perhaps GP has a more interesting answer though.
  
  Reply View | 9 replies
  
  ChuckMcM 5 months ago
  
  That's the correct answer, IBM wanted the crawler mostly to feed Watson. Building a full search engine (crawler, indexer, ranker, API, web application) for the English language was a hell of an accomplishment but by the time Blekko was acquired Google was paying out tens of billions of dollars to people to send them and only them their search queries. For a service that nominally has to live on advertising revenue getting humans to use it was the only way to be net profitable, and you can't spend billions buying traffic and hope to make it back on advertising as the #3 search engine in the English speaking markets.
  There are other ways to monetize search (look at Kagi for example) than advertising. Blekko missed that window though. (too early, Google needed to get a crappy as it is today to make the value of a spam free search engine desirable)
  
  Reply View | 8 replies
genewitch 5 months ago

PID techniques useful?

Reply View | 0 replies

rakoo 5 months ago

Ok I'm curious: since this strategy sacrifices consistency, has anyone thoughts about something that is not full fan-out on reads or on writes ?

Let's imagine something like this: instead of writing to every user's timeline, it is written once for each shard containing at least one follower. This caps the fan-out at write time to hundreds of shards. At read time, getting the content for a given users reads that hot slice and filters actual followers. It definitely has more load but

- the read is still colocated inside the shard, so latency remains low

- for mega-followers the page will not see older entries anyway

There are of course other considerations, but I'm curious about what the load for something like that would look like (and I don't have the data nor infrastructure to test it)