Comment by danabramov a day ago
Hmm, no, that’s not what I meant. Let me try to break it down a bit.
There are really two main kinds of nodes in the system: hosting servers and app servers. They’re completely unrelated and completely decoupled. It’s like Dropbox vs apps that put data in your Dropbox.
A hosting server stores your personal data. This is similar to having a Git repository with data from all social apps. Or like a Dropbox folder. That’s usually called a “PDS” — a personal data server. Running one is extremely cheap since it’s only your data. It is also optional (eg Bluesky provides AT hosting for free). But this is not an app — it’s literally like Git hosting. Just the data (for all apps).
Then you have app backends. Those are just normal servers. They’re what you’d typically think of as web applications. The Bluesky app is one of them. An application server listens to events from all known hosting servers and updates its local database with whatever it’s interested in from the stream. For example, the Bluesky application server takes all “post created”, “like created” etc events from all hosting servers and writes them into its local database, which it can then query.
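To make the ingestion step concrete, here's a rough sketch of that loop. The `Event` shape (`did`, `collection`, `rkey`, `action`, `record`) is a simplification I'm assuming for illustration, not the actual wire format, and the "database" is just a dict. The two collection names are the record types the Bluesky app uses for posts and likes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Record types this hypothetical app server cares about.
INTERESTING = {"app.bsky.feed.post", "app.bsky.feed.like"}


@dataclass(frozen=True)
class Event:
    """Simplified, assumed event shape (not the real wire format)."""
    did: str                      # account that owns the record
    collection: str               # record type
    rkey: str                     # record key within the collection
    action: str                   # "create" or "delete"
    record: Optional[dict] = None


def ingest(events: Iterable[Event], db: dict) -> dict:
    """Apply a stream of events to a local index, skipping other apps' types."""
    for ev in events:
        if ev.collection not in INTERESTING:
            continue  # some other app's data; not our concern
        key = (ev.did, ev.collection, ev.rkey)
        if ev.action == "create":
            db[key] = ev.record
        elif ev.action == "delete":
            db.pop(key, None)
    return db
```

A different app would run the same loop with a different `INTERESTING` set, which is all "only listening to your app's record type" amounts to in this sketch.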
So as an app author you have a lot of freedom for what to build:
- You can build a new app that only listens to records of your app’s type. So naturally it would only index your app’s users’ content. Which is presumably not much.
- You can take an app server for an existing app (if it’s open source) and run it yourself. But then of course if this app has a million users, you need to decide which records you want. Do you want to index them all (like the original app)? Do you want to index a subset? Which subset? It could be historical (eg the last two weeks of posts, the last week of likes etc). Or it could be by proximity (only profiles, posts and likes within one follow from you). Or something else. You decide what to store.
- You can also build something hybrid — an app that remixes data from multiple apps. And you can fetch data from hosting servers without storing it (but this doesn’t give you aggregation) or fetch aggregated data from community indexes (if the aggregation you need already exists and is provided by someone else).
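The "which subset?" choice from the second option can be sketched as a pair of filters. Everything here is hypothetical: the window lengths are just the example numbers from above, and `follows` is an assumed in-memory map from an account to the accounts it follows.

```python
from datetime import datetime, timedelta

# Hypothetical retention windows per record type.
WINDOWS = {
    "app.bsky.feed.post": timedelta(weeks=2),
    "app.bsky.feed.like": timedelta(weeks=1),
}


def keep_by_age(collection: str, created_at: datetime, now: datetime) -> bool:
    """Historical policy: keep a record only while it is inside its type's window."""
    window = WINDOWS.get(collection)
    return window is not None and now - created_at <= window


def keep_by_proximity(author: str, me: str, follows: dict) -> bool:
    """Proximity policy: keep records from me and from accounts I follow."""
    return author == me or author in follows.get(me, set())
```

An app server would call one of these (or a combination) before writing a record to its database, and drop everything else.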
Hope this makes sense.
(As a performance optimization, instead of aggregating from millions of repositories individually, you’d listen to a stream that combines them. That stream is provided by a “relay”. Relays are mostly dumb websocket retransmitters and don’t have any app-specific logic. Bluesky runs one, Blacksky runs their own, and it would generally cost $30/mo to run one today. Any hosting server can ask any relay to crawl it. Any relay may also choose to crawl a new hosting server if it encounters links to content on that server. Relays are common infra and you shouldn’t expect there to be a lot of them. App servers choose which relay to listen to, if at all.)
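If it helps, here's a toy model of what "dumb retransmitter" means. This is an in-memory sketch, not how a real relay is built (those speak websockets); the class and method names are made up for illustration.

```python
class Relay:
    """Toy relay: forwards every event from crawled hosts to every subscriber."""

    def __init__(self):
        self.hosts = set()       # hosting servers this relay has agreed to crawl
        self.subscribers = []    # callbacks registered by app servers

    def request_crawl(self, host):
        """A hosting server asks to be included in the combined stream."""
        self.hosts.add(host)

    def subscribe(self, callback):
        """An app server starts listening to the combined stream."""
        self.subscribers.append(callback)

    def on_host_event(self, host, event):
        """Retransmit an event unchanged; no app-specific logic lives here."""
        if host not in self.hosts:
            return
        for callback in self.subscribers:
            callback(host, event)
```

Note that the relay never looks inside `event`. Deciding what is interesting is left entirely to the subscribing app servers.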
---
Now answering your specific questions:
>I don't understand why you become isolated once you've built your own app
If you've built an app that looks like Bluesky, but only your and your friends' posts/likes show up, is that much better than just using Bluesky? My point is that usually this isn't a differentiator and feels kind of pointless. You might as well just curate your Following feed on Bluesky. So people don't do that often.
>is it because the bluesky firehose has to decide to index posts I make on my server?
This seems like a misconception; moving your data (to your own hosting) is a completely separate thing from creating an app. See the distinction above. You can move your hosting to a different hosting server, but this wouldn't affect your experience in the Bluesky app at all. The Bluesky application server would simply start ingesting your posts from your new server instead once it gets notified about your account move.
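A tiny model of why the move is invisible in the app: the index is keyed by a stable account identifier, and the hosting server is just a pointer next to it. The names below are hypothetical, and using a `did` string as the stable identifier is my simplification of how identity works.

```python
class AppIndex:
    """Toy app-server index keyed by account, not by hosting server."""

    def __init__(self):
        self.host_of = {}  # account id -> hosting server currently trusted for it
        self.posts = {}    # account id -> list of post texts

    def on_post(self, did, host, text):
        current = self.host_of.setdefault(did, host)
        if current != host:
            return  # event from a server that no longer hosts this account
        self.posts.setdefault(did, []).append(text)

    def on_account_moved(self, did, new_host):
        """Only the pointer changes; already-indexed posts stay where they are."""
        self.host_of[did] = new_host

    def timeline(self, did):
        return list(self.posts.get(did, []))
```

After `on_account_moved`, the timeline for that account keeps growing from the new server, and nothing that readers of the app see depends on `host_of`.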
>I guess I'm asking how does an application decide which sources to index from, just anyone advertising that they are serving that lexicon?
Typically an application just listens to a relay (like the one hosted by Bluesky) which already retransmits events from all known repositories. If you operate your own repository, you can send a "request crawl" command to Bluesky's relay, and it will index you. This is kind of similar to a website getting picked up by Google search. Links to your content may also trigger a crawl, but a "request crawl" is the explicit way. See https://pdsls.dev/jetstream?instance=wss%3A%2F%2Fjetstream1.... for a live feed of the relay operated by Bluesky (it's not specific to the Bluesky app).
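For the "request crawl" step, a sketch of what the request might look like. I'm assuming the endpoint is the `com.atproto.sync.requestCrawl` method with a `hostname` field in the JSON body; treat that as an assumption and check the protocol docs before relying on it. Both hostnames are placeholders, and the function only builds the request without sending it.

```python
import json
from urllib.request import Request


def build_request_crawl(relay_host: str, my_host: str) -> Request:
    """Build (but do not send) a crawl request for `my_host`.

    Assumed endpoint name and body shape; verify against the protocol docs.
    """
    body = json.dumps({"hostname": my_host}).encode("utf-8")
    return Request(
        url=f"https://{relay_host}/xrpc/com.atproto.sync.requestCrawl",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would actually send it. After that, whether and when the relay starts retransmitting your events is up to the relay.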
>Why then would I become isolated by virtue of hosting only data I want to host/indexing only feeds I care to index?
Hosting data !== indexing, again these are separate things.
Hosting your own data doesn't make you isolated — it is pretty much indistinguishable in the apps. You don't see where someone's data is hosted since in the app it all appears seamlessly aggregated.
Creating an app that only shows 0.000001% of the network's content when there's already an app that shows 100% of the same content is what I call isolating. I'm just not sure what it accomplishes since the network is still shared. So this isn't very compelling to most app builders. What's compelling is usually building completely new experiences. Although some people do experiment with more "limited" Bluesky clones.
Thanks for the patient explanation. It surprises me that an aggregator would simply start distributing from any server that announces it has content for that application. Moderation without false positives must be a beast.