Comment by nightpool a day ago

7 replies

> It’s much closer spiritually to RSS and plain old web

What do you mean by this? ATProto requires a giant indexing database that has access to every post in the network. Mastodon is more like a feed reader—you only get notified about the posts you care about. How is needing a giant database that knows about every RSS feed in the world closer to the plain old web?

danabramov a day ago

>What do you mean by this?

RSS is a way to aggregate data from many sites into one place. AT lets you do the same, but with bells and whistles (the data is signed and typed, and there's a realtime stream in addition to pulling on demand). If you're forced to describe AT via existing technologies, AT is basically like RSS for typed JSON in Git over HTTP or WebSockets that scales to millions of users.

It is completely up to you what you decide to index. If you want to build an app that listens to records of "Bluesky post" type that are created only by people you follow, you absolutely can.
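
As a rough sketch of that kind of selective indexing: the filter below assumes each event is a dict carrying the author's DID plus, for commits, the operation and the record's collection. The field names are loosely modeled on Jetstream-style JSON and are an assumption here, as are the sample DIDs and the `relevant_posts` helper.

```python
from typing import Iterable, Iterator

POST_COLLECTION = "app.bsky.feed.post"


def relevant_posts(events: Iterable[dict], followed: set[str]) -> Iterator[dict]:
    """Yield only newly created posts authored by accounts in `followed`."""
    for event in events:
        if event.get("kind") != "commit":
            continue  # skip non-commit events (e.g. identity or account updates)
        commit = event.get("commit") or {}
        if commit.get("operation") != "create":
            continue
        if commit.get("collection") != POST_COLLECTION:
            continue  # ignore likes, follows, and records from other apps
        if event.get("did") in followed:
            yield event


# Hypothetical sample data standing in for a live stream.
followed = {"did:plc:alice"}
sample_events = [
    {"did": "did:plc:alice", "kind": "commit",
     "commit": {"operation": "create", "collection": POST_COLLECTION,
                "record": {"text": "hello"}}},
    {"did": "did:plc:mallory", "kind": "commit",
     "commit": {"operation": "create", "collection": POST_COLLECTION,
                "record": {"text": "not followed"}}},
    {"did": "did:plc:alice", "kind": "commit",
     "commit": {"operation": "create", "collection": "app.bsky.feed.like",
                "record": {}}},
]
indexed = [e["commit"]["record"]["text"] for e in relevant_posts(sample_events, followed)]
```

Everything that fails the filter is simply never written to your database, so the index stays as small as your follow list.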

See https://bsky.app/profile/why.bsky.team/post/3m2fjnh5hpc2f (which runs locally and indexes posts relevant to you) and https://reddwarf.whey.party/ (which doesn't have a database at all and pulls data from original servers on demand + using https://constellation.microcosm.blue/ for some queries).

The reason you don't see more of these is because an isolated experience is... well, isolated. So people are less interested in running something like this compared to, say, a whole new AT app. But AT can scale down to Mastodon-like use cases too.

>ATProto requires a giant indexing database that has access to every post in the network.

Only if you want to index every post, i.e. if you want to run a full-scale social app for millions of users. As an app builder, you get to choose what you index.

For a start, you probably only want to store the records relevant to your app. For example, I doubt that Tangled (https://tangled.org/), which is an AT app, has a database with every Bluesky post. That seems absurd because Tangled is focused on a completely different use case — a social layer around Git. So Tangled only indexes records like "Tangled repo", "Tangled follow", "Tangled star", and so on.

Naturally, Tangled wants to index all posts related to Tangled — that's just how apps work. If you wanted to build a centralized app, you'd also want it to contain the whole database of what you want the app to show. This isn't specific to AT, that's just common sense—to be able to show every possible post on demand with aggregated information (such as like counts), you have to index that information, hit someone else's index, or fetch posts from the source (but then you won't know the aggregated like counts).
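
To make the aggregation point concrete, here is a minimal sketch. It assumes like records have already been collected from many different users' repositories (a like lives with the person who liked, not with the post's author) and that each one points at the liked post through `subject.uri`. The sample records and the `like_counts` helper are made up for illustration.

```python
from collections import Counter


def like_counts(like_records):
    """Count likes per post URI from like records gathered across many repositories."""
    counts = Counter()
    for record in like_records:
        counts[record["subject"]["uri"]] += 1
    return counts


# Hypothetical like records, each taken from a different user's repository.
likes = [
    {"subject": {"uri": "at://did:plc:alice/app.bsky.feed.post/1"}},
    {"subject": {"uri": "at://did:plc:alice/app.bsky.feed.post/1"}},
    {"subject": {"uri": "at://did:plc:bob/app.bsky.feed.post/9"}},
]
counts = like_counts(likes)
```

Fetching a single post from its author's server gives you none of these inputs, which is why a count needs an index, yours or someone else's.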

That said — if you want to build a copy of a specific app (like Bluesky) but filtered down to just the people you follow (with no global search, algorithmic feeds, etc), you absolutely can, as I've linked earlier. Or you can build something hybrid relying on global caches, or some other subset of the network (say, last 2 weeks of posts). How you do indexing is up to you. You're the developer here.

  • nightpool a day ago

    This would make sense if there weren't so many features (like blocks, DMs, followers-only posts, etc.) that rely on the AppView enforcing a single global view of the world. I agree that the AT model has good properties, but right now too much of it relies on this single shared global app view.

    But thanks for the link to Konbini! That looks really exciting and promising and I would love to start using it if I can run it completely decoupled from Bluesky infrastructure.

    • danabramov 19 hours ago

      I think it's only reliant on them to the extent that you want to build copies of the same exact experience, which I personally don't find very interesting. I think a much more compelling story is not, say, "a clone of Bluesky with a Bluesky DM folder", but, say, "a Spaces-like product that closely integrates with Bluesky (for posting) and is also listed as a stream on Streamplace".

      I agree that some information seems important to know, like blocks. (Although in different apps it's reasonable to expect blocks to be app-specific.) Blocks are public on Bluesky though, for this exact reason. DMs are a disconnected service but the eventual idea is some kind of E2E (https://www.germnetwork.com/ is also building something now). Follower-only things could work through some variation of private state mechanism (see https://pfrazee.leaflet.pub/3lzhmtognls2q, https://pfrazee.leaflet.pub/3lzhui2zbxk2b).

      >I would love to start using it if I can run it completely decoupled from Bluesky infrastructure.

      You could use Blacksky's relay as the input source (https://atproto.africa/), or run your own relay. The only piece you'd then depend on is the PLC registry (since it resolves PLC identity). Bluesky is in the process of spinning it out into a separate entity in Switzerland, but if that's a hard goal, I guess you could forbid `did:plc` identities in your app (the vast majority of users) and only ingest data about `did:web` ones? Or do you feel OK about PLC resolution?
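
      A minimal sketch of such an ingestion policy, assuming the only check needed is the DID method (the part between the first two colons). The helper names `did_method` and `should_ingest` are hypothetical, not part of any AT library.

```python
def did_method(did: str) -> str:
    """Return the method part of a DID, e.g. "web" for "did:web:example.com"."""
    parts = did.split(":", 2)
    if len(parts) != 3 or parts[0] != "did" or not parts[1] or not parts[2]:
        raise ValueError(f"not a valid DID: {did!r}")
    return parts[1]


def should_ingest(did: str, allowed_methods=frozenset({"web"})) -> bool:
    """Hypothetical ingestion policy: accept only DIDs whose method is allowed."""
    return did_method(did) in allowed_methods
```

      With the default policy, `should_ingest("did:web:example.com")` is true and `should_ingest("did:plc:abc123")` is false.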

  • jazzyjackson a day ago

    > The reason you don't see more of these is because an isolated experience is... well, isolated.

    I don't understand why you become isolated once you've built your own app. Is it because the Bluesky firehose has to decide to index posts I make on my server? I guess I'm asking how does an application decide which sources to index from, just anyone advertising that they are serving that lexicon? Why then would I become isolated by virtue of hosting only data I want to host/indexing only feeds I care to index?

    (Thanks in advance I do want to grok this...)

    • danabramov a day ago

      Hmm, no, that’s not what I meant. Let me try to break it down a bit.

      There are really two main kinds of nodes in the system: hosting servers and app servers. They’re completely unrelated and completely decoupled. It’s like Dropbox vs. apps that put data in your Dropbox.

      A hosting server stores your personal data. This is similar to having a Git repository with data from all social apps. Or like a Dropbox folder. That’s usually called a “PDS” — a personal data server. Running one is extremely cheap since it’s only your data. It is also optional (eg Bluesky provides AT hosting for free). But this is not an app — it’s literally like Git hosting. Just the data (for all apps).

      Then you have app backends. Those are just normal servers. They’re what you’d typically think of as web applications. The Bluesky app is one of them. An application server listens to events from all known hosting servers and updates its local database with whatever it’s interested in from the stream. For example, the Bluesky application server puts all “post created”, “like created”, etc. events from all hosting servers into a local database that it can query.

      So as an app author you have a lot of freedom for what to build:

      - You can build a new app that only listens to records of your app’s type. So naturally it would only index your app’s users’ content. Which is presumably not much.

      - You can take an app server for an existing app (if it’s open source) and run it yourself. But then of course if this app has a million users, you need to decide which records you want. Do you want to index them all (like the original app)? Do you want to index a subset? Which subset? It could be historical (e.g. the last two weeks of posts, the last week of likes, etc.). Or it could be by proximity (only profiles, posts and likes within one follow from you). Or something else. You decide what to store.

      - You can also build something hybrid — an app that remixes data from multiple apps. And you can fetch data from hosting servers without storing it (but this doesn’t give you aggregation) or fetch aggregated data from community indexes (if the aggregation you need already exists and is provided by someone else).
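
      The "which subset?" choices above can be written down as small predicates. This is only a sketch: the retention windows mirror the example numbers in the list, and `RETENTION`, `keep_by_age` and `keep_by_proximity` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows per record type.
RETENTION = {
    "app.bsky.feed.post": timedelta(weeks=2),
    "app.bsky.feed.like": timedelta(weeks=1),
}


def keep_by_age(collection: str, created_at: datetime, now: datetime) -> bool:
    """Historical subset: keep a record only while it is inside its window."""
    max_age = RETENTION.get(collection)
    return max_age is not None and now - created_at <= max_age


def keep_by_proximity(author: str, me: str, following: set[str]) -> bool:
    """Proximity subset: keep records from me or from accounts I follow."""
    return author == me or author in following


now = datetime(2025, 1, 15, tzinfo=timezone.utc)
ten_days_ago = now - timedelta(days=10)
```

      A ten-day-old post passes `keep_by_age` while a ten-day-old like does not, and a record type missing from `RETENTION` is never stored.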

      Hope this makes sense.

      (As a performance optimization, instead of aggregating from millions of repositories individually, you’d listen to a stream that combines them. That’s called “relay”. Relays are mostly dumb websocket retransmitters and don’t have any app-specific logic. Bluesky runs one, Blacksky runs their own, and it would generally cost $30/mo to run one today. Any hosting server can ask any relay to crawl it. Any relay may also choose to crawl a new hosting server if it encounters links to content on that server. Relays are common infra and you shouldn’t expect there to be a lot of them. App servers choose which relay to listen to, if at all.)

      ---

      Now answering your specific questions:

      >I don't understand why you become isolated once you've built your own app

      If you've built an app that looks like Bluesky, but only you and your friends' posts/likes show up, is that much better than just using Bluesky? My point is that usually this isn't a differentiator and feels kind of pointless. You might as well just curate your Following feed on Bluesky. So people don't do that often.

      >Is it because the Bluesky firehose has to decide to index posts I make on my server?

      This seems like a misconception; moving your data (to your own hosting) is a completely separate thing from creating an app. See the distinction above. You can move your hosting to a different hosting server, but this wouldn't affect your experience in the Bluesky app at all. The Bluesky application server would simply start ingesting your posts from your new server instead once it gets notified about your account move.
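
      One way to picture why a move is invisible in the app: a toy index that keys everything by the account's DID and treats the hosting server as just a field that can change. The `AppIndex` class and its method names are invented for illustration and do not reflect how the Bluesky app server is actually written.

```python
class AppIndex:
    """Toy app-server index keyed by DID rather than by hosting server."""

    def __init__(self):
        self.host_by_did = {}
        self.posts_by_did = {}

    def on_account_moved(self, did, new_host):
        # Only the place to fetch from changes; indexed content is untouched.
        self.host_by_did[did] = new_host

    def on_post(self, did, text):
        self.posts_by_did.setdefault(did, []).append(text)

    def timeline(self, did):
        return list(self.posts_by_did.get(did, []))


index = AppIndex()
index.on_account_moved("did:plc:alice", "https://old-host.example")
index.on_post("did:plc:alice", "before the move")
index.on_account_moved("did:plc:alice", "https://my-own-pds.example")
index.on_post("did:plc:alice", "after the move")
```

      After the move, `index.timeline("did:plc:alice")` still returns both posts in order.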

      >I guess I'm asking how does an application decide which sources to index from, just anyone advertising that they are serving that lexicon?

      Typically an application just listens to a relay (like the one hosted on Bluesky) which already retransmits events from all known repositories. If you operate your own repository, you can send a "request crawl" command to Bluesky's relay, and it will index you. This is kind of similar to a website getting picked up by Google search. Links may also do it but a "request crawl" is the explicit way. See https://pdsls.dev/jetstream?instance=wss%3A%2F%2Fjetstream1.... for a live feed of the relay operated by Bluesky (it's not specific to the Bluesky app).
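
      A hedged sketch of what the "request crawl" call could look like. It assumes the relay exposes the `com.atproto.sync.requestCrawl` XRPC endpoint and accepts a JSON body with a `hostname` field. Both hostnames below are placeholders, and the code only builds the request without sending it.

```python
import json
from urllib.request import Request


def build_request_crawl(relay_host: str, pds_hostname: str) -> Request:
    """Build (but do not send) a POST asking a relay to crawl a hosting server."""
    url = f"https://{relay_host}/xrpc/com.atproto.sync.requestCrawl"
    body = json.dumps({"hostname": pds_hostname}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder hosts: substitute a real relay and your own hosting server.
req = build_request_crawl("relay.example.com", "pds.example.org")
```

      Passing `req` to `urllib.request.urlopen` would actually send it. Whether a given relay accepts the request is up to its operator.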

      >Why then would I become isolated by virtue of hosting only data I want to host/indexing only feeds I care to index?

      Hosting data !== indexing, again these are separate things.

      Hosting your own data doesn't make you isolated — it is pretty much indistinguishable in the apps. You don't see where someone's data is hosted since in the app it all appears seamlessly aggregated.

      Creating an app that only shows 0.000001% of the network's content when there's already an app that shows 100% of the same content is what I call isolating. I'm just not sure what it accomplishes since the network is still shared. So this isn't very compelling to most app builders. What's compelling is usually building completely new experiences. Although some people do experiment with more "limited" Bluesky clones.

      • jazzyjackson 19 hours ago

        Thanks for the patient explanation. It surprises me that an aggregator would simply start distributing from any server that announces it has content for that application. Moderation without false positives must be a beast.

        • danabramov 9 hours ago

          The way I think about it, ingesting a stream of records from an arbitrary server is not any different to ingesting a series of <form> POST requests from someone’s computer. It doesn’t make moderation different.

          Moderation in AT is layered. Hosting servers do their own moderation but it’s very minimal (just trying to catch illegal content early). Relay operators also have levers to stop broadcasting from specific nodes if they’re problematic (but again, this is reserved for either extreme illegal content or for network abuse). Most of what you’d think of as moderation happens at the app server level, which is the same as in non-AT apps. The app server can easily choose to not serve a certain user’s posts even if they exist upstream at their hosting.

          One wrinkle is that AT goes a step further and extracts moderation primitives (“labelers”) as a separate thing — for example, you can ingest Bluesky’s moderation decisions from a separate service (and the Bluesky app server listens to the same service). This makes moderation composable, and also lets someone make a fork of Bluesky that “listens” to a different moderation authority.
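
          A small sketch of that composability, assuming labels arrive as dicts with a source labeler (`src`), a subject (`uri`) and a value (`val`). The labeler DIDs, the `spam` value and the `visible_posts` helper are all made up for illustration.

```python
def visible_posts(posts, labels, trusted_labelers, hidden_values):
    """Return posts that have no hiding label from a labeler this app trusts."""
    hidden_uris = {
        label["uri"]
        for label in labels
        if label["src"] in trusted_labelers and label["val"] in hidden_values
    }
    return [post for post in posts if post["uri"] not in hidden_uris]


# Hypothetical posts and a single label issued by "labeler-a".
posts = [
    {"uri": "at://did:plc:alice/app.bsky.feed.post/1"},
    {"uri": "at://did:plc:bob/app.bsky.feed.post/2"},
]
labels = [
    {"src": "did:plc:labeler-a",
     "uri": "at://did:plc:bob/app.bsky.feed.post/2",
     "val": "spam"},
]

# Same posts, same label stream, different moderation authority.
default_view = visible_posts(posts, labels, {"did:plc:labeler-a"}, {"spam"})
forked_view = visible_posts(posts, labels, {"did:plc:labeler-b"}, {"spam"})
```

          The app that trusts `labeler-a` hides the labeled post, while the fork that trusts only `labeler-b` shows both, even though the upstream data is identical.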