atproto and bluesky

consuming the firehose for less than $2.50/mo*

2024-11-13 by phil (they/them)

work in progress

It's fun to play with data[citation needed]. All data on Bluesky is extremely public, and with 15 million users (as of today, and growing at a mind-boggling rate), there's a lot of public data to play with.

You can get the firehose as a websocket JSON feed with Jetstream. This connects you to everything happening on the network in real time. It's extremely easy to get started and very fun.
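
For a rough idea of what that looks like outside the browser, here is a minimal Python sketch. It assumes one of the public Jetstream hosts (`jetstream2.us-east.bsky.network`), the `wantedCollections` query parameter, and the commit-event shape described in Jetstream's README. The helper names `subscribe_url`, `post_text`, and `stream` are made up for illustration, and `stream` relies on the third-party `websockets` package.

```python
import json
from urllib.parse import urlencode

# Assumption: one of the public Jetstream hosts; swap in whichever instance you use.
JETSTREAM_HOST = "jetstream2.us-east.bsky.network"


def subscribe_url(collections, host=JETSTREAM_HOST):
    """Build a subscribe URL that asks for only the given collections."""
    query = urlencode([("wantedCollections", c) for c in collections])
    return f"wss://{host}/subscribe?{query}"


def post_text(raw_message):
    """Return the text of a newly created post, or None for any other event."""
    event = json.loads(raw_message)
    if event.get("kind") != "commit":
        return None
    commit = event.get("commit") or {}
    if commit.get("operation") != "create":
        return None
    if commit.get("collection") != "app.bsky.feed.post":
        return None
    return (commit.get("record") or {}).get("text")


async def stream(url):
    """Print post text as it arrives. Needs `pip install websockets`."""
    import websockets  # third-party; imported lazily so the helpers above stay stdlib-only

    async with websockets.connect(url) as ws:
        async for raw in ws:
            text = post_text(raw)
            if text:
                print(text)


# To run it for real:
#   import asyncio
#   asyncio.run(stream(subscribe_url(["app.bsky.feed.post"])))
```

Filtering by collection on the server side keeps the bandwidth down, since likes, follows, and reposts never reach you.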

Here's a random word from every post being posted right now:

There you go, you just consumed it for free from your browser.
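
Picking a random word from each post takes very little code. The sketch below is in Python rather than browser JavaScript, and `random_word` is a hypothetical helper, not the code behind the demo. It assumes each message is a Jetstream commit event with the post text at `commit.record.text`.

```python
import json
import random


def random_word(raw_message, rng=random):
    """Pick one whitespace-separated word from a post event, or None if there is no text."""
    event = json.loads(raw_message)
    record = (event.get("commit") or {}).get("record") or {}
    words = (record.get("text") or "").split()
    return rng.choice(words) if words else None


# A hand-written sample event, trimmed to the fields used above.
sample = json.dumps({
    "kind": "commit",
    "commit": {
        "operation": "create",
        "collection": "app.bsky.feed.post",
        "record": {"text": "consuming the firehose"},
    },
})
print(random_word(sample))
```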

little app

After noodling around a bit, I sketched out this questionable little app which listens for all delete events from the firehose, and then shows the just-deleted text one last time in an anonymized disappearing feed. Kind of fascinating to see what people choose to delete. (Also, please note that I made this millions of users ago, when the network ran at a fraction of its current chaotic speed.) Maybe I'll write more about it later.

Anyway it runs on fly:

Screenshot: fly.io dashboard showing app metrics over 7 days. Network I/O has a daily cycle (growing since bsky is growing); CPU, memory, and load average are all flat; volume usage grows (backfill), then levels off (steady state), then grows again (bsky growing).

Granted it's not doing much, but it's doing it happily on the smallest instance fly offers.

What is it doing?

  1. Receive every new create-post event from the app.bsky.feed.post collection
  2. Filter out empty posts, apply redactions to mentions and links
  3. Cache the clean text content on the volume in pebbledb, keyed by did+rkey
  4. Every few seconds, range-delete saved texts older than 48 hours
  5. For every delete-post event, try to fetch the text from the cache, and if found, broadcast it to all current observers
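
The five steps above can be sketched in a few dozen lines of Python. This is an illustration under assumptions, not the app's actual code: an in-memory `OrderedDict` stands in for pebbledb, redaction is done through the record's `facets` (whose offsets are UTF-8 byte positions), and the names `redact`, `DeletedPostCache`, and the placeholder strings are invented here.

```python
import time
from collections import OrderedDict

MAX_AGE_SECONDS = 48 * 60 * 60
POSTS = "app.bsky.feed.post"
# Hypothetical placeholder strings for the two facet types we redact.
PLACEHOLDERS = {
    "app.bsky.richtext.facet#mention": "[mention]",
    "app.bsky.richtext.facet#link": "[link]",
}


def redact(record):
    """Replace mention and link facets with placeholders (step 2)."""
    data = (record.get("text") or "").encode("utf-8")
    spans = []
    for facet in record.get("facets") or []:
        for feature in facet.get("features") or []:
            label = PLACEHOLDERS.get(feature.get("$type"))
            if label:
                index = facet["index"]
                spans.append((index["byteStart"], index["byteEnd"], label))
                break
    # Work right to left so earlier byte offsets stay valid after each replacement.
    for start, end, label in sorted(spans, reverse=True):
        data = data[:start] + label.encode("utf-8") + data[end:]
    return data.decode("utf-8", errors="replace").strip()


class DeletedPostCache:
    """In-memory stand-in for the on-disk store, keyed by (did, rkey)."""

    def __init__(self, max_age=MAX_AGE_SECONDS):
        self.max_age = max_age
        self.entries = OrderedDict()  # (did, rkey) -> (saved_at, text), oldest first

    def handle(self, event, now=None):
        """Process one event; return text to broadcast for a known delete, else None."""
        now = time.time() if now is None else now
        commit = event.get("commit") or {}
        if commit.get("collection") != POSTS:
            return None
        key = (event["did"], commit["rkey"])
        if commit.get("operation") == "create":  # steps 1-3
            text = redact(commit.get("record") or {})
            if text:  # skip empty posts
                self.entries[key] = (now, text)
            return None
        if commit.get("operation") == "delete":  # step 5
            saved = self.entries.get(key)
            return saved[1] if saved else None
        return None

    def expire(self, now=None):
        """Drop everything older than max_age (step 4); call this every few seconds."""
        now = time.time() if now is None else now
        while self.entries:
            key, (saved_at, _) = next(iter(self.entries.items()))
            if now - saved_at < self.max_age:
                break
            del self.entries[key]
```

Because entries are inserted in arrival order, expiry only ever looks at the oldest end of the dict. A real key-value store would need keys or an index that sort by time to get the same cheap range-delete.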

More interesting apps might not scale down this far, but some will.