The architecture of Mastodon
While uncertainty surrounds Twitter, people are looking into alternative microblogging platforms. One that is gaining a lot of popularity is Mastodon. The concept is similar to Twitter: we can post toots (short messages), which are visible to our followers; we can boost (retweet) posts or favorite (like) them.
However, there is a significant difference. While Twitter is a centralized platform, there are many independent Mastodon instances. Each Mastodon instance is a small Twitter with its own user base, administrators, and moderation rules. These instances are not isolated: they communicate and exchange data with each other, creating a federated network. Thanks to that, you can easily follow a person from another instance and see their posts in your feed.
From an IT perspective, let's take a high-level look at the architecture of a single Mastodon instance and how different instances communicate. Specifically, we'll be looking at how the Mastodon network-of-instances might scale.
I'm pretty new to Mastodon's codebase, so if you see any inaccuracies in the ongoing investigation below—please let me know!
Components of a single Mastodon instance
At its core, Mastodon's backend is an application written using Ruby on Rails: the same language and framework used by Twitter before being replaced by Scala and a number of custom libraries due to scalability issues. It's a monolith—you won't see any microservices here—but hey, monoliths can be successful as well—when was the last time you visited StackOverflow?
The second crucial component is a Redis server (an in-memory database). It is used in two roles: as a cache and as the data storage for Mastodon's job system. The job system does most of the interesting work and is powered by Sidekiq. Sidekiq advertises itself as Ruby's fastest job system, and given the number of jobs a large Mastodon instance can create, it certainly has an opportunity to prove its worth!
Optionally, Elasticsearch can be used to index and search for posts that you have authored, favorited, or been mentioned in. Moreover, there's a Node.js server which handles streaming requests (delivered through a WebSocket or a long-lived HTTP request). Streaming covers the local and federated timelines, as well as per-user notifications, the home timeline and the like.
Finally, there's file storage (profile pictures, and media attached to posts), which can be either local (stored on the file system) or delegated to S3. When it comes to durability and backups, this should be an easy choice!
Data flow: creating a new post
When you create a post on Mastodon, it gets stored in the PostgreSQL database, which is the instance's primary data store. Moreover, it is added to your local followers' home feeds, which are what you see when you open the Mastodon app.
The home feeds are stored in Redis. Here, Redis is used as a cache (the home feeds can be recomputed using tootctl feeds build, hence this data doesn't need to be persistent). Because Redis stores data in memory, getting a home feed is very fast. But as memory isn't infinite, after a configurable period of time (by default, a week), the home feeds of users who haven't logged in during that period are removed from the cache.
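To make the caching behavior more concrete, here is a minimal in-memory sketch in Python. It is not Mastodon's code: the HomeFeedCache class, its method names and the per-feed cap of 400 entries are assumptions made for illustration, and a plain dictionary stands in for Redis. Only the one-week inactivity limit comes from the description above.

```python
from collections import deque
from datetime import datetime, timedelta

MAX_FEED_ITEMS = 400                   # assumed per-feed cap, for illustration only
INACTIVITY_LIMIT = timedelta(days=7)   # "by default, a week"


class HomeFeedCache:
    """In-memory stand-in for the Redis-backed home feeds (hypothetical API)."""

    def __init__(self):
        self.feeds = {}      # user id -> deque of post ids, newest first
        self.last_seen = {}  # user id -> datetime of the last login

    def record_login(self, user_id, now):
        self.last_seen[user_id] = now

    def push(self, user_id, post_id):
        # A bounded deque silently drops the oldest entry once the cap is reached.
        feed = self.feeds.setdefault(user_id, deque(maxlen=MAX_FEED_ITEMS))
        feed.appendleft(post_id)

    def read(self, user_id, limit=20):
        return list(self.feeds.get(user_id, ()))[:limit]

    def evict_inactive(self, now):
        """Drop the feeds of users who have not logged in within the limit."""
        stale = [
            user_id
            for user_id in self.feeds
            if now - self.last_seen.get(user_id, datetime.min) > INACTIVITY_LIMIT
        ]
        for user_id in stale:
            del self.feeds[user_id]
        return stale
```

An evicted feed is not lost for good: since the posts themselves live in PostgreSQL, the feed can be rebuilt when the user comes back.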
But local followers are only half of the story; you will probably have some non-local followers on other servers as well. To propagate your post there, a Sidekiq job is created. That's where Redis is used again, in particular its queueing capabilities. A job to propagate the post is put on the push queue for each server where you have followers. When executed, the job will perform the appropriate HTTP call.
On the other Mastodon instance, upon receiving a request, a job to create a copy of the post locally will be put onto the ingress queue. Upon executing, the appropriate changes will be made in the PostgreSQL database and the Redis home feeds.
Hence, creating a post means one local database mutation, plus as many network calls as there are distinct servers among your followers.
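The fan-out described above can be sketched in a few lines of Python. This is a simplified model, not Mastodon's implementation: the plan_delivery function, the user@domain handle format and the example.social domain are hypothetical, and each job is just a dictionary naming the push queue and the target server.

```python
LOCAL_DOMAIN = "example.social"  # hypothetical name of our own instance


def plan_delivery(followers, local_domain=LOCAL_DOMAIN):
    """Split followers into local users and one push job per remote server.

    `followers` is a list of "user@domain" handles (an assumed format).
    """
    local_users = []
    remote_domains = set()
    for handle in followers:
        user, _, domain = handle.partition("@")
        if domain in ("", local_domain):
            local_users.append(user)    # home feed is updated directly
        else:
            remote_domains.add(domain)  # one HTTP delivery per distinct server
    push_jobs = [{"queue": "push", "target": d} for d in sorted(remote_domains)]
    return local_users, push_jobs


followers = [
    "ann@example.social",
    "bob@other.example",
    "cat@other.example",
    "dan@third.example",
]
local_users, push_jobs = plan_delivery(followers)
print(local_users)     # the one local follower
print(len(push_jobs))  # two remote servers, so two jobs, not three
```

The point of grouping by server is that the number of network calls depends on how many distinct servers your followers are on, not on how many followers you have.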
Data flow: replies, boosts, and favorites
Each time somebody interacts with your post by creating a reply, boosting it (retweeting), or favoriting (liking), this needs to be propagated to other servers (where your followers are located).
If the interaction itself happens on a server other than the one where the post was originally created, we first need to notify the origin server, and only then can the propagation be performed.
Let's say you write a post, and your followers are distributed across 50 servers. As a result, 50 Sidekiq jobs are created on the originating server. Now, if somebody replies to the post, we've got 1 Sidekiq job to propagate the reply to the origin server (if needed), and then 50 more to propagate it further. If you get 10 replies, that's up to 10 × 51 = 510 more jobs, for a total of over 500.
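The arithmetic is simple enough to put into a small Python function. The function name and parameters are made up for this example; the model is exactly the one described above: one job per follower server for the post, plus, for each interaction, an optional notification of the origin server and one job per follower server.

```python
def jobs_for_post(follower_servers, interactions, remote_interactions=None):
    """Rough count of Sidekiq jobs caused by one post and its interactions.

    follower_servers    -- distinct remote servers hosting your followers
    interactions        -- replies, boosts and favorites in total
    remote_interactions -- how many of them originate on another server and
                           need one extra job to notify the origin server
                           (defaults to all of them, the worst case)
    """
    if remote_interactions is None:
        remote_interactions = interactions
    if not 0 <= remote_interactions <= interactions:
        raise ValueError("remote_interactions must be between 0 and interactions")
    post_jobs = follower_servers
    notify_jobs = remote_interactions
    propagate_jobs = interactions * follower_servers
    return post_jobs + notify_jobs + propagate_jobs


print(jobs_for_post(50, 0))                           # 50: just the post
print(jobs_for_post(50, 10))                          # 560: every reply is remote
print(jobs_for_post(50, 10, remote_interactions=0))   # 550: every reply is local
```

The job count grows with the product of interactions and follower servers, which is what makes popular posts expensive.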
Scaling up a Mastodon instance
If your instance gets popular (or if you are running mastodon.social, fosstodon.org, or mas.to), you'll probably want to scale things up to handle the load. What are your options? Of course, the first thing is identifying the bottlenecks, but let's assume you have already done that.
Three components can be scaled. The first and most obvious one is the PostgreSQL database. While you won't be able to scale the write path (apart from vertical scaling of the PG server itself), you can scale the read path (which, as in many web applications, is the more frequent one) by providing hot read replicas. Note, however, that the Sidekiq jobs need to perform both their reads and writes against the primary node, hence the replicas only serve the other clients (web, mobile, streaming).
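As an illustration of that split, here is a small Python sketch of the routing rule. The ConnectionRouter class, the client labels and the server names are hypothetical; in a real deployment this decision would be made by the application's database configuration or a connection pooler, not by code like this.

```python
import itertools


class ConnectionRouter:
    """Chooses a database node for a query (illustrative only)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Rotate through the replicas round-robin; None means "no replicas".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pick(self, client, is_write):
        # Writes always go to the primary. Sidekiq jobs read from it too,
        # so only reads from the other clients can use a replica.
        if is_write or client == "sidekiq" or self._replicas is None:
            return self.primary
        return next(self._replicas)


router = ConnectionRouter("pg-primary", ["pg-replica-1", "pg-replica-2"])
print(router.pick("web", is_write=False))      # a replica
print(router.pick("web", is_write=True))       # the primary
print(router.pick("sidekiq", is_write=False))  # the primary as well
```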
The second component is the Redis in-memory database. Here, we can split the two roles across separate instances. The instances that back Sidekiq jobs need to be persistent, so that in case of failure we don't lose (or at least lose as few as possible of) the jobs that propagate posts and interactions between servers. On the other hand, the Redis instances that cache home feeds can be volatile and don't need persistence.
Redis itself can be scaled using either Redis Sentinel or Redis Cluster. For Sidekiq, only the Sentinel option is viable, as Sidekiq uses a small number of frequently updated keys. With Sentinel, we get fail-over, but we won't increase the server's throughput. For the home feed caches, we might use Redis Cluster, which will distribute the many cache keys across available nodes.
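Why many keys help and few keys don't can be seen from how Redis Cluster places keys: each key is hashed with CRC16 into one of 16384 hash slots, and each node owns a subset of the slots. The Python sketch below reproduces that slot calculation using the standard library's binascii.crc_hqx. The key names and the even three-way split of slots are assumptions made for this example, not values taken from a real Mastodon deployment.

```python
import binascii
from collections import Counter

HASH_SLOTS = 16384


def key_slot(key):
    """Hash slot of a key, following the Redis Cluster rule (CRC16 mod 16384)."""
    data = key.encode()
    # If the key contains a non-empty {hash tag}, only the tag is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


def node_for_slot(slot, node_count):
    # Simplification: assume the slots are split evenly into contiguous ranges.
    return slot * node_count // HASH_SLOTS


# Many feed keys (hypothetical naming) spread over all the nodes...
feed_nodes = Counter(
    node_for_slot(key_slot(f"feed:home:{user_id}"), 3) for user_id in range(3000)
)
print(sorted(feed_nodes.items()))

# ...while a handful of queue keys can occupy at most a handful of slots.
queue_keys = ["queue:default", "queue:push", "queue:ingress"]
print({key: key_slot(key) for key in queue_keys})
```

However many nodes we add, the few hot queue keys stay on the nodes that own their slots, so the cluster cannot spread that load.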
Finally, we've got the file storage. But here, if we want to scale, let's just move to S3, delegating the problem to somebody else.
So far, we've mostly covered our options for scaling out the infrastructure. However, there's also a lot of fine-tuning possible, both in the Mastodon application itself and in the middleware it uses. For example, Sidekiq has a configurable number of threads which process jobs from the queue. At some point it makes sense to create dedicated Sidekiq processes, where each process prioritizes jobs from a certain queue. And, of course, PostgreSQL exposes quite a number of tuning parameters as well.
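The idea of dedicated processes can be modeled with a toy Python simulation. This is not how Sidekiq is implemented or configured: the WorkerProcess class, the strict queue ordering and the thread counts are assumptions for illustration, and the default queue is used here only as an example of a lower-priority queue next to push and ingress.

```python
from collections import deque


class WorkerProcess:
    """A worker that always serves its queues in a fixed priority order."""

    def __init__(self, name, queue_order, threads):
        self.name = name
        self.queue_order = queue_order  # highest priority first
        self.threads = threads          # how many jobs could run in parallel

    def next_job(self, queues):
        # Take a job from the first non-empty queue in this worker's order.
        for queue_name in self.queue_order:
            if queues[queue_name]:
                return queue_name, queues[queue_name].popleft()
        return None


queues = {
    "default": deque(["d1"]),
    "push": deque(["p1", "p2"]),
    "ingress": deque(["i1"]),
}
push_worker = WorkerProcess("push-worker", ["push", "default"], threads=25)
ingress_worker = WorkerProcess("ingress-worker", ["ingress", "default"], threads=25)

print(push_worker.next_job(queues))     # outgoing deliveries come first here
print(ingress_worker.next_job(queues))  # incoming posts come first here
print(ingress_worker.next_job(queues))  # ingress is empty, so it helps with default
```

With such a setup, a burst of outgoing deliveries cannot starve the processing of incoming posts, because each kind of work has a process that looks at it first.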
Scaling the federated network
We can scale a single Mastodon instance mostly vertically: by adding more RAM to the Redis instances, so that feeds for more users can be cached, or by adding more RAM to the PostgreSQL instances, so that more queries can be handled in memory. There are some horizontal scaling options as well, such as adding more web servers or adding hot streaming PostgreSQL replicas.
But at some point, we'll hit a limit where a single server can scale no more. That's what has happened to mastodon.social—it no longer accepts new users. What to do, then?
Mastodon has an answer: federation. It's not a centralized network; instead, the load can spread across multiple Mastodon instances. Each such instance is a small world in itself. And the more instance-local traffic there is, the better.
However, this federated solution might run into problems once we get more and more inter-node relationships, that is, when people from one instance start following people from other instances. In that case, data has to be copied from one server to another. And this traffic gets multiplied by the number of interactions.
The traffic generated by "regular" users, who have a reasonable number of followers, and get a reasonable number of interactions, isn't a problem. But then there are users who get a huge number of followers and whose posts spark many interactions. Let's imagine a person who has 500k followers spread across, let's say, 10k servers. Each post that such a person sends creates 10k Sidekiq jobs. Each interaction—another 10k Sidekiq jobs. If the post gets popular and gets 1k favorites (that's 1/500 of the total followership of that person) in a short period of time, we're talking about 10M Sidekiq jobs. And all of that needs to be coordinated by one server.
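To get a feeling for what such a backlog means for the origin server, here is a back-of-the-envelope calculation in Python. The job count follows the scenario above. The number of worker threads and the time per delivery are invented numbers, there only to show the shape of the calculation, and not measurements of any real instance.

```python
def fanout_jobs(servers, interactions):
    """One job per follower server for the post and for each interaction."""
    return servers * (1 + interactions)


def drain_time_seconds(jobs, worker_threads, ms_per_job):
    """Time to work through the backlog if nothing else is queued."""
    return jobs * ms_per_job / worker_threads / 1000


jobs = fanout_jobs(servers=10_000, interactions=1_000)
print(jobs)  # the 10k jobs for the post plus 10M for the favorites

# Hypothetical capacity: 100 worker threads, 100 ms per HTTP delivery.
seconds = drain_time_seconds(jobs, worker_threads=100, ms_per_job=100)
print(f"{seconds / 3600:.1f} hours to drain the queue")
```

Under these made-up assumptions a single popular post keeps the delivery queue busy for hours, and every other job on the instance waits behind it.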
We have yet to see how things unfold in practice, but this has caused trouble before, so I suspect we might see many more of these problems. One or two such celebrities are probably not a problem. But Twitter is relatively full of celebrities with large follower counts. It doesn't seem that the current design is ready for such loads.
Will all posts eventually end up on mastodon.social?
Another problem concerns big servers, such as mastodon.social. Given that it gathers a large group of people with rather disparate interests (it's a "general-purpose" Mastodon server, not serving any particular community), won't a majority of posts from other servers eventually end up being copied there? All it takes for a post to be propagated is a single follower on that server.
Even the optimistic assumption that the other Mastodon instances will host mostly self-contained communities, where the majority of traffic is intra-node, only pushes the problem a bit into the future. Let's suppose 1/10th of the posts end up on the largest node. That's still a huge amount of data if Mastodon were to become a Twitter replacement!
Mastodon is a testament to the fact that you can get quite far with a traditional webapp. There's no Kafka or Pulsar, Cassandra or Mongo, and no need for distributed consensus or cluster coordination. However, such a design might hit its limits. But the app is behaving well now, and the federated network is working nicely. The large influx of users might be a great impulse to improve how content is propagated, provide better support for celebrity users, and improve scaling.
There are also social or user-experience obstacles to be tackled. For example, will people want to take the extra step of choosing an instance, instead of "just joining" and getting the posts they're interested in? Hard choices might be discouraging! There are also long-unsolved technical issues, such as that of backfilling the feeds of new users.