Things I wish I knew when I started with Event Sourcing - part 3, storage
In the previous installment of this series, we discussed data consistency considerations for event-sourced systems. We only touched on the topic of read-side data consistency, as that's where we most commonly spot and consider consistency implications. However, there's another critical aspect of event-sourced systems that deals with data: the write side, where events are persisted as a result of command handling. In this part, let’s briefly examine the event persistence options available, with consistency being one of the key factors.
What Constitutes an Event Store?
What is required from a database or other means of persistence to be considered a viable option for storing events in an event-sourced system? Undoubtedly, we need the storage to be durable. We also need it to have a proper transaction isolation level so that we can only read data that has been fully committed. Since we work with append-only logs, it would be beneficial if the system had reasonable performance for inserts (we don’t care much about the updates and deletes here).
Additionally, it needs to:
- provide atomic and sequential writes (when writing a series of events in one go),
- allow for ordered sequential reads at least in the context of a given aggregate (loading a sequence of all events related to a given aggregate),
- allow for ordered sequential reads globally so we can read all events in the order they were saved.
These requirements significantly limit our options to database solutions, as we need at least a substantial portion of ACID guarantees, plus some other features that databases provide (filtering, indexing, etc.).
There is an ongoing and somewhat heated debate on the internet about whether Kafka is a good fit for event sourcing storage (we even had two or three quite lengthy discussions with colleagues from SoftwareMill about that back in the day). While Kafka excels when it comes to events, my take is that it’s not particularly useful for the type of event “processing” needed here, but more on that later.
What is an Event?
From the storage perspective, an event in event-sourced systems can be thought of as a structured piece of data that has at least the following attributes:
- Aggregate ID: Identifies the aggregate the event belongs to.
- Sequence Number: The event’s position within that aggregate’s event stream.
- Timestamp: When the event was recorded.
- Payload: The actual “content” of the event.
Some other nice-to-haves, but not strictly required, are:
- Global Sequence ID: Useful for system-wide projections.
- Tags: Also useful for system-wide projections, groupings, etc.
There may be other storage- or solution-specific fields, but at its core, this is all we need to be able to write an event to an event store and read it later when replaying the event stream for a given aggregate for command handling. How do these requirements fit into popular databases then?
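To make this concrete, here’s a minimal, storage-agnostic sketch of such an event record in Scala. The field names and types are my own illustration, not a schema mandated by any particular store.

```scala
import java.time.Instant
import java.util.UUID

// Minimal shape of a persisted event, mirroring the attributes listed above.
// Names and types are illustrative only.
final case class PersistedEvent(
  aggregateId: UUID,                     // which aggregate's stream this event belongs to
  sequenceNr: Long,                      // position within that aggregate's stream, starting at 1
  timestamp: Instant,                    // when the event was recorded
  payload: Array[Byte],                  // serialized event content (JSON, protobuf, Avro, ...)
  globalSequenceNr: Option[Long] = None, // optional: position across all streams
  tags: Set[String] = Set.empty          // optional: grouping metadata for projections
)
```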
Relational Databases
Unsurprisingly, relational databases are a pretty solid choice here. They usually meet the ACID requirements we need, and with the more recently popular sharded/multi-node solutions, scalability and consistency can often be tuned for larger-scale problems.
Events are stored in a single table, optionally referencing the aggregate ID from another table, as embedding the aggregate ID in every single event may not be desirable. If it takes the form of a lengthy ID, it’s better to reference another table via a foreign key to save space. In some RDBMSs, event tables can be partitioned by time or sharded (e.g., by aggregate ID ranges) for better long-term performance, especially on reads. Partitioning by time is particularly useful if event stream snapshots are in use—we only need a few of the most recent events for replay.
Sequence numbers within aggregates are used both for ordering the events and for optimistic concurrency control to prevent concurrent writes (in case there is no single-writer principle applied at the application level, such as with Akka/Pekko Persistence). While sequence numbers within aggregate streams are probably better explicitly managed by the application side (due to optimistic concurrency control), global sequence numbers can usually be auto-generated from database sequences. This has some drawbacks, though: in Postgres, for example, sequence increments are not part of the surrounding transaction and are never rolled back, which may result in gaps or mixed order in the global sequence (e.g., an event was assigned a global sequence number, but the transaction storing it failed or was rolled back, or a later-numbered event becomes visible before an earlier one). This introduces some complications on the read side.
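As a rough illustration of the write side in Postgres, the sketch below (table and column names are my assumptions, not a recommended schema) shows a per-stream unique constraint acting as the optimistic concurrency check, and a BIGSERIAL global sequence—exactly the place where the gaps mentioned above can appear.

```scala
// Illustrative Postgres DDL and append statement, kept as plain SQL strings.
// UNIQUE (aggregate_id, sequence_nr) rejects a concurrent writer that loaded
// the same stream version; global_seq comes from a sequence, so it can have
// gaps and out-of-order commits.
object PostgresEventStoreSql {

  val createEventsTable: String =
    """CREATE TABLE IF NOT EXISTS events (
      |  global_seq   BIGSERIAL PRIMARY KEY,
      |  aggregate_id UUID        NOT NULL,
      |  sequence_nr  BIGINT      NOT NULL,
      |  created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
      |  payload      JSONB       NOT NULL,
      |  UNIQUE (aggregate_id, sequence_nr)
      |)""".stripMargin

  // The application computes sequence_nr as (last replayed sequence + 1); a
  // unique-violation error means someone else appended first, so the command
  // is retried against the fresh stream state.
  val appendEvent: String =
    """INSERT INTO events (aggregate_id, sequence_nr, payload)
      |VALUES (?, ?, ?::jsonb)""".stripMargin
}
```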
Speaking of the read side, a drawback of RDBMSs is that they usually don’t have dedicated solutions for reading event streams for read models/subscriptions/projections. This isn’t surprising, as they weren’t designed specifically with this use case in mind. Some close-to-native solutions here are various flavors of CDC (Change Data Capture), either based on triggers or op-logs (Write-Ahead-Log with an additional logical decoding feature in the case of Postgres). These require quite a bit of coding on the read side to build the infrastructure to read, stream, and then process the events reliably.
Another option is good old polling of the events table, along with tracking the sequence number of the last processed event. Depending on whether you want the read model for a single aggregate or globally (e.g., by tags), either the per-aggregate sequence number or the global sequence number can be tracked. However, there are dragons here, as the gaps and mixed order mentioned above may become issues. To ensure you don’t miss any event on the read side, some control and wait mechanism is needed—this is, for example, what Akka/Pekko Persistence Query does internally.
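A naive polling reader over the global sequence could look like the sketch below (plain JDBC, assuming the illustrative table from the previous snippet). Note that a production-grade reader, such as Akka/Pekko Persistence Query, additionally has to detect gaps and wait for late commits before advancing its checkpoint; that part is deliberately left out here.

```scala
import java.sql.Connection

object EventPoller {
  // Naive polling of the events table by global sequence number. The caller
  // keeps the last seen global_seq as a checkpoint and passes it back on the
  // next poll. Gap and late-commit handling is intentionally omitted.
  def pollEvents(conn: Connection, afterGlobalSeq: Long, batchSize: Int): Vector[(Long, String)] = {
    val stmt = conn.prepareStatement(
      "SELECT global_seq, payload FROM events WHERE global_seq > ? ORDER BY global_seq LIMIT ?")
    stmt.setLong(1, afterGlobalSeq)
    stmt.setInt(2, batchSize)
    val rs = stmt.executeQuery()
    val events = Vector.newBuilder[(Long, String)]
    while (rs.next()) events += ((rs.getLong("global_seq"), rs.getString("payload")))
    rs.close()
    stmt.close()
    events.result()
  }
}
```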
Yet another approach is to use Postgres' internal transaction IDs (xid) for global ordering. They have “better” semantics for our use case. For more details, I encourage you to read Oskar Dudycz’s post about this exact topic: Ordering in Postgres Outbox.
To wrap up the section about RDBMS, they’re pretty good on the write side and are still the most common choice, even for larger systems. The price you pay for ease of use and familiarity with the concepts is some additional work on the read side, but once set up, it usually works flawlessly.
NoSQL Databases
This is the next group of storage engines that can be used as event stores. The most popular ones are probably Cassandra (or ScyllaDB) and DynamoDB by AWS, which are column-oriented databases, and MongoDB, which is a document database. I only had the chance to use Cassandra as a backing storage for event sourcing for a short time (eventually migrating to Postgres), so what follows is more the result of my research on that topic rather than first-hand experience.
Document-like and schemaless models can be better suited for events as they allow for greater flexibility, depending on how you design your event payload (whether it’s just an application-managed blob of serialized data, as in RDBMS, or a first-class structure within a database document). In document databases (e.g., MongoDB), events can be stored either within one document (one document per aggregate) or as separate documents. Both options have their pros and cons; for example, the size of a single document may be limited, and in the case of significant aggregate growth, this may be an issue. On the other hand, single-document operations are naturally ACID-compliant. While MongoDB does support multi-document transactions, they may come with a performance penalty—there’s no free lunch.
MongoDB utilizes sharding, naturally allowing for the distribution of events across shards, thereby providing more room for handling a high volume of events.
Sequence numbers for single aggregate events need to be maintained by the application itself, similar to the RDBMS case, but a significant drawback of MongoDB appears to be the lack of built-in mechanisms to sequence events globally without additional logic and handling on your own. While MongoDB is built with a leader-follower architecture and all writes go through the leader, it should be possible to use a dedicated collection that acts as a sequence generator (though this likely involves multi-document transactions and, therefore, potential performance penalties). Unofficially, MongoDB maintains a stable order of insertion of documents, but from what I know, this is not documented, and it's not recommended to rely on it.
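For illustration, a one-document-per-event layout might look roughly like the sketch below (field names are my assumption, not an established convention). A unique index on (aggregateId, sequenceNr) then provides the same application-managed “expected version” check as the relational constraint shown earlier, while global ordering remains the open problem described above.

```scala
// Illustrative one-document-per-event layout, shown as a plain Scala Map rather
// than driver-specific types. A unique index on (aggregateId, sequenceNr) gives
// the same optimistic-concurrency check as the relational UNIQUE constraint.
val eventDocument: Map[String, Any] = Map(
  "aggregateId" -> "5f1c2a9e-3b7d-4c21-9a7e-0d8f6b2c4e10",
  "sequenceNr"  -> 42L,
  "timestamp"   -> "2024-05-01T12:00:00Z",
  "payload"     -> Map("type" -> "OrderPlaced", "orderId" -> "o-123")
)
```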
In column-oriented databases, like Cassandra, events can be stored as columns in a row where the partition key is an aggregate ID and the clustering key is this aggregate’s stream sequence number, which allows for ordering events within a single aggregate. Unfortunately, due to Cassandra's distributed and leaderless nature, there is no built-in solution for global sequencing/ordering of all events, so it has to be managed by the application itself. What is global ordering useful for? If you want to track events across multiple entities (e.g., to react on events of a given type only) you need something to track and checkpoint which events you’ve already seen and processed and to be able to resume from that given point.
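The table layout described above could look roughly like this in CQL (kept as a plain string; names are illustrative): the partition key keeps one aggregate’s events together on a single partition, and the clustering key orders them by sequence number within it.

```scala
// Illustrative CQL for storing events per aggregate: the partition key groups a
// stream on one partition, the clustering key orders events within it.
val createEventsByAggregate: String =
  """CREATE TABLE IF NOT EXISTS events_by_aggregate (
    |  aggregate_id uuid,
    |  sequence_nr  bigint,
    |  created_at   timestamp,
    |  payload      blob,
    |  PRIMARY KEY ((aggregate_id), sequence_nr)
    |) WITH CLUSTERING ORDER BY (sequence_nr ASC)""".stripMargin
```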
Because Cassandra is distributed, conflicting writes can occur and are resolved using a “last-one-wins” strategy based on timestamps. This can sometimes lead to lost events. A solution to this is to adjust consistency levels accordingly, such as using QUORUM or ALL if you need strict consistency, but this naturally impacts latency.
On the other hand, due to its leaderless and distributed nature, Cassandra is an excellent choice for write-heavy, read-heavy (as long as the reads are key-based), and potentially geographically distributed applications that can afford to trade off some data consistency for availability.
For both MongoDB and Cassandra, I’m not aware of any native, built-in mechanisms that would allow for subscribing to event streams, so that part again has to be handled explicitly with polling or a CDC mechanism (e.g., Debezium with Cassandra CDC, or Debezium with MongoDB, which requires a replica set to be enabled).
One more thing about consistency: there is a great talk by Greg Young, the man who “invented” Event Sourcing, that touches on this exact topic—the consistency of the event store on the write side, especially in the context of geo-distributed systems. I highly recommend watching it.
Dedicated Event Stores
There are several products on the market that were created primarily for event sourcing use cases. These are databases dedicated to storing events as streams and querying these events from the read side. The most prominent one is probably EventStoreDB, created by Greg Young himself. Another one I’m aware of is Axon Server, which includes an event store as one of its core components. Axon Server is part of the Axon stack, an event sourcing solution for the Java ecosystem.
Generally, with these solutions, some (but not all) of the shortcomings of previously described approaches are addressed.
To me, the most important feature of EventStoreDB is built-in support for subscriptions and various queries on the read side, as well as the global ordering of events. Naturally, events within every stream are sequenced, and their numbers are guaranteed to be strictly monotonically increasing (without gaps). Additionally, all events are sequenced globally, which allows for reading from the so-called $all stream in the order events were saved. This global position of an event is a composite consisting of a “prepare position” and a “commit position,” which ensures the correct ordering of events.
EventStoreDB also has a neat concept of “projections,” which basically means that you can build a new, intermediate stream consisting of events that match certain conditions (e.g., event type) and subscribe to this new stream with a catch-up subscription. With that, you can neatly organize and grow your read side by crafting projections and subscriptions the way you want.
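In code, the read side built on such projections typically boils down to something like the conceptual outline below. This is not the actual EventStoreDB client API, just a sketch of what a catch-up subscription gives you: replay from a checkpoint, then keep receiving live events in order.

```scala
// Conceptual outline only, not the real EventStoreDB client API: a catch-up
// subscription replays a stream (an aggregate stream, $all, or a projected
// stream filtered by, e.g., event type) from a stored position and then
// continues with live events, invoking the handler in order.
trait CatchUpSubscription {
  def subscribe(streamName: String, fromPosition: Long)(
      handler: (Long, Array[Byte]) => Unit): Unit
}
```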
On the other hand, EventStoreDB was built with a leader-follower architecture, so it doesn’t scale as well as, for example, the leaderless Cassandra—all writes still have to go through the leader, though replication to followers and read-only nodes certainly increases availability and fault tolerance.
In general, dedicated event stores are products that cannot be easily compared one-to-one with options like RDBMS or NoSQL databases. Due to their specific target, they employ novel approaches that are naturally missing in the former solutions, such as built-in subscriptions, categories, etc. Whether it’s a solution for your event sourcing problem really depends on the use case at hand, so as usual, use your judgment and do your research before going all-in.
What About Kafka Then?
Kafka is great at handling events, but it’s mostly in high-throughput, fault-tolerant event streaming where it excels. When it comes to working as an event store (in the way needed for event-sourced applications), it has some limitations that make it less suitable for that role. Let’s briefly go through some of these.
First of all, Kafka only guarantees the order of events within a single partition of a topic. This means that to guarantee the order of events within a single stream, you’d have to either dedicate a partition to each aggregate or create a single-partition topic per aggregate. Neither of these approaches scales well, though.
A proper event store should allow for a quick replay of events for a given aggregate. While Kafka can be configured to keep events for a long time using its retention parameters, there is no way to efficiently cherry-pick events for a given aggregate from the entire stream, unless you spread aggregates across partitions or topics as mentioned above, which is impractical.
Kafka also offers no built-in concurrency control when appending events, which can compromise data consistency. While some conflicts can be detected and handled in the code, doing this for all possible cases is impractical. One option to mitigate this is to apply the single-writer principle and handle concurrency control entirely at the application level.
These are just a few of the drawbacks of using Kafka as a true event store. There are certainly ways to work around some of these issues with additional tools and infrastructure, like ksqlDB, Kafka Streams, topics, and partitioning strategies, but I feel the setup grows disproportionately more complicated with every additional case you try to cover.
Don’t get me wrong, I like Kafka and use it a lot for data streaming and as a message broker. This is where it undoubtedly shines and remains a dominant player in the market. It’s just not a great tool for use as an event store.
Summary
We’ve reviewed some of the available solutions for storing events in an event-sourced system, from the most popular RDBMSs to highly scalable and distributed NoSQLs to dedicated event store databases. While each of these options has its strengths and weaknesses in terms of various aspects of event sourcing requirements, relational databases remain the most common choice due to their familiarity and flexibility. I secretly wish dedicated event stores were more popular and had a larger market share, as they can offer great solutions designed specifically for event-sourced applications.
Check out the other parts of the series:
Reviewed by: Marcin Baraniecki, Michał Matłoka