Building a resilient trading platform for FinTech company

Reactive and reliable exchange system that can process large data volumes

    SHARE

About the project

The goal of the project was to implement a core engine for a digital asset exchange system.

Our team designed and implemented a set of services that communicate asynchronously thanks to extended use of Event Sourcing approach. This allows flexible recovery, auditing, and service integration. With Akka Cluster, Akka Persistence, Cassandra and Kafka we provided scalability on many levels. We also prepared advanced performance tests to ensure that the solution can handle large growth of transaction volume with time. Additionally, we built a rich monitoring setup around the new system, using Prometheus, Grafana, and Graylog.

The exchange platform not only continued to generate growing revenue to our customer, but gave them strong confidence that the system is ready for unexpected events and that it can process large volumes of data. Also, updating the platform with fixes and new features has become a safe, frequent action affecting only minimal scope, and keeping the rest up and running, so that trading could continue even during redeployments.

Team

  • Senior developers
  • DevOps engineer

Duration

  • 30 months

Team role

  • Senior Scala Engineer
  • Senior Architect
  • Senior DevOps

Industry

  • Fintech

Technology

  • Scala
  • Akka
  • Cassandra
  • Apache Kafka
  • Gatling
  • Grafana
  • Prometheus

Challenge

Initially, the system was doing its job very well, but only to a point when it became necessary to scale it. It was based only on the Akka Actor model with a few actors and their mutable state backed by some blocking calls to the infrastructure layer. It was impossible to put it in a multi-node environment without a heavy rework, requiring deep knowledge about the internals. Additionally, test coverage was very basic, which left no room for confidence in case of very invasive updates. The system required a strong division between handling commands - offer submissions, and queries - requests for secondary, derivative data (read model).

Primary drivers for starting a new project was to resolve current scalability problems in such a way that adding new markets and handling increasing traffic becomes a matter of spinning up new nodes in a cluster. Online trading systems attract both regular users who put their offers using a web interface, as well as advanced traders, who extensively use the direct API with bots. In case of exceptional events, markets can quickly become flooded with a heavy load of requests, and the system is expected to process them quickly and correctly. Therefore, the absolute top priority was to design the core trading engine for data consistency.

After successful deployment, our team continued to add new features and new microservices, benefitting from established architecture and communication patterns.

Technology used

  • #Scala
  • #Akka Persistance
  • #Apache Kafka
  • #Akka Cluster
  • #Grafana
  • #MariaDB
  • #Apache Cassandra
  • #Gatling
  • #Solr
  • #Prometheus
  • #Graylog

Every project is an adventure. Take another one with us!

Let’s dive into project together

How we faced client’s needs

Our engineers started with thorough study of the existing system. We began to implement end-to-end tests, which have been configured to run against both old and the new solution. In parallel, we took all the performance and reliability requirements and proposed an event-based architecture.

Main exchange core was based on Akka Cluster and Akka Persistence. With clustering, we could shard the business logic into markets, running on separate nodes and allowing great scalability. With clear segregation between commands and queries, we can write separate services for processing the core business logic of offer matching, and the definitions of so-called projections: pipelines for event processing and building the read model in various storage types.

Many types of projections were initially left to be just the same as in the original system, just now we were writing to these databases using projectors. Such separate projector services are Kafka consumers, so we can easily manage parallelization and separation which particular projections run on which nodes. This way we separated expensive and crucial projections from secondary ones - like history search views. Another set of projector nodes have been configured separately only to handle incoming queries. This way redeployments of projections or main engine didn’t affect requests for data.

Our DevOps engineers prepared a solid build pipeline with infrastructure defined as a code for all the environments like development, staging, and production. In the meantime, we connected Prometheus and Grafana for monitoring, as well as Graylog for log aggregation. Then we designed detailed performance tests using Gatling, to see how our setup could handle 10000 requests per second. When the initial version was ready, we could collect metrics from Grafana and Gatling itself to identify potential bottlenecks and see what are the platform capabilities. It turned out that we could easily handle thousands of messages with a few nodes, and adding more nodes to the Akka Cluster would increase the throughput without problems.

Finally, we defined a DSL which allowed us to write many acceptance tests for offer matching scenarios. With a large set of such tests passing for both old and new platforms, we were getting closer to the release. Last step required QA engineers to test all the remaining scenarios manually, and to do some safe tests on production, at least for the read-only part.

Results

We developed a performant reactive system allowing fast trading, and processing growing traffic with very high resilience in case of failures.

The exchange platform not only continued to generate growing revenue to our customer, but gave them strong confidence that the system is ready for unexpected events and that it can process large volumes. Also, updating the platform with fixes and new features has become a safe, frequent action affecting only minimal scope, and keeping the rest up and running, so that trading could continue even during redeployments.

We started to design new services, which have been implemented and deployed with equal success. Our engineers also worked on rewriting some of the legacy services in order to make them more reactive and match the architectural idea. The solution gives our customer a possibility to quickly react and as many new markets as they want, without worrying about technical capabilities.

New API users can join in great numbers, including large companies who connect to the API with high-frequency automatic trading algorithms.

Krzysztof Ciesielski
Senior Scala Engineer

"The solution gives our client a possibility to quickly react and add as many new markets as they want, without worrying about technical capabilities. It also accommodates a high volume of new API users, including large companies with automated trading algorithms."

Interested in the first-hand experience? Let us know and we will connect you with our clients!

connect me with your client

Other projects we've successfully delivered

    #Scala #Kubernetes #Java #AWS Cloud

Migrating SMS Geteway service to the AWS Cloud

A messaging gateway service that was running on a local data center and was no longer efficient.

Read more
    #Scala #Akka #Java #Akka

High-performance SMS broker

A high-performance, reliable SMS message queue, with nightly stress/performance tests.

Read more