Reliable and scalable high-load platform

How we improved the platform's performance, implemented critical features, and optimized its infrastructure.

About the project

COTA Healthcare is a company founded by doctors, engineers, and data scientists. It combines oncology expertise with technology and advanced analytics to organize and analyze cancer data in support of better patient care and research.

COTA Healthcare was developing a platform called Cota Abstraction Platform (CAP) to gather and process oncological data.

By the end of 2019, SoftwareMill joined the effort to support the rapid growth of the platform. Initially, our team consisted of two developers who integrated with one of the client's agile development teams. As the project progressed, the number of SoftwareMill developers involved in the project increased, and at its peak, there were six engineers working with COTA. We covered Backend, Frontend, and DevOps domains.

Throughout 2019, the client acknowledged how impressed they were with our commitment, execution, and expertise, so in 2020 they decided to form a self-organized team made up exclusively of our engineers. The team followed an agile methodology and worked with the client’s Product Owner and Scrum Master. It took over responsibility for several modules of the platform and also helped outside that scope on demand.

Team

  • 3 - 6 devs

Duration

  • 3+ years

Team role

  • Senior Scala Engineer
  • Senior Angular Engineer

Industry

  • MedTech

Technology

  • Google Cloud
  • Kubernetes
  • Grafana
  • PostgreSQL
  • Scala
  • Cats + Cats-Effect
  • Elixir
  • Angular
  • TypeScript

Challenge

The healthcare industry is intricate and highly regulated, which presents several challenges to healthcare IT systems. One significant challenge is the rapid evolution of medical technology and treatment options, which requires IT systems to be adaptable and flexible. Sustaining a rapid pace requires trade-offs that cause technical debt to grow gradually. A codebase written with time to market as the priority starts to accrue maintenance costs over time.

Projects in their early phases frequently start as a monolithic application in a single codebase. As the platform grows rapidly, such applications become more complex and harder to maintain. Compilation times grow longer, the developer feedback loop during testing slows down, and high component coupling increases bundle sizes.

Multi-team collaboration on a single codebase is prone to degradation in quality and consistency. Even simple discrepancies between developers’ formatting styles might lead to frustrating and time-consuming version control conflicts.

A well-established remedy for complexity is splitting the system into finer-grained services or modules. However, keeping the system consistent is more challenging in a distributed environment because of factors such as the introduction of code duplication and the loss of ACID guarantees.

One of the major pain points of data-centric apps is performance decline as data volume increases. Complex queries on data from distributed sources can put heavy stress on the database and even cause timeouts. The large amount of information and its presentation in the user interface may lead to bottlenecks and performance issues, resulting in a poor user experience.

Data is an invaluable asset. It is crucial to allow end users to export and analyze it and to create reports conveniently. Plain CSV exports are a good starting point for more sophisticated solutions.

An essential aspect of maintaining a long-term project is keeping the technology stack up to date. An outdated library might pose a security risk, slow down development and system performance, or cause compatibility issues with newer technologies.

Last but not least, proper system monitoring must be in place. It allows for swift detection of errors and performance degradation.

Technology used

  • #Scala
  • #Cats + Cats Effect
  • #Grafana
  • #Google Cloud
  • #TypeScript
  • #Elixir
  • #Angular
  • #Kubernetes
  • #RxJS
  • #CodeceptJS
  • #Angular Material

Client's needs

One of the team's responsibilities was helping to design the architecture of the CAP system. Our goal was the seamless introduction of new features while retaining the stability and reliability of the project.

We analyzed client needs and potential threats (like performance bottlenecks) to find the best-fitting software solutions.

We prepared various design-related documents, from Design Docs to UI mockups. We recorded important decisions as ADRs (architecture decision records). Our comprehensive documentation and cross-functional code reviews allowed for smooth knowledge transfer between stakeholders.

The initial part of the platform that we started developing was tightly coupled to other parts of the system. It was decided to extract it into a dedicated Scala service with its own CI/CD pipeline. This greatly improved compile times and shortened the developer feedback loop. Its frontend counterpart was also separated into its own lazy-loaded module, and the project structure was reorganized for clarity. Every new feature module we developed followed the same pattern.

To mitigate the network's non-deterministic nature and retain data consistency across distributed services, we utilized patterns like transactional outbox and inbox.
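The outbox and inbox ideas can be illustrated with a minimal TypeScript sketch. It is not the CAP implementation: the in-memory `OutboxStore`, the `saveCaseWithOutbox`, `relayOutbox`, and `receive` functions, and the message shape are all hypothetical. In a real service the two writes in `saveCaseWithOutbox` would share one database transaction, and the relay would publish to a message broker.

```typescript
// Hypothetical in-memory stand-in for a database; all names are illustrative.
interface OutboxMessage {
  id: string;
  topic: string;
  payload: string;
  published: boolean;
}

interface OutboxStore {
  cases: Record<string, string>; // business data: caseId -> diagnosis
  outbox: OutboxMessage[];       // messages waiting to be relayed
}

// Outbox: store the business change and the outgoing message together.
// A real service would perform both writes in a single database transaction.
function saveCaseWithOutbox(store: OutboxStore, caseId: string, diagnosis: string): void {
  store.cases[caseId] = diagnosis;
  store.outbox.push({
    id: "msg-" + (store.outbox.length + 1),
    topic: "case-updated",
    payload: JSON.stringify({ caseId: caseId, diagnosis: diagnosis }),
    published: false,
  });
}

// Relay: hand every unpublished message to `send`, then mark it published.
// If `send` throws, the message stays pending and is retried on the next run,
// which gives at-least-once delivery.
function relayOutbox(store: OutboxStore, send: (m: OutboxMessage) => void): number {
  let sent = 0;
  store.outbox.forEach((m) => {
    if (m.published) return;
    try {
      send(m);
      m.published = true;
      sent += 1;
    } catch (e) {
      // Leave the message pending so the next relay run retries it.
    }
  });
  return sent;
}

// Inbox: the consumer remembers processed message IDs, so a redelivered
// message is applied only once.
interface InboxState {
  processed: Record<string, boolean>;
  applied: string[];
}

function receive(inbox: InboxState, m: OutboxMessage): boolean {
  if (inbox.processed[m.id]) return false; // duplicate, skip
  inbox.processed[m.id] = true;
  inbox.applied.push(m.payload);
  return true;
}
```

Because the relay may deliver a message more than once after a failure, the inbox check on the consumer side is what keeps the overall effect exactly-once from the business point of view.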

Management and analytical processing of the comprehensive CAP data model were demanding, mainly because of the high complexity of the healthcare domain. Another factor was the large volume of data.

Sometimes this led to a slow or unresponsive user interface. We investigated and profiled those cases and optimized them with techniques such as adjusting change detection strategies and dynamically loading and rendering components.

On the backend, our initial approach of relying on the relational database engine to run complicated queries did not scale well: we encountered unpredictable performance drops and slow response times. Our answer was to create a denormalized model optimized purely for queries (a read model) that was derived from the original data. It allowed us to decrease the database load and improve latency. It was also updated in near real time, something we could not achieve with database materialized views alone. See "Picture 1" under this text section.
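A read model of this kind can be sketched in a few lines of TypeScript. The sketch assumes the read model is refreshed by applying change events as they arrive, which is only one possible way to reach near-real-time updates. The event types, the `PatientSummaryRow` shape, and the function names are hypothetical and do not come from CAP.

```typescript
// Hypothetical change events emitted when the normalized source data changes.
type PatientChange =
  | { kind: "patient-registered"; patientId: string; fullName: string }
  | { kind: "diagnosis-added"; patientId: string; diagnosis: string };

// Denormalized row shaped for a list screen: one lookup, no joins.
interface PatientSummaryRow {
  patientId: string;
  fullName: string;
  diagnoses: string[];
  diagnosisCount: number;
}

type PatientReadModel = Record<string, PatientSummaryRow>;

// Fold a single change into the read model. Running this for every change
// as it arrives keeps the model close to real time.
function applyChange(model: PatientReadModel, change: PatientChange): void {
  switch (change.kind) {
    case "patient-registered":
      model[change.patientId] = {
        patientId: change.patientId,
        fullName: change.fullName,
        diagnoses: [],
        diagnosisCount: 0,
      };
      break;
    case "diagnosis-added": {
      const row = model[change.patientId];
      if (!row) break; // change for an unknown patient; ignored in this sketch
      row.diagnoses.push(change.diagnosis);
      row.diagnosisCount = row.diagnoses.length;
      break;
    }
  }
}

// Query side: a plain key lookup instead of a multi-table join.
function findSummary(model: PatientReadModel, patientId: string): PatientSummaryRow | undefined {
  return model[patientId];
}
```

The trade-off is that precomputed fields such as `diagnosisCount` duplicate information held in the source data, so every write path has to emit a change for the read model to stay correct.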

Some operations are expected to meet strict time limits, whereas others can acceptably take a substantial amount of time to complete. To overcome problems with long-running, complex actions and potential timeouts, we created a dedicated job mechanism. It allowed jobs to execute asynchronously, with the ability to query a job's status at any time. See "Picture 2" under this text section.
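The shape of such a job mechanism can be sketched as follows. The `JobRegistry` class and its method names are hypothetical, and the background worker is modeled by an explicit `runNext()` call so that the sketch stays deterministic. A real worker would run on its own thread, process, or scheduler and would persist job state.

```typescript
type ExportJobStatus = "queued" | "running" | "succeeded" | "failed";

interface JobRecord {
  id: string;
  status: ExportJobStatus;
  result?: string;
  error?: string;
}

class JobRegistry {
  private jobs: Record<string, JobRecord> = {};
  private queue: Array<{ id: string; work: () => string }> = [];
  private nextId = 1;

  // Called by the request handler: registers the work and returns at once,
  // so the caller never waits for the long-running action itself.
  submit(work: () => string): string {
    const id = "job-" + this.nextId++;
    this.jobs[id] = { id: id, status: "queued" };
    this.queue.push({ id: id, work: work });
    return id;
  }

  // Called by clients polling for progress.
  statusOf(id: string): JobRecord | undefined {
    return this.jobs[id];
  }

  // One step of the background worker: take the oldest queued job and run it.
  // Returns false when the queue is empty.
  runNext(): boolean {
    const next = this.queue.shift();
    if (!next) return false;
    const job = this.jobs[next.id];
    job.status = "running";
    try {
      job.result = next.work();
      job.status = "succeeded";
    } catch (e) {
      job.error = e instanceof Error ? e.message : String(e);
      job.status = "failed";
    }
    return true;
  }
}
```

A client would call `submit`, receive the job ID immediately, and then poll `statusOf` until the status is `succeeded` or `failed`.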

We addressed the need for durable data exports and reporting by leveraging Google Spreadsheets. This made perfect sense as COTA was already using Google products, and it was very convenient and secure to store files on the company’s shared drive where access for each employee could be configured as desired.

Previously it was a two-step process: first, the user had to export a CSV file, and then upload and process it in Google Spreadsheets. Exporting directly to a spreadsheet not only allowed users to skip the extra CSV export step but also brought the outcome closer to what they expected, with formatting, data nesting, and support for multiple sheets. The users received it well, and the functionality to export arbitrary data to spreadsheets was added in many places.
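To illustrate why a multi-sheet export suits nested data better than a single CSV file, here is a small TypeScript sketch that splits nested records across two sheets. The `ExportPatient` and `SheetData` types and the `toSheets` function are hypothetical, and the upload to the spreadsheet service is deliberately left out.

```typescript
// Hypothetical nested export input: each patient has a list of treatments.
interface ExportPatient {
  patientId: string;
  fullName: string;
  treatments: Array<{ drug: string; startDate: string }>;
}

// A sheet is a title plus a grid of cell values, header row first.
interface SheetData {
  title: string;
  rows: string[][];
}

// Split nested data across two sheets: parents on one, children on the other,
// linked by patient ID. A flat CSV would have to repeat or squash these rows.
function toSheets(patients: ExportPatient[]): SheetData[] {
  const patientRows: string[][] = [["Patient ID", "Name", "Treatments"]];
  const treatmentRows: string[][] = [["Patient ID", "Drug", "Start date"]];
  patients.forEach((p) => {
    patientRows.push([p.patientId, p.fullName, String(p.treatments.length)]);
    p.treatments.forEach((t) => {
      treatmentRows.push([p.patientId, t.drug, t.startDate]);
    });
  });
  return [
    { title: "Patients", rows: patientRows },
    { title: "Treatments", rows: treatmentRows },
  ];
}
```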

When developing independent services, it’s critical to test not only each service individually but also how they integrate as a whole system. To reduce the risk of releasing a breaking change to the production environment, we introduced a set of end-to-end (E2E) tests that verified that all the components work well together. To be trustworthy, such tests must run against an environment that is as close as possible to actual production. Because we used Kubernetes, a ubiquitous abstraction for defining how services are deployed, we could spawn production-like services during a build pipeline and run the tests against them. The introduction of E2E tests not only improved developers' confidence when deploying a new version but, more importantly, protected the system from unwanted disruption and downtime. Although very beneficial, these tests also carry a maintenance cost, so we limited this kind of testing to the most critical functionality.

One of our key goals was to migrate CAP to the newest available versions of the language, libraries, and frameworks.
To tackle the problem in the long run, we introduced semi-automatic processes: bots that created update proposals (pull requests with adjusted dependencies). We used Scala Steward for services written in Scala and Dependabot for those written in TypeScript. A reliable CI/CD pipeline and high test coverage gave us the confidence that we could integrate those updates into the system without breaking it.

To increase type safety and make the developer experience consistent, we introduced dedicated tools. We enabled strict typing rules in TypeScript and Angular templates and configured the recommended ESLint ruleset and the Prettier code formatter. Likewise, we set up linting and formatting tools on the backend (like WartRemover and scalafmt).

To swiftly react to the system's functionality degradation, we measured essential metrics (like response latency) both in the production environment and during performance tests.
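As an example of turning raw latency measurements into a signal, the TypeScript sketch below computes a nearest-rank percentile and compares it with a budget. The choice of the 95th percentile, the `budgetMs` threshold, and both function names are assumptions made for illustration. The text does not state which statistics or thresholds were used on CAP.

```typescript
// Nearest-rank percentile over latency samples given in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Hypothetical check: report a degradation when the 95th percentile
// exceeds the agreed latency budget.
function isLatencyDegraded(samplesMs: number[], budgetMs: number): boolean {
  return percentile(samplesMs, 95) > budgetMs;
}
```

A percentile is used instead of the mean because a few very slow requests, which are exactly what users notice, barely move an average but show up clearly in the tail.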

Results

Our team's successful and ongoing collaboration with the client has resulted in numerous critical features being implemented on their data platform. The overall platform performance has improved significantly, and the platform now handles larger volumes of data gracefully, without timeouts. Since 2019, we have also enhanced the overall developer experience by introducing tools and practices that improve code quality, reduce build and deployment times, and increase productivity.

Furthermore, we have demonstrated our commitment to knowledge sharing by providing design documentation, conducting code reviews, and offering direct consultations to stakeholders and other developers.

The trust the client placed in us when they formed a team made up exclusively of our developers has paid off. Our autonomous team worked seamlessly with COTA’s product owners, delivering continuous business value.

Sudhakar Velamoor
VP of Engineering at COTA Healthcare

"In the world of software engineering consulting firms, it’s hard to find resources who are committed to your product as much as you are. It’s even harder to match the specific skills needed when such skills (functional programming, Scala) are in very high demand. If you add the independent and autonomous operation of the team, that’s a trinity almost unheard of. You guys gave us all of the above, and we wish you the very best, and thank you for all the great work! Thank you!"

Meng Mao
Data Architect at COTA Healthcare

"SoftwareMill's productivity, code ownership, and dedication to engineering quality have completely redefined my expectations for what a contracting company is capable of. Thank you for being a big part of our engineering efforts, and I hope to work with any of you again soon."

Brett Riotto
Director of Engineering at COTA Healthcare

"You are the utmost professionals at what you do, and you really have helped us create a solid, maintainable platform. I really appreciate the level of commitment that you showed to our project. You all always met deadlines, always pitched in when there were production issues, and the pride in your work always showed. I know the future holds great things for you all, and I hope to cross paths with you again. Please let's stay in touch."
