How to communicate Java microservices?
Managing a sizeable software system is always a challenge. The proper solution for taming the complexity is isolating its modules into independent services.
The microservices architecture has gained wide acceptance in the IT industry over the years. And according to last year’s survey by O'Reilly, most of the adoptions were at least partially successful.
Choosing microservices might be beneficial for the growth of your system, still, using them comes with its difficulties. Things that are straightforward in monolithic applications are not that easy with microservices. Sharing information across modules can no longer be achieved by a simple local function call. Microservices usually reside on separate machines (physical or virtual) and moving data requires a network call. You can read about the pros and cons of microservices in an article by Maria Wachal.
If your system is supposed to have satisfactory performance and resilience, you have to ensure that it's not affected by the flaws of remote communication like unpredictable latency or poor reliability.
In this article, I will share my thoughts on the best communication patterns for microservices. I will be mainly focusing on Java and Spring.
Sync and async
There are many ways we could categorize communication patterns between services. Arguably the most popular method is specifying if the type of communication is synchronous or asynchronous.
A synchronous call means that a service waits for the response after performing a request. It doesn't necessarily mean blocking a thread. The library making the call can invoke a callback when the response arrives or return a pending handle like CompletableFuture or reactive Flux.
Asynchronous communication usually involves some kind of messaging system like RabbitMQ or Apache Kafka. To initiate communication, a service publishes a message to the message broker. After that, the client can get the message from the broker. A relevant point here is that there, the sender doesn't need to wait for the response. It might be sent back from the receiver later as another asynchronous message.
You can learn more about general interaction patterns between microservices in an article by Michał Matłoka:
Getting data from other services
The most obvious way to get data payload from another service is an HTTP call. By using REST, the service exposes its internal state as resources. These resources can be fetched using a GET method or modified by HTTP methods like POST, PATCH, PUT or DELETE.
There are many capable REST clients in Java we can choose from. One interesting option is a new HTTP client called
WebClient introduced in Spring 5. In contrast to its precursor soon-to-be-deprecated
RestTemplate, it offers modern non-blocking and reactive API. Under the hood, it's based on
HttpClient introduced in Java 11.
WebClient webClient = WebClient.create("http://localhost:8080"); String response = webClient.get() .uri("/data") .retrieve() .bodyToMono(String.class);
You can read more on Spring’s WebClient here.
Making calls resilient
Using HTTP for synchronous communication seems easy, but there are some traps there.
In case the target remote service is unavailable, all requests will fail. If the requested data is required by the client's process, the failure would prevent it from advancing further. This way the flop of one service causes a cascading failure on another.
This means that these services are tightly coupled. This kind of coupling is often called temporal, or time-based, meaning that the requesting service expects the response as soon as possible to be able to continue with its computations.
The failing request doesn't necessarily mean that the remote service is permanently down. It might be just restarting or increased network load might have caused the timeout. A repeated attempt after a short while might turn out to be successful.
That's why you can improve the robustness of your service’s calls by utilizing a retry pattern. Many Java libraries offer implementations for retry. For instance, the WebClient has a built-in operator, just for this circumstance:
webClient.get() .uri("http://localhost:8080") .retrieve() .bodyToMono(String.class) .retry(3);
Does retry always do the job?
So far, I described only retrying GET requests. But what about requests that modify data like POST or PUT?
In case of error, from the client’s perspective, it might be sometimes impossible to tell if the request reached the server. Maybe the request succeeded, but only the response timed out? Even if the server returns an error code, how can you be sure if no resource was modified?
For that reason only retrying GET requests is considered a safe operation (as long as the practice of GETs being read-only is followed).
On the other hand, retrying POSTs is a bad practice since it might cause a duplicate action on the target server.
PUTs or DELETEs by convention should be idempotent, so in theory, even if the data is processed twice, it should make no difference. But if that is the case depends solely on implementation.
So how to ensure that no modification is lost? For commands (requests that modify data), it's usually better to use async messaging. I’ll tackle this topic later.
Yet another solution is to query the target service to check if the action succeeded or failed (for example by querying some GET endpoint to get the state of resource) before retrying. This approach should be only used in case applying async messaging is not viable.
You might also be interested in:
It’s never a good idea to wait for the response from the remote server forever. It might even never respond!
Request timeout defines how long you intend to wait for a reply. For some requests, it’s okay to hold a little bit longer, but for others, no delays should be tolerated. What is acceptable depends solely on the context.
By default, Spring’s WebClient will terminate the request after 30 seconds. The consequence is that in the worst-case scenario, the failure or attempt to repeat the call will be delayed by 30 seconds.
Spring allows fine-grained control for declaring timeouts for particular requests, but you can define them globally by setting a property
spring.mvc.async.request-timeout. You can read more on configuring timeouts in Spring in this article.
Again, there’s also an adequate component from Resilience4j - TimeLimiter.
The service that responds unusually slow is likely in bad condition. It can be running out of resources trying to handle an unexpected surge of requests. Such service might be able to get back on its feet if given some time.
The circuit breaker pattern makes sure to slow down the rate of requests sent for services that are suspected of being at the brink of collapse.
You can read more on circuit breakers in this article.
Once more, a suitable implementation is provided by Resilience4j.
Retrying the calls is a very effective technique while dealing with the transient unavailability of other services, but it can't help if another service is down for a longer time. Sometimes other service’s downtime can be predicted (like scheduled maintenance) but very often it's completely random (like errors caused by hardware malfunction).
A practical way to protect your services from errors affecting other microservice is caching. If strong consistency of data is not required (it's acceptable the record was modified in another service but the cache has a stale version of it), your microservice can get data directly from the cache and only fetch it when the cache is invalidated (for example when TTL passed).
If it's preferred to operate on the newest version of the record, you can treat the cache as a backup used only if the remote microservice is unavailable.
You can also make your caching mechanism smarter with the usage of tools like ETags.
ETag is an HTTP header used to identify a specific version of a resource, which is sent to the client along with a response. When data is cached, it should save the value of ETag.
Next time the record is required, your client must send an ETag along with the request in the header If-None-Match. The server uses the ETag sent by the client to determine if data has changed. If it wasn't modified, it responds with an empty body and status code 304 Not Modified. The client then knows it is safe to use cached data. If ETag was stale, the server sends a regular response with data.
With this approach, data is re-fetched only if it did change.
Another way to coordinate caching behaviour between the client and the server is the use of cache directives. By returning the proper header, the server can instruct the client exactly how long the cache should be kept and when it should be invalidated.
You can also handle errors of other microservices by temporarily turning off some features of the application that depend on the synchronous call to service that is down.
Usually, partial unavailability of some components is much better than a full-blown crash of the whole system.
Let's take an example of an online library system. If the microservice responsible for handling book lending is offline, but the one allowing searching through book catalogues is online, you can allow users to search for a book, but display the error message in case lending is attempted.
Documenting your API
Maintenance of every microservice is usually a task of a small, independent team. One team could be taking care of a handful of services, nevertheless, it is rarely possible to know all nuances of the system’s architecture. Integrating a microservice with another part of the system that’s maintained by some other team might be a real hassle if no documentation is available.
Providing a comprehensive description of the microservice’s public API makes the job of other teams less complicated and the process of integration becomes more natural.
A very popular way to document REST API is OpenAPI (formerly known as Swagger).
Moreover, with OpenAPI metadata that’s describing your service API, you can use tools that generate client stubs that can directly discover and consume your services.
Versioning the API
An important feat of microservice architecture is the ability to independently deploy every service. But how to handle a situation when you need to introduce a breaking change?
An example of the change that’s breaking might be altering the format of the response or removing any part of the API. Changing the API in place would require clients to do coordinated deployment with the service to adjust to modifications (and that might be a scent of distributed monolith).
The recommended solution is to keep multiple variants of API (versions) simultaneously. After all the client services successfully migrate to a new version, the old one can be dropped.
You can read more on API versioning in this article.
Securing your API
Very often overlooked aspect of communication between services is security and encryption.
One way of thinking about internal communication of microservices is assuming that traffic that does not go outside the private network is safe. In this case, TLS encryption termination can happen on the entry gateway to our system.
Many consider TLS termination as a very risky practice.
If your private network was compromised, your services will be very sensitive to man-in-the-middle attacks. However, encrypting all the traffic comes with a cost of increased CPU resource consumption and slightly increased latency. Nevertheless, it is almost always worth securing all your traffic.
Although it's possible to use one-way TLS, when only the server has to provide a TLS certificate to the client, the best practice is to set up mutual TLS encryption when both sides have to authenticate each other.
TLS encryption can be implemented on the application level. With Spring Boot, it’s just a matter of setting up a keystore and adding proper configuration entries.
A quite different solution for the issue of traffic encryption might be moving the problem higher into the stack to the infrastructure level. If you're deploying your app on Kubernetes, you can use service mesh like Istio or Linkerd to set up two-way TLS encryption between nodes of your system.
On the other hand, putting up and managing service mesh along the Kubernetes cluster can be a complicated, full-time job for some members of the team.
GraphQL is a flexible alternative for REST.
What's different with REST, instead of exposing a hierarchy of URLs for fetching and manipulating resources, GraphQL usually provides a single endpoint under /graphql.
That endpoint takes a payload in JSON-based query language.
Using this query language, the client can specify which resources should be modified or included in the response.
GraphQL is a relatively new technology. It was initially released in 2015 and reached a stable version in 2018. It fits best in cases when you'd need a plethora of very similar REST endpoints.
GraphQL integrates easily with SpringBoot. You can get more details in the article by Jarosław Kijanowski or GraphQL introductory series:
As long as your system consists of only a few nodes, you can always hardcode their IP addresses in each service’s configuration. With every new microservice, this approach will become more and more difficult to control.
Automatic resolving of other service addresses is called service discovery.
Service discovery can be DNS based. Notable examples of this method are services from Kubernetes or ECS Service Discovery from AWS.
For instance, Kubernetes creates and maintains a custom DNS record that resolves to one or more IP addresses of the service's instances. When the client needs to perform a request, it can just open a connection using this DNS name.
Alternatively, service discovery can be API based. An important piece here is the registry that manages the list of all available instances with their addresses. The service registry is the only ‘fixed point' in such an architecture.
All the other services register their IP address on the startup and de-register on graceful shutdown in the registry.
Before doing a first remote call, each microservice needs to get the list of IPs of other services from the registry. These IP addresses can be then cached for the configured period for further use.
Using remote procedure calls has several advantages over regular HTTP calls. It uses binary encoding, which has a lot more size overhead than JSON. Also, serialization is much faster.
As gRPC is language-agnostic, you can generate code for the client code for various languages and frameworks (including Java). You just need protobuf files that describe data structures returned and accepted by your methods.
Since the third version of Protocol Buffers, all fields defined in .proto files are optional. Because of that, adding or removing a field won't make deserialization fail in clients with stale schema.
That characteristic of proto is intended to make schema migrations more manageable.
Using gRPC in some of your endpoints doesn't mean that all your services would need to use them as well. With gRPC gateway, it's possible to integrate JSON-based REST API with gRPC.
The synchronous call is the simplest way to communicate two services. It also bonds them together, since the calling microservice needs to wait for a response from remote. This kind of coupling can sometimes be prevented by using asynchronous communication.
Shortcomings of synchronous communication described in the section above can be very often avoided by switching to asynchronous protocols.
In async messaging-based communication, a separate component known as a message broker is responsible for handling the message sent by the producer service.
After the message is received by the broker, it’s now its job to pass the message to the target service. If the recipient is down at the moment, the broker might be configured to retry as long as necessary for successful delivery.
These messages can be persisted if required or stored only in memory. In the latter case, they will be lost when the broker is restarted and they are not yet sent to the consumer.
Since the broker is responsible for delivering the message, it’s no longer necessary for both services to be up for successful communication. Thus async messaging mitigates the biggest problem of synchronous communication - coupling.
Point-to-point vs Pub/Sub
A messaging system can deliver messages point-to-point (from a single producing service to a single recipient) or with publish/subscribe pattern when a message is stored in a logical group usually called topic and then can be received by all consumers subscribing to that topic.
Point-to-point systems are called messaging queues. Commonly after the message successfully reaches the consumer, it’s deleted from the queue. If the target service is down, the message will remain in the queue until the receiver is up and consumes the message. A popular choice for the queueing system is RabbitMQ, ActiveMQ or SQS from AWS.
In publisher-subscriber systems, the broker stores the messages in the topic. Consumers then can subscribe to that topic to receive messages. Unlike in queue systems, messages are available to all subscribers and the topic can have more than one subscriber. The message remains persistent in a topic until they are deleted after some conditions are met (like the expiration date of the message passes). The most notable examples of Pub/Sub systems are Kafka, Pulsar or Amazon SNS (some Pub/Sub brokers, like Pulsar, can also be used as queues).
What if the message broker is down?
A message broker is a vital part of the asynchronous architecture and ensuring it’s available requires extra care. This can be achieved by setting up additional standby replicas that can do failover. Still, even with auxiliary replicas, failures of the messaging system might happen from time to time.
What might pose a problem is that requests from the producer will fail if the broker is down. Many clients have a built-in mechanism for retrying requests (like Kafka or RabbitMQ producers), but the message will still be lost if the producer restarts.
Sending a message before committing is not always the right solution because it’s never guaranteed that transaction will succeed. This may cause an inconsistency: the message is delivered to the consumer, but the producer transaction is rolled back.
As an answer to these problems, you might consider the transactional outbox. With that pattern, the message is first saved to the database table (as part of the transaction) and only then scheduled task attempts to deliver it to the broker. Only after the broker confirms it received the message, it’s marked as sent.
After the message is delivered to the broker, it is its job to hand it over to consuming services.
In the simplest case, the broker passes the message to the service and doesn’t expect any confirmation that the message was received and successfully processed.
This kind of delivery guarantee is called at-most-once. The broker will never attempt to retry the delivery of the message. This approach is only applicable if some of the messages might be lost. An example of such messages could be telemetry readings sent in regular intervals.
If it’s essential to ensure the message arrives at its destination, a broker might be configured to work in at-least-once mode. After the message reaches the consumer, it needs to send back ACK to the broker. If no acknowledgement gets to the broker, it will retry the delivery after some time.
Even if ACK is sent by the consumer, it might be never received by the broker and it will send the message again. Thus the name at-least-once delivery.
From the technical perspective, exactly-once delivery (guaranteeing that a single produced message is always received only once by the client) is practically impossible to achieve. Since messages can be received by clients more than once, it’s important to implement countermeasures like message deduplication or ensure processing idempotence.
By correctly applying these principles we can achieve effectively-once processing semantics - very close to the ideal exactly-once.
Choosing the right messaging systems
Even though similar at first glance, every messaging system comes with a unique set of features. A decision on which messaging system is right for you might be based on many premises.
If you need a Publish/Subscribe system, you might consider Kafka, Pulsar or cloud-based solutions like Google Pub/Sub or Amazon SNS. For queues, there is RabbitMQ, ActiveMQ or SQS from AWS.
Some systems like Kafka or Pulsar by design store the messages in persistent stores. Queues like RabbitMQ by default delete the messages as soon as the consumer confirms it was delivered.
If the performance matters the most, you might be interested in checking the evaluation of messaging systems:
Another important factor might be the ease of integration with your framework. The rule of thumb is that the more popular the messaging system is, the more refined integration will be.
On the other hand, maybe it might be worth pioneering usage of a cutting-edge messaging system offering an unusual set of features or superior performance?
Hidden costs of messaging systems
An important point to remember is that a messaging service is another part of the infrastructure you have to maintain. Synchronous communication usually requires no additional components. Whether it be a JMS server or Kafka cluster, setting it up will still require gaining know-how in your organization. If you don't feel ready to handle your own messaging cluster, you can always choose managed service from one of plenty of cloud providers (like SQS or MSK from Amazon).
The drawback is that depending on the complexity of your systems and the number of messages you produce, the cost of using cloud messaging services can be substantial.
Although billing usually is predictable, it still might add a lot to your cloud budget.
This is offset by decreased spending on on-premise resources and less team time spent on maintaining infrastructure.
It’s not always possible to replace synchronous calls with async messaging, but very often it might be worthwhile. Asynchronous microservice integration is also preventing unwanted coupling. Because other services’ data can’t be requested on demand, it promotes developing services as autonomous units.
Want to know more about Amazon's SQS performance and latency? Read this article by Adam Warski.
Streaming your data
Another way for implementing asynchronous communication between services is streaming.
It is especially useful if requested data is not available right away. Messages can be sent immediately as they become available.
Some streaming protocols like Websockets also allow bidirectional communication, which might be very handy in case two services need to exchange messages. Web sockets communication also allows decreasing the additional overhead of sending HTTP headers with every request.
Directly streaming data between services might be a tempting alternative to asynchronous messaging with brokers. The thing to keep in mind is that protocols like Websockets are low-level and have no notion about concepts like delivery guaranties. You'd have to implement all of these on the application level.
Choosing the right way to communicate two services might be a real challenge.
There are many considerations to keep in mind. Is synchronous coupling required? Or would asynchronous communication work better here? Are delays acceptable? Can data be cached?
I hope I shed some light on how to tackle the most common issues.
Remember, you don’t need to make a single choice for communication patterns between all of your services. It’s usually much better to adopt various, carefully selected styles for particular needs.