Good practices for schema evolution with Protobuf using ScalaPB and fs2grpc
The schema-driven paradigm has become very popular across different communication models. Defining APIs with Interface Description Languages allows schemas to evolve more robustly and safely than with JSON-based messages. Depending on various factors, adding or removing message fields can be perfectly safe or can break binary compatibility after a deployment. You might need different policies for schema evolution when using gRPC services, event sourcing, or Akka Cluster actor communication, not to mention that the decision may differ for a specific wire protocol. Let’s look at various communication modes and consider whether to use backward, forward, or full compatibility modes.
Technology
This article focuses on Google Protocol Buffers, AKA Protobuf, the default underlying protocol in gRPC services and a solid choice for messaging-based communication. Protobuf has its own specific approach to allowing schema changes, required fields, and default values. When developing backends in Scala, you can leverage ScalaPB to generate case classes from the .proto contract during project compilation. You can also make these generated types even more robust with extra annotations, which we’ll explore below.
Communication style
gRPC
This way of synchronous communication between services is becoming increasingly popular. Typical REST contracts are cumbersome to evolve, and REST lacks gRPC’s performance and streaming support. Another issue is modeling service use cases as HTTP verbs like POST/PUT/DELETE plus resource URLs, which often don’t map well to actual use cases. gRPC addresses these difficulties. With gRPC, we usually talk about clients and servers. Typically, the server service maintains its contract as protobuf files, from which case classes can be generated using ScalaPB. Then fs2grpc can be used to create a thin client layer based on Cats Effect, and the CI can put all this code in a versioned .jar artifact published by the server. Client services can then use these generated artifacts as their dependencies.
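To make this setup concrete, below is a minimal sketch of such a contract. All names (the orders package, OrderService, and its messages) are made up for illustration; from this file, ScalaPB generates the case classes, and fs2grpc generates a Cats Effect-based service trait.

```proto
syntax = "proto3";

package orders.v1;

import "google/protobuf/timestamp.proto";

// A hypothetical contract maintained by the server service and shipped to
// clients as a versioned .jar with the generated ScalaPB/fs2grpc code.
service OrderService {
  rpc CreateOrder (CreateOrderRequest) returns (CreateOrderResponse);
}

message CreateOrderRequest {
  string customer_id = 1;
  google.protobuf.Timestamp requested_at = 2;
}

message CreateOrderResponse {
  string order_id = 1;
}
```

From the service definition, fs2grpc generates a trait of roughly the shape `OrderServiceFs2Grpc[F[_], A]` with one method per RPC, so the client layer stays a thin, typed wrapper over the wire protocol.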
Messaging
Asynchronous message-based communication brings many advantages, such as decoupling, delivery guarantees, event sourcing, and command sourcing. It can be implemented with Kafka, Pulsar, or other infrastructure solutions. When implemented with Kafka, this pattern is often complemented by an external Schema Registry, which facilitates schema compatibility management between producers and consumers. Let’s look at the following integration patterns:
- 1 producer - multiple consumers: a service publishes its events, and consumers can come and go to read and process these events. Usually, it’s the producer who drives schema changes.
- 1 consumer - multiple producers: a service reads commands from a topic to which potential client services can publish. In this case, it’s the consumer that updates the schema, which makes it similar to a gRPC server.
- Communication between actors of the same type: this case is unique to models where actors of the same type send messages to each other, for example, within Akka Cluster. We won’t cover this case in depth; let me just mention that it needs full compatibility to ensure that actors running both new and old versions can exchange messages during a rolling update.
Compatibility modes cheat sheet
Let’s now break down the Protobuf usages mentioned above into lists of allowed operations, like adding/removing fields, changing optionality, etc. First, a quick recap of compatibility modes:
- backward - the most well-known mode: a consumer using the newer schema can still read data written with older schema versions.
- forward - a consumer using an older schema version can still read data serialized with a newer schema.
- full - the most restrictive mode, requiring both backward and forward compatibility.
gRPC
It’s recommended to keep backward compatibility for requests. The server updates its schema first, without worrying about requests from clients that still use older schemas; clients then update their schemas gradually. For responses, the desired mode is forward, meaning clients can still process new messages sent by the server. Please note that the no_box wrapper is described in detail later, in the Optionality section.
| mode | communication | scenario | method |
|------|---------------|----------|--------|
| BACKWARD | gRPC requests | add optional field to request | add to proto |
| BACKWARD | gRPC requests | change required request field to optional | remove no_box (ScalaPB specific) |
| BACKWARD | gRPC requests | change optional request field to required | breaking |
| BACKWARD | gRPC requests | remove optional field from request | remove from proto |
| BACKWARD | gRPC requests | remove required field from request | remove from proto |
| BACKWARD | gRPC requests | add required field to request | breaking |
| BACKWARD | gRPC requests | add a field to a oneof in a request | add to proto |
| BACKWARD | gRPC requests | move a field into a new oneof in a request | move in proto |
| BACKWARD | gRPC requests | remove a field from a oneof in a request | breaking** |
| FORWARD | gRPC responses | add optional field to response | add to proto |
| FORWARD | gRPC responses | add required field to response | add to proto |
| FORWARD | gRPC responses | remove optional field from response | remove from proto |
| FORWARD | gRPC responses | change required response field to optional | breaking |
| FORWARD | gRPC responses | remove required field from response | breaking* |
| FORWARD | gRPC responses | add a field to a oneof in a response | breaking** |
| FORWARD | gRPC responses | move a field into a new oneof in a response | optional field only, move in proto |
| FORWARD | gRPC responses | remove a field from a oneof in a response | remove from proto |
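As a concrete instance of the first row, adding an optional field to a request is a pure addition to the proto file. A sketch, extending the hypothetical CreateOrderRequest from above:

```proto
import "google/protobuf/wrappers.proto";

message CreateOrderRequest {
  string customer_id = 1;
  google.protobuf.Timestamp requested_at = 2;
  // Added in a newer schema version. Old clients simply never set it, and
  // the server sees it as absent (None on the ScalaPB side).
  google.protobuf.StringValue coupon_code = 3;
}
```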
Keep in mind that removing fields in Protobuf is OK as long as you don’t reuse the field number. You can use the `reserved` keyword to mark numbers (and names) that shouldn’t be reused.
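A sketch of `reserved` in action, again on the hypothetical request message:

```proto
message CreateOrderRequest {
  // requested_at (field number 2) was removed in a later version; reserving
  // the number and the name prevents them from being accidentally reused.
  reserved 2;
  reserved "requested_at";

  string customer_id = 1;
}
```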
* possible with some restrictions, see the Optionality section below.
** A special note on oneofs: removing a oneof variant in requests makes the server decode that value as UNKNOWN, which is theoretically indistinguishable from an unset value. That’s why I’m marking it as backward incompatible. Likewise, adding a field to a oneof is considered a forward-incompatible change. If you’re sure all your clients use ScalaPB, and you’re handling UNKNOWN values correctly (see the sketch below), such changes can be performed, but with extreme caution. See this blog post for a more thorough explanation and thoughts on compatibility for more non-typical changes to oneof fields.
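Here is a self-contained sketch of such defensive handling. The types are simplified stand-ins for what ScalaPB generates from a oneof (all names are hypothetical): a sealed trait with one case per variant plus an Empty case.

```scala
sealed trait PaymentMethod
object PaymentMethod {
  case object Empty extends PaymentMethod
  final case class Card(number: String) extends PaymentMethod
  final case class BankTransfer(iban: String) extends PaymentMethod
}

def describe(method: PaymentMethod): Either[String, String] =
  method match {
    case PaymentMethod.Card(number)       => Right(s"charging card $number")
    case PaymentMethod.BankTransfer(iban) => Right(s"starting transfer from $iban")
    case PaymentMethod.Empty              =>
      // Either the sender never set the field, or it used a variant this
      // service no longer knows about - the two cases look identical here,
      // which is why removing a oneof variant is marked as breaking.
      Left("unsupported or missing payment method")
  }
```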
Messaging: events
In this communication style, we have a publisher who manages the contract and consumers who read the published events. This style is often implemented with a .jar artifact containing the .proto file together with the generated ScalaPB classes, maintained and published by the producer. The publisher is then typically the first to update its code to the new contract, while consumers follow. However, let’s consider all the sub-cases:
- Forward compatibility is enough when the producer wants to update events without breaking consumers, who can still use the old schema until they update.
- Alternatively, you might want to update consumers first. If that’s your preferred ordering, it’s backward compatibility you’ll need to keep. Such an approach is mentioned in the Avro and Schema Registry documentation; check it out for a deeper dive.
- Full compatibility is the most restrictive variant. It is recommended when you want to make sure that even after adjusting consumers to the latest schema, they will still be able to parse older events, with special handling for fields whose values may be present or missing (see the sketch after the table below). Choose this approach when you expect historical events to be replayed, for example, in event sourcing, so consumers can correctly handle all older versions.
| mode | communication | scenario | method |
|------|---------------|----------|--------|
| FORWARD/FULL | messaging: events | add optional field | add to proto |
| FORWARD | messaging: events | add required field | add to proto |
| FULL | messaging: events | add required field | breaking |
| FORWARD/FULL | messaging: events | remove optional field | remove from proto |
| FORWARD ONLY | messaging: events | change optional field to required | add no_box |
| FULL | messaging: events | change optional field to required | breaking |
| FORWARD/FULL | messaging: events | change required field to optional | breaking* |
| FORWARD/FULL | messaging: events | remove required field | breaking |
| FORWARD/FULL | messaging: events | add a field to a oneof | breaking |
| FORWARD/FULL | messaging: events | move a field into a new oneof | move in proto (optional only) |
| FORWARD ONLY | messaging: events | remove a field from a oneof | remove from proto |
| FULL | messaging: events | remove a field from a oneof | breaking |
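To illustrate the full-compatibility case, here is a minimal, self-contained sketch of a consumer that must handle both old and new events after a field was added in a later schema version. The OrderCreated event and its discount field are hypothetical; in ScalaPB, an optional message field surfaces as an Option in the generated case class.

```scala
// Simplified stand-ins for ScalaPB-generated event classes: `discount` was
// added in schema v2, so events written with v1 decode with discount = None.
final case class Discount(percent: Double)
final case class OrderCreated(orderId: String, total: Double, discount: Option[Discount])

def effectiveTotal(event: OrderCreated): Double =
  event.discount match {
    case Some(d) => event.total * (1 - d.percent / 100) // written with the new schema
    case None    => event.total // written before the field existed
  }
```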
Messaging: commands
This mode is very similar to handling gRPC requests. A command handler maintains the contract, so it needs to stay backward compatible.
| mode | communication | scenario | method |
|------|---------------|----------|--------|
| BACKWARD | messaging: commands | add optional field | add to proto |
| BACKWARD | messaging: commands | add required field | breaking |
| BACKWARD | messaging: commands | remove required field | remove from proto |
| BACKWARD | messaging: commands | remove optional field | remove from proto |
| BACKWARD | messaging: commands | change required field to optional | remove no_box |
| BACKWARD | messaging: commands | change optional field to required | breaking |
| BACKWARD | messaging: commands | add a field to a oneof | add to proto |
| BACKWARD | messaging: commands | remove a field from a oneof | breaking |
| BACKWARD | messaging: commands | move a field into a new oneof | move in proto |
Optionality and no_box
Since version 3, the Protobuf protocol doesn’t allow required fields, making optional the default. The rationale is that adding/removing required fields caused too many problems with unexpectedly broken wire compatibility (see the original GitHub comment).
However, as the tables show, required fields make sense in some scenarios. For example, for BACKWARD compatibility, it’s perfectly fine to have required fields in the initial schema and delete them later. This applies to gRPC requests or asynchronous commands. To make ScalaPB skip wrapping non-primitive types in `Option`, add a `no_box` annotation, like:

```proto
google.protobuf.Timestamp createdAt = 1 [(scalapb.field).no_box = true];
```
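For illustration, here is roughly (and in heavily simplified form) how the annotation changes the generated Scala code; the Event class names are made up:

```scala
import com.google.protobuf.timestamp.Timestamp // ScalaPB's generated well-known type

// Without no_box: the non-primitive field is wrapped in Option.
final case class EventDefault(createdAt: Option[Timestamp] = None)

// With no_box = true: a plain value, effectively required on the Scala side.
final case class EventNoBox(createdAt: Timestamp)
```

Note that `no_box` only changes the generated Scala types; on the wire, the field remains a regular proto3 singular message field, which is why adding or removing the annotation appears as a method in the tables above.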
Removing required fields breaks FORWARD compatibility, but in special circumstances, we can perform this operation in steps:
1. Change the required field to optional by removing the `no_box` wrapper.
2. Update all consumers to handle the `None` case.
3. Start producing the `None` case.
Caution! Such an operation is risky. It doesn’t fully follow the compatibility rules from our cheat sheet, which is why I’m marking it as “breaking” anyway.
Default values
Primitive protobuf types like `string`, `double`, and others won’t be wrapped in `Option` by ScalaPB. Instead, default values will be set, like the empty string `""`. This can be very dangerous because, in most cases, these default values are actually illegal, so if you omit a field when constructing a case class, you are in trouble. To strengthen type safety, consider two approaches:
- Add `no_default_values_in_constructor` to your .proto file, which simply disables default values in the generated constructors:

  ```proto
  option (scalapb.options) = {
    no_default_values_in_constructor: true
  };
  ```

  Disadvantage: this makes the field strictly required, without an easy way to evolve it into an optional field later.
- Use wrapper types. Such types may increase code noise by requiring additional `.value` calls to get to the actual value, but combined with `no_box`, this approach gives you more control over the evolution of optionality (see the sketch below).
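As a minimal sketch of the wrapper-type approach (the Customer message is hypothetical, and the comment assumes ScalaPB’s default mapping of well-known wrapper types):

```proto
import "google/protobuf/wrappers.proto";

message Customer {
  // Surfaces as Option[String] in the generated case class, so a missing
  // nickname cannot be confused with the empty-string default "".
  google.protobuf.StringValue nickname = 1;
}
```

Combining such wrappers with `no_box` then lets you flip a field between required and optional on the Scala side without touching the wire format.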
Bonus: buf
Maintaining protobuf files, distributing the contract, enforcing compatibility rules, and managing a consistent set of API design rules requires serious investment. The buf project is an interesting toolset that aims to automate these processes. It ships with a newly developed, high-performance Protobuf compiler; a linter that enforces good API design choices and structure; a breaking change detector that enforces compatibility at the source or wire level; and a generator that invokes your protoc plugins based on a configurable template. It also includes the Buf Schema Registry (BSR), a hosted SaaS platform that serves as your organization’s source of truth for Protobuf APIs. Consider adding elements of buf to your project pipeline to squeeze more out of the Protobuf experience.
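For instance, a minimal buf.yaml enabling linting and wire-level breaking-change detection could look roughly like this (a sketch based on buf’s v1 configuration format):

```yaml
version: v1
lint:
  use:
    - DEFAULT      # buf's standard set of API design rules
breaking:
  use:
    - WIRE_JSON    # flag changes that break wire or JSON compatibility
```

In CI, a command like `buf breaking --against '.git#branch=main'` can then fail the build whenever a change to the .proto files violates the selected compatibility level.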
Conclusions
Deciding on compatibility requirements for your schemas depends on various factors. Leveraging ScalaPB extensions like required fields or disabled default values can increase type safety on the application side without compromising Protobuf wire compatibility, as long as the evolution rules are well adjusted to the use case. I hope this guide helps you fine-tune these rules for your specific application, or even automate their enforcement entirely.
Reviewed by: Michał Ostruszka, Michał Matłoka, Adam Rybicki, Andrzej Bil, Adrian Wydra, Bartek Henkiel