Mastering Rust Patterns vol. 1: Rust Newtypes

Newtypes are probably the most basic pattern we use in the Rust programming language. The pattern is so basic that it hardly counts as one, yet it has some specific use cases where it is very useful. On the other hand, it can also be overused, leading to overly verbose code. Let's explore the pattern and consider what its best applications are.

What are newtypes?

Newtypes are basically single-field structures wrapping some underlying type:

struct Handler(usize);

Here, we created a newtype Handler that holds a usize under the hood. This is a pretty common use case for me: when I have some underlying vector and I don't want callers to index it with a raw usize, I create a Handler wrapper instead.

It is important to notice the difference between newtypes and type aliases. The following is not a newtype:

type OtherHandler = usize;

OtherHandler is just a type alias for usize. The huge difference is that a newtype actually creates a new type, while the type keyword only creates another name for an existing one. As a consequence, in this example we can use OtherHandler and usize interchangeably, but Handler is a completely separate type:

fn takes_handler(handler: Handler) {}
fn takes_usize(handler: usize) {}

fn main() {
    let raw = 0usize;
    let handler = Handler(0);
    // Note that I just assign a `usize` - it's the exact same type
    let other_handler: OtherHandler = 0usize;
    // The following would not compile, `Handler` is *not* a `usize`
    // let handler2: Handler = 0usize;

    takes_usize(raw);
    // `takes_usize` accepts `OtherHandler`
    takes_usize(other_handler);
    // But doesn't accept `Handler`
    // takes_usize(handler);

    // `takes_handler` accepts `Handler` only
    takes_handler(handler);
    // Passing `usize` or `OtherHandler` fails
    // takes_handler(raw);
    // takes_handler(other_handler);
}

Now that we understand the formal difference between type aliases and newtypes, let's consider another aspect of the API: encapsulation of the inner field. Should the only newtype field be public or not? Technically, both cases are valid, but personally I prefer to reserve the name newtype for types where the field is private. The reasoning is related to the purpose of newtypes. We use this pattern when we technically need the information held by an underlying type, but the semantics we use it for are specific to the API and we don't want to allow passing the underlying type directly.

Following this logic, if we make the newtype field public, an API user who has a value of the underlying type can wrap it and pass it straight to the function, which in a way defeats the purpose of newtypes. Here is an example:

struct PubHandler(pub usize);

fn takes_pub_handler(handler: PubHandler) {}

fn main() {
    let raw = 0usize;
    let handler = PubHandler(0);

    takes_pub_handler(handler);
    // Passing `usize` fails
    // takes_pub_handler(raw);
    // But there is nothing stopping us from just creating it in place
    takes_pub_handler(PubHandler(raw));
}

This kind of newtype adds a bit of explicit noise, which might be what we want, but I think it mostly serves different use cases. Because of that, I personally call the types with public fields wrapper types, and the types with private fields newtypes. This also fits the common convention following Rust by Example. Keep in mind that this is more of an idiom guideline: you might find what I call "wrapper types" called "newtypes" elsewhere in the Rust ecosystem.
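
The distinction only becomes visible across a module boundary, so here is a minimal sketch of it. The `api` module, the `Registry` type and its methods are hypothetical names made up for this illustration; only `Handler` and `PubHandler` come from the text above.

```rust
mod api {
    // Private field: code outside `api` cannot write `Handler(42)`.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct Handler(usize);

    // Public field: anyone can wrap an arbitrary `usize`.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct PubHandler(pub usize);

    // Hypothetical owner of the indexed data.
    #[derive(Default)]
    pub struct Registry {
        items: Vec<String>,
    }

    impl Registry {
        // The only way to obtain a `Handler` is to register an item.
        pub fn register(&mut self, item: &str) -> Handler {
            self.items.push(item.to_owned());
            Handler(self.items.len() - 1)
        }

        pub fn get(&self, handler: Handler) -> &str {
            &self.items[handler.0]
        }
    }
}

fn main() {
    let mut registry = api::Registry::default();
    let handler = registry.register("first");
    println!("{}", registry.get(handler));

    // Compiles: the public field lets callers build the wrapper themselves.
    let wrapper = api::PubHandler(7);
    println!("{}", wrapper.0);

    // Does not compile: the constructor of `Handler` is private outside `api`.
    // let forged = api::Handler(7);
}
```

Inside the defining module both types behave the same, which is why the single-file examples in this article cannot show the difference on their own.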

Why use newtypes?

As I mentioned previously, the most basic usage of newtypes is to pass around information represented by the underlying type, but with a different semantic meaning. Because the semantics are different, we typically don't want any value of the underlying type to be passed – we want some control over it.

The two most common use cases are:

  1. When we want to accept only values that we previously created.
  2. When we want to add validation to the value.

Very often we want both: values returned from our API are correct by construction, and we may still offer a way to create the newtype from the underlying type through additional validation.

Use case - newtype as resource handlers

The most basic example of a newtype, and one I've used a couple of times, is the Handler type I presented before. Let's imagine we are working on a library that, under the hood, keeps some graph, maybe a computation graph. One of the most common graph representations is to keep a container of nodes and add some way of representing connections between them, possibly in separate containers.

In such a representation, the GraphNode might be a somewhat expensive structure we might not want to pass around. Maybe we could work with Rc<GraphNode> instead, but often the simple way would be to have a vector of nodes, and then instead of passing around the whole node, we would pass around its index:

#[derive(Default)]
struct MyGraph {
    nodes: Vec<GraphNode>,
}

impl MyGraph {
    fn create_node(&mut self) -> usize {
        self.nodes.push(GraphNode::default());
        // The index of the node we just pushed
        self.nodes.len() - 1
    }
}

Then, whenever we want to do something with the nodes, we pass their indices:

impl MyGraph {
    fn add_connection(&mut self, from: usize, to: usize) {
        self.nodes[from].add_connection(to)
    }
}

That works perfectly, but the inconvenience is that the user can pass arbitrary numbers to the API, which might cause problems. Here, an out-of-bounds index would cause a panic:

fn main() {
    let mut graph = MyGraph::default();
    let node1 = graph.create_node();
    let node2 = graph.create_node();

    // All is fine
    graph.add_connection(node1, node2);

    // Here we get a panic
    graph.add_connection(100, 200);
}

It might not be devastating: Rust makes sure that invalid indices are never accessed, and there is always room in the documentation to mention that callers must never pass indices that were not previously returned from the create_node method.

However, we can do better – we can make it impossible to create invalid indices:

struct NodeHandler(usize);

impl MyGraph {
    fn create_node(&mut self) -> NodeHandler {
        let index = self.nodes.len();
        self.nodes.push(GraphNode::default());
        NodeHandler(index)
    }

    fn add_connection(&mut self, from: NodeHandler, to: NodeHandler) {
        self.nodes[from.0].add_connection(to.0)
    }
}

This code is very close to what we had before, but it also prevents some issues. There is a question, though: why would someone even want to pass a value we never returned? Well, maybe they assumed that the values are safe to store in a database and later restore and use, which is not the case. It is unlikely that anyone would pass 100 directly as a node index, but reusing a value obtained elsewhere after it has become invalid is a pretty likely mistake.

Be very careful about any additional assumptions you make about received values. In this particular example, the add_connection method can assume that the from and to parameters are nodes that were created by a MyGraph, but it is not safe to assume they came from the same instance of MyGraph. Let's take a quick look at the following example:
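
For reference, here is a self-contained version of the handler-based graph that can be compiled and run as is. The `GraphNode` layout and the `connection_count` helper are assumptions added for the sake of the sketch; the original only shows `create_node` and `add_connection`.

```rust
// Hypothetical node layout: each node stores the indices it points to.
#[derive(Default)]
struct GraphNode {
    connections: Vec<usize>,
}

impl GraphNode {
    fn add_connection(&mut self, to: usize) {
        self.connections.push(to);
    }
}

// Private field: outside the defining module, only `MyGraph` can create one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeHandler(usize);

#[derive(Default)]
pub struct MyGraph {
    nodes: Vec<GraphNode>,
}

impl MyGraph {
    pub fn create_node(&mut self) -> NodeHandler {
        let index = self.nodes.len();
        self.nodes.push(GraphNode::default());
        NodeHandler(index)
    }

    pub fn add_connection(&mut self, from: NodeHandler, to: NodeHandler) {
        self.nodes[from.0].add_connection(to.0);
    }

    // Helper added for the sketch so the effect can be observed.
    pub fn connection_count(&self, node: NodeHandler) -> usize {
        self.nodes[node.0].connections.len()
    }
}

fn main() {
    let mut graph = MyGraph::default();
    let node1 = graph.create_node();
    let node2 = graph.create_node();
    graph.add_connection(node1, node2);
    println!("node1 has {} connection(s)", graph.connection_count(node1));

    // No longer compiles: expected `NodeHandler`, found an integer.
    // graph.add_connection(100, 200);
}
```

In a real crate the graph types would live in their own module, so that `NodeHandler(100)` cannot be written by the caller either.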

impl MyGraph {
    fn add_connection(&mut self, from: NodeHandler, to: NodeHandler) {
        unsafe {
            // No bounds check here; `_mut` because adding a connection mutates the node
            self.nodes.get_unchecked_mut(from.0).add_connection(to.0)
        }
    }
}

It might be tempting to assume that, as we are the only producer of NodeHandler, we can fully trust it and perform the crucial optimization of omitting the bounds check. But we can very easily end up in an unsound situation:

fn main() {
    let mut graph1 = MyGraph::default();
    let mut graph2 = MyGraph::default();
    let node1 = graph1.create_node();
    let node2 = graph2.create_node();
    let node3 = graph2.create_node();
    // `node3` has index `1`, but `graph1` holds only one node, so the
    // unchecked access at index `1` is out of bounds: undefined behavior
    graph1.add_connection(node3, node2);
}

If this optimization were crucial, it would be possible to implement it soundly, but that would require some additional machinery involving lifetimes.

Newtypes used in such a scenario very often help to provide a developer-friendly API, but they are rarely useful for performance optimizations.
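
The lifetime-based approach is beyond the scope of a short example, so the sketch below shows a different and simpler technique: tagging each handle with the id of the graph that produced it and checking that tag at runtime. Strictly speaking, the handle is then a two-field struct rather than a single-field newtype. The `graph_id` field, the `NEXT_GRAPH_ID` counter and the `ForeignHandler` error are all assumptions of this sketch, and it does not remove the bounds check; it only turns the cross-graph mistake into an explicit error.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Process-wide counter used to give every graph a distinct id.
static NEXT_GRAPH_ID: AtomicUsize = AtomicUsize::new(0);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeHandler {
    graph_id: usize,
    index: usize,
}

// Returned when a handle from another graph instance is passed in.
#[derive(Debug, PartialEq, Eq)]
pub struct ForeignHandler;

pub struct MyGraph {
    id: usize,
    // Simplified storage: each node is just the list of indices it points to.
    nodes: Vec<Vec<usize>>,
}

impl MyGraph {
    pub fn new() -> Self {
        Self {
            id: NEXT_GRAPH_ID.fetch_add(1, Ordering::Relaxed),
            nodes: Vec::new(),
        }
    }

    pub fn create_node(&mut self) -> NodeHandler {
        self.nodes.push(Vec::new());
        NodeHandler { graph_id: self.id, index: self.nodes.len() - 1 }
    }

    pub fn add_connection(
        &mut self,
        from: NodeHandler,
        to: NodeHandler,
    ) -> Result<(), ForeignHandler> {
        // Reject handles created by a different graph instance.
        if from.graph_id != self.id || to.graph_id != self.id {
            return Err(ForeignHandler);
        }
        // Still a checked index; nodes are never removed, so it stays in range.
        self.nodes[from.index].push(to.index);
        Ok(())
    }
}

fn main() {
    let mut graph1 = MyGraph::new();
    let mut graph2 = MyGraph::new();
    let node1 = graph1.create_node();
    let node2 = graph2.create_node();
    let node3 = graph2.create_node();

    println!("{:?}", graph1.add_connection(node1, node1));
    // Handles from `graph2` are rejected instead of being used as indices.
    println!("{:?}", graph1.add_connection(node3, node2));
}
```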

Type validation

The second usage of newtypes is something we discussed recently in our company Book Club. I was somewhat sceptical at first, but I think that was because it was presented from an angle that is not very typical in Rust. In practice, it is a very common application of newtypes in the Rust programming language.

In the previous example, we used newtypes to ensure that the handler of the graph node was created by our crate and is not a random usize value pushed into our system. Another situation where we might want to use newtypes is similar: we want to limit what values are passed to a function, but the value does not have to be one we previously created. Maybe we just want to ensure that the value was processed in some particular way, and often that processing is validation.

Here is a real-life example. Let's work on a chess tool that involves filtering games. We would have the filter object like so:

struct Filter {
    pub player: Option<String>,
    pub color: Option<Color>,
}

impl ChessService {
    fn fetch_games(&self, filter: Filter) -> Vec<Game> {
        todo!()
    }
}

The Filter struct describes how we filter games. In this simplified version, we want to keep only the games of the provided player. Also, if the color is set, we see only the games this player played with that particular color.

Here we can easily determine the basic assumption we would like to make about the Filter struct: the color field should never be set if player is None. One way to approach this is to validate the filter inside the fetch_games method:

impl ChessService {
    fn fetch_games(&self, filter: Filter) -> Vec<Game> {
        if filter.player.is_none() && filter.color.is_some() {
            panic!("Invalid filter: color cannot be set without player");
        }
        todo!()
    }
}

It works, but we can easily see a slight issue – if the same filter is passed multiple times to fetch_games or to other methods, it has to be validated each time. It is not a huge performance impact, but it is work that can be easily avoided. We can solve it with a newtype pattern:

struct ValidatedFilter(Filter);

impl ValidatedFilter {
    fn new(filter: Filter) -> Self {
        if filter.player.is_none() && filter.color.is_some() {
            panic!("Invalid filter: color cannot be set without player");
        }
        Self(filter)
    }
}

impl ChessService {
    fn fetch_games(&self, filter: ValidatedFilter) -> Vec<Game> {
        todo!()
    }
}

We can now validate only once when the filter is created, and the rest of the system can assume that it is valid. In contrast to the previous case we discussed, here we can even save on some performance, as the validation is not dependent on any other entities.

Obviously, we don't really need a newtype for this: the ValidatedFilter struct could just as well have all the fields of Filter itself. However, that would involve some code repetition.

This kind of newtype application can be recognized as a basic application of the typestate pattern. The typestate pattern is common in Rust, and its idea is to use the type system to represent the state of an object. In this scenario, the Filter type is a potential filter that might still be under construction, while the ValidatedFilter type is a filter in the validated state. This is also very often used with builders: we would use Filter as a builder, possibly providing some methods to set the fields, and the final Filter::build(self) method would perform validation and return the final ValidatedFilter.
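
A minimal sketch of that builder variant could look as follows. It returns a `Result` instead of panicking, which is a design choice of this sketch and not something the earlier examples do. The `Color` variants, the `FilterError` type, the setter methods and the read-only accessors are all hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Color {
    White,
    Black,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FilterError {
    ColorWithoutPlayer,
}

// The unvalidated, freely editable "builder" state.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Filter {
    pub player: Option<String>,
    pub color: Option<Color>,
}

// The validated state; the inner field stays private.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ValidatedFilter(Filter);

impl Filter {
    pub fn player(mut self, name: &str) -> Self {
        self.player = Some(name.to_owned());
        self
    }

    pub fn color(mut self, color: Color) -> Self {
        self.color = Some(color);
        self
    }

    // Consumes the builder and validates exactly once.
    pub fn build(self) -> Result<ValidatedFilter, FilterError> {
        if self.player.is_none() && self.color.is_some() {
            return Err(FilterError::ColorWithoutPlayer);
        }
        Ok(ValidatedFilter(self))
    }
}

impl ValidatedFilter {
    // Read-only accessors, so the invariant cannot be broken after validation.
    pub fn player(&self) -> Option<&str> {
        self.0.player.as_deref()
    }

    pub fn color(&self) -> Option<Color> {
        self.0.color
    }
}

fn main() {
    let ok = Filter::default().player("alice").color(Color::White).build();
    println!("{:?}", ok);

    let bad = Filter::default().color(Color::Black).build();
    println!("{:?}", bad);
}
```

Exposing only getters on `ValidatedFilter` matters here: handing out `&mut Filter` would let a caller set `color` and clear `player` after validation.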

Yet another example of a newtype used in a validation context is an Email type in a system where we want to make sure emails are valid, and perhaps also restrict the allowed domains or apply deny-lists. Even basic email validation is somewhat complex, and in some systems it might become arbitrarily complex.

It is difficult to imagine a system whose performance would be noticeably affected by email validation, but I can easily imagine a system whose stability really depends on the email format. And one thing I learned is: if something is not verified by a compiler, I will eventually forget to do it. Or someone on my team will. Here is what the code would look like:

struct Email(String);

impl Email {
  fn validate(value: String) -> Result<Self, ValidationError> {
    // Validation code...

    Ok(Self(value))
  }
}

The newtype pattern helps us make sure we never forget to validate the email: all the code that relies on the actual email format is supposed to always use the newtype. If some code handles the email as a String or &str, that's fine, as long as that part of the code doesn't rely on the format being valid.

Also note that in this case it may be quite useful to add conversions to a string: maybe a From conversion (but only Email to String, not the other way!), maybe AsRef or even the Deref trait to expose the internal string. Converting an Email back to a String is not a problem. Even if the string is modified or processed afterwards, it will need to be re-validated before the critical part of the system gets a proper Email again.
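
The outgoing conversions can be sketched as below. The validation rule is a deliberately naive placeholder (one `@`, a non-empty local part, a dot in the domain) and the unit-struct `ValidationError` is an assumption; neither is meant as real email validation. `Display` is added as one more read-only way out, which the text above does not mention.

```rust
use std::fmt;

#[derive(Debug, PartialEq, Eq)]
pub struct ValidationError;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Email(String);

impl Email {
    // Naive placeholder check, for illustration only.
    pub fn validate(value: String) -> Result<Self, ValidationError> {
        let is_valid = match value.split_once('@') {
            Some((local, domain)) => !local.is_empty() && domain.contains('.'),
            None => false,
        };
        if is_valid {
            Ok(Self(value))
        } else {
            Err(ValidationError)
        }
    }
}

// Borrow the inner string without giving up the `Email`.
impl AsRef<str> for Email {
    fn as_ref(&self) -> &str {
        &self.0
    }
}

// One-way conversion: `Email` -> `String`.
// There is intentionally no `From<String> for Email`.
impl From<Email> for String {
    fn from(email: Email) -> Self {
        email.0
    }
}

impl fmt::Display for Email {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

fn main() {
    let email = Email::validate("user@example.com".to_owned()).expect("valid address");
    println!("{}", email);

    // Leaving the newtype is always allowed; coming back requires `validate`.
    let raw: String = email.into();
    println!("{}", raw.len());
}
```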

And there are even more things we can do to make using this newtype even easier. We can take advantage of the TryFrom trait to implement conversion from String to Email - but it is not a direct unchecked conversion. This is a fallible conversion, so we have an opportunity to perform validation:

impl TryFrom<String> for Email {
  type Error = ValidationError;

  fn try_from(value: String) -> Result<Self, Self::Error> {
    Self::validate(value)
  }
}
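
As a usage sketch, the fallible conversion can be exercised like this. The validation rule and `ValidationError` are the same naive placeholders as assumed elsewhere in this article's examples. The `FromStr` implementation and the `domain_of` function are hypothetical extras, added to show `str::parse` and a consumer that relies on the validated format.

```rust
use std::convert::TryFrom;
use std::str::FromStr;

#[derive(Debug, PartialEq, Eq)]
pub struct ValidationError;

#[derive(Debug, PartialEq, Eq)]
pub struct Email(String);

impl Email {
    // Naive placeholder check, for illustration only.
    pub fn validate(value: String) -> Result<Self, ValidationError> {
        let is_valid = match value.split_once('@') {
            Some((local, domain)) => !local.is_empty() && domain.contains('.'),
            None => false,
        };
        if is_valid {
            Ok(Self(value))
        } else {
            Err(ValidationError)
        }
    }
}

impl TryFrom<String> for Email {
    type Error = ValidationError;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        Self::validate(value)
    }
}

// Optional extra: enables `"...".parse::<Email>()`, still through validation.
impl FromStr for Email {
    type Err = ValidationError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Self::validate(s.to_owned())
    }
}

// A consumer that depends on the format asks for the newtype, not a `&str`.
fn domain_of(email: &Email) -> &str {
    // `validate` guaranteed an '@', so the fallback is never hit in practice.
    email.0.split_once('@').map(|(_, domain)| domain).unwrap_or("")
}

fn main() {
    let email = Email::try_from("user@example.com".to_owned()).expect("valid address");
    println!("{}", domain_of(&email));

    let rejected = "no-at-sign".parse::<Email>();
    println!("{:?}", rejected);
}
```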

We can also add even more helpers and allow deserializing directly to Email. Just keep in mind to always perform validation when doing so. If we use serde for deserialization, we should provide a custom Deserialize implementation, or use the #[serde(try_from = "String")] container attribute so that deserialization goes through TryFrom.

Hiding underlying type

There is one more reason to use the newtype pattern, one that is particularly useful when we are creating libraries: hiding the underlying type entirely. So far, we were mostly concerned with the underlying value, and we used the newtype mostly as a guard against passing something invalid into our system.

However, sometimes we might want to avoid exposing the underlying type, to get some flexibility in terms of API changes.

Let's come back to the graph example. We discussed that our graph is represented by some GraphNode values stored in a hidden vector. However, it is possible that at some point we might want to change the underlying implementation. Maybe there will be no vector in the future, and we will move to carrying Rc<NodeData> instead. Or maybe a vector will not be good enough at some stage, and we might want to use HashMap<Uuid, GraphNode>, or maybe some database, and usize would no longer be a proper index type.

Using the newtype pattern for the library API, we can give the user a type that represents the semantics instead of the implementation details. That approach makes it easier to change the entire implementation without bumping the major version, so we stay semver-compatible.
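
To make this concrete, here is a sketch of the graph with its storage swapped from a vector to a HashMap while the public method signatures stay the same. The `u64` key, the `next_key` counter and the `connection_count` helper are assumptions of this sketch; a `Uuid` key would work the same way but needs an external crate.

```rust
use std::collections::HashMap;

// The wrapped type changed from an index to a map key,
// but callers only ever see `NodeHandler`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeHandler(u64);

#[derive(Default)]
struct GraphNode {
    connections: Vec<NodeHandler>,
}

#[derive(Default)]
pub struct MyGraph {
    next_key: u64,
    nodes: HashMap<NodeHandler, GraphNode>,
}

impl MyGraph {
    // Same signature as in the vector-based version.
    pub fn create_node(&mut self) -> NodeHandler {
        let handler = NodeHandler(self.next_key);
        self.next_key += 1;
        self.nodes.insert(handler, GraphNode::default());
        handler
    }

    // Same signature as in the vector-based version.
    pub fn add_connection(&mut self, from: NodeHandler, to: NodeHandler) {
        if let Some(node) = self.nodes.get_mut(&from) {
            node.connections.push(to);
        }
    }

    // Helper added for the sketch so the effect can be observed.
    pub fn connection_count(&self, node: NodeHandler) -> usize {
        self.nodes.get(&node).map_or(0, |n| n.connections.len())
    }
}

fn main() {
    let mut graph = MyGraph::default();
    let node1 = graph.create_node();
    let node2 = graph.create_node();
    graph.add_connection(node1, node2);
    println!("{}", graph.connection_count(node1));
}
```

One caveat: traits derived on the handler, such as `Copy` or `Hash`, are part of the public API too, so removing them later would still be a breaking change.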

Is a newtype always the go-to?

We just discussed two major cases when we can benefit from using the newtype pattern to provide a better API. The question is – should we push towards using newtypes wherever it seems to make sense?

I’m personally against such an approach. As much as newtypes can help provide a better API, they are also very good at increasing boilerplate.

Here is an example. Imagine we are creating some graphics library, and we want to provide a Color type that looks like this:

struct Color(u8, u8, u8);

The three elements of the Color would be RGB values, all 8-bit. However, we are very excited about newtypes, so we "improve" our API:

struct Channel(u8);
struct Color(Channel, Channel, Channel);

impl Color {
    fn new(r: Channel, g: Channel, b: Channel) -> Self {
        Self(r, g, b)
    }
}

The problem is that for this particular usage, we almost certainly want to make it possible to create a Channel type out of u8, and the other way around:

impl From<u8> for Channel {
    fn from(value: u8) -> Self {
        Self(value)
    }
}

impl From<Channel> for u8 {
    fn from(value: Channel) -> Self {
        value.0
    }
}

Now the user of our crate can in fact create a blue color:

fn main() {
    let blue = Color(Channel::from(0), Channel::from(0), Channel::from(255));
}

Thanks to the Into trait, we can make it even nicer:

fn main() {
    let blue = Color::new(0.into(), 0.into(), 255.into());
}

However, I really don't see an advantage of this syntax over a simple Color::new(0, 0, 255). We introduced additional boilerplate without adding any significant benefit.

My rule of thumb is to use a newtype when you will not provide a constructor that creates it directly from the underlying type, whether through a method or the From trait. In that case, there is some constraint on the shape of the value that is verified before the type is handed out. Note, however, that a constructor that performs validation is not what I mean by direct creation. Therefore, if the constructor is ValidatedFilter::new(Filter) -> Result<ValidatedFilter>, it is very much a use case for the newtype pattern.
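
For comparison, the plain version is sketched below. The `blue` accessor is a hypothetical addition. Since `u8` already limits each channel to the range 0 to 255, a `Channel` wrapper has no extra invariant to protect.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Color(u8, u8, u8);

impl Color {
    pub fn new(r: u8, g: u8, b: u8) -> Self {
        Self(r, g, b)
    }

    // Hypothetical accessor for the third (blue) channel.
    pub fn blue(&self) -> u8 {
        self.2
    }
}

fn main() {
    // No `from` or `into` calls needed at the call site.
    let blue = Color::new(0, 0, 255);
    println!("{:?}", blue);
}
```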

Common pitfall – hidden conversion

One theme I try to keep throughout this article is that a newtype should make sure that its creation is controlled. There is one common pitfall I have noticed a couple of times when newtypes were used carelessly: the hidden conversion.

As I mentioned in the color example, sometimes using a newtype leads to a lot of from / into calls, and to me that is a sign that the pattern is being abused.

However obvious a conversion the From trait is, when implementing newtypes we should take extra care that all creation paths, and conversions in particular, go through the proper construction that guarantees the type's consistency. The typical slip involves serialization. I mentioned it before in the Email example, but I want to reemphasize it: make sure that you don't blindly #[derive(Deserialize)] on your newtypes. This applies to serde::Deserialize, but equally to every other deserialization point you might have, such as your protocol implementation or your binary mmap-based allocator.

A very concrete counterexample of newtype application comes from a project I used to maintain: CosmWasm. It has one particular, very commonly used type, Addr. Addr is a newtype over a string, and it represents a valid Cosmos address. It is well documented, created through validation, and it has one flaw: it is directly deserializable. There are very concrete reasons for that, and I don't want to get into a discussion of whether this is a valid place for an exception. The reasons are mostly related to the fact that the type is supposed to be serialized and deserialized in storage often, and due to the nature of CosmWasm, performance was important, so skipping validation when it is unnecessary was important too.

However, there was one particular case where Deserialize was a real problem: the Addr type could be used directly as a field in an incoming smart contract message. And that message had one notable property: it was coming from an untrusted user. This simple slip made it possible to accidentally skip address validation, which could end in various failures. I am not aware of any critical bugs related to this, but I did face a couple of situations where we wasted time on testing because of it, for example configuration mistakes that could have been caught early instead of causing errors deep in the system.

The problem was solved with guidelines: experienced CosmWasm devs know never to put Addr in message types for that exact reason. In the past, we did similar things with numbers; because of Wasm implementation limitations we had to avoid i128, for example.
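
The shape of that guideline can be sketched with standard-library types only. Nothing below is the CosmWasm API: `Addr` here is a stand-in, the `demo1` prefix rule is an invented placeholder instead of real address validation, and `TransferMsg` and `ValidatedTransfer` are hypothetical message types.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidAddr;

// Stand-in for a validated address newtype.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Addr(String);

impl Addr {
    // Placeholder rule for illustration; not a real address check.
    pub fn validate(raw: &str) -> Result<Self, InvalidAddr> {
        let ok = raw.len() > 5
            && raw.starts_with("demo1")
            && raw.chars().all(|c| c.is_ascii_alphanumeric());
        if ok {
            Ok(Self(raw.to_owned()))
        } else {
            Err(InvalidAddr)
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

// What arrives from the untrusted user: plain strings only, no `Addr`.
pub struct TransferMsg {
    pub recipient: String,
    pub amount: u64,
}

// What the rest of the code works with after the boundary check.
pub struct ValidatedTransfer {
    recipient: Addr,
    amount: u64,
}

impl TransferMsg {
    // The single place where an untrusted string becomes an `Addr`.
    pub fn validate(self) -> Result<ValidatedTransfer, InvalidAddr> {
        Ok(ValidatedTransfer {
            recipient: Addr::validate(&self.recipient)?,
            amount: self.amount,
        })
    }
}

impl ValidatedTransfer {
    pub fn recipient(&self) -> &Addr {
        &self.recipient
    }

    pub fn amount(&self) -> u64 {
        self.amount
    }
}

fn main() {
    let msg = TransferMsg { recipient: "demo1abc".to_owned(), amount: 10 };
    match msg.validate() {
        Ok(valid) => println!("{} -> {}", valid.amount(), valid.recipient().as_str()),
        Err(err) => println!("rejected: {:?}", err),
    }
}
```

Because the incoming message only carries a `String`, deriving a deserializer for it cannot bypass the address check.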

This is just one real-life example, but it gave me a good lesson about handling newtypes.

Final word

The most important thing about the newtype pattern is that it lets you control how the type is created, whatever your reason for doing so. Use it whenever it provides useful type-checked preconditions, and use it wisely! Always remember to check all creation paths and enjoy a clean, Rusty codebase.

Reviewed by Krzysztof Grajek.
