As a software developer, you have probably encountered a situation where you wanted to execute a function a couple of times until it succeeded in fetching some result from an external source (usually an external service). This is a common scenario when we need to integrate our application with an external data source and that service performs poorly, returns random errors, or is sometimes just unavailable for a couple of seconds.

Writing CRUD-based applications and integrating with external services is a daily routine for most of the software developers I know. Nevertheless, finding better and faster ways of doing those tedious tasks is an ongoing process, and it has become much easier over the years with new tools, libraries, and frameworks popping up every day.

Retrying failed calls

Going back to our common problem of retrying failed calls, you have probably tried to solve it with an easy catch-and-retry solution like this:

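import scala.util.control.NonFatal

def retry[T](n: Int)(fn: => T): T = {
  try {
    fn
  } catch {
    // Retry non-fatal errors while attempts remain; otherwise the exception propagates
    case NonFatal(_) if n > 1 => retry(n - 1)(fn)
  }
}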

This is the simplest retry implementation you could come up with. We execute our function in a try-catch block and recursively retry as long as more than one attempt remains (n > 1). All is fine and dandy.

You can modify the first version slightly to make the recursive call tail-recursive, which lets the compiler optimize it into a loop:

import scala.util.{Failure, Success, Try}

@annotation.tailrec
def retry[T](n: Int)(fn: => T): T = {
  Try { fn } match {
    case Success(x) => x
    case _ if n > 1 => retry(n - 1)(fn) // attempts left: try again
    case Failure(e) => throw e          // out of attempts: rethrow
  }
}

Now, when we call our retry function, any exception thrown within Try will be handled by the match block, and either the function will be executed again or we rethrow the exception once we run out of attempts.

All this is of course pretty trivial, and you can quickly run into a situation where you want more control over when exactly the next call occurs. Maybe you want an exponential delay between attempts? Or maybe you want to retry only on specific errors? What about functions whose results are wrapped inside an effect? You get the idea.

As you have probably guessed by now, there is a small but pretty neat library for doing just that. It works with Cats Effect IO, which I like, and it is pretty configurable and easy to use.
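To see how quickly the hand-rolled version grows, here is a rough sketch that adds a doubling delay and an error filter to the function above. The names retryWithBackoff, maxAttempts and shouldRetry are made up for this illustration, and Thread.sleep blocks the calling thread, so treat it as a toy rather than something to ship:

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object BackoffRetry {

  /** Runs `fn` up to `maxAttempts` times in total.
    * Only errors accepted by `shouldRetry` are retried; the wait doubles after each failure.
    */
  @tailrec
  def retryWithBackoff[T](maxAttempts: Int, delay: FiniteDuration)(
      shouldRetry: Throwable => Boolean
  )(fn: => T): T =
    Try(fn) match {
      case Success(value) => value
      case Failure(e) if maxAttempts > 1 && shouldRetry(e) =>
        Thread.sleep(delay.toMillis) // blocking sleep, acceptable only in a sketch
        retryWithBackoff(maxAttempts - 1, delay * 2)(shouldRetry)(fn)
      case Failure(e) => throw e // out of attempts, or an error we do not retry
    }
}
```

Calling BackoffRetry.retryWithBackoff(5, 100.millis)(_.isInstanceOf[java.io.IOException]) { ... } would retry only on IOException. It still knows nothing about effect types, which is where a dedicated library starts to pay off.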

Meet cats-retry!

The project is open-source and available on GitHub. I will try to introduce you to the basic features cats-retry provides, together with some pointers to the cool stuff you can do with it in your next project or the one you are working on right now.

The purpose of cats-retry is to wrap an arbitrary monadic action (e.g. one wrapped in cats.effect.IO) and, on execution, run the action and potentially retry it with a configurable delay, a configurable number of times. At the end of the day, this makes it easier to work with things like network IO actions, which often experience temporary problems, by transparently retrying them in a configurable way.

There are 2 basic ways of handling retries with cats-retry. The first one works by checking the result of the action, and if that is not what we expected to receive, we can retry. The second approach is based on MonadError and its error type (usually a descendant of Throwable).

The easiest way to understand how cats-retry works is to play around with the so-called Combinators it offers.

To easily visualize how all this works, let’s prepare some utility functions we will use along the way when constructing our retry wrappers.

Cats-retry combinators need some information on what should be executed, what should be called in case the error gets detected, and when it should retry things.

val policy: RetryPolicy[IO] = RetryPolicies.constantDelay[IO](1.second)

def onFailure(failedValue: Int, details: RetryDetails): IO[Unit] = {
  IO(println(s"Rolled a $failedValue, retrying ... ${details}"))
}

def onError(err: Throwable, details: RetryDetails): IO[Unit] = {
  IO(println(s"recovering from ${err.getMessage}"))
}

The onFailure function is an example of a callback that cats-retry calls when the result obtained by calling our action is not what we expected.

The onError function, on the other hand, is an example of a callback that is called when our action fails with an error raised through MonadError.

Both of those functions are usually used for logging purposes only.

The policy defined at the beginning is just a simple example of the policy to retry our action call every 1 second.

Combinators

retryingOnFailures

One of the simplest combinators is retryingOnFailures. It belongs to the first group I mentioned earlier, which decides based on the value returned by the function we want to execute.

import cats.effect.{IO, IOApp}
import com.softwaremill.util.LoadedDie
import retry._

import scala.concurrent.duration._

object CatsRetryOnFailures extends IOApp.Simple {

  val loadedDie: LoadedDie = LoadedDie(2, 5, 4, 1, 3, 2, 6)

  def unsafeFunction(): IO[Int] = {
    IO(loadedDie.roll())
  }

  val policy: RetryPolicy[IO] = RetryPolicies.constantDelay[IO](1.second)

  def onFailure(failedValue: Int, details: RetryDetails): IO[Unit] = {
    IO(println(s"Rolled a $failedValue, retrying ... ${details}"))
  }

  def isResultOk(i: Int): IO[Boolean] = IO.pure(i == 3)

  val io: IO[Int] = retryingOnFailures(policy, isResultOk, onFailure){
    unsafeFunction()
  }

  override def run: IO[Unit] = {
    // Print inside IO instead of running a side effect in map
    io.flatMap(r => IO.println(s"finished with: ${r}"))
  }
}

LoadedDie is a utility class that can be found in the cats-retry sources. With each roll() call it simply returns the next value from the list provided in the constructor.

retryingOnFailures takes 4 arguments:

  • policy, defined in our example as a constant delay of 1 second between calls,
  • isResultOk, a function that checks whether the value obtained by calling our main function was ok; if not, the next call will be executed,
  • onFailure, a callback function executed whenever isResultOk rejects a result,
  • unsafeFunction(), the main function we want to retry if something bad happens.

Running the above produces output like this:

Rolled a 2, retrying ... WillDelayAndRetry(1 second,0,0 days)
Rolled a 5, retrying ... WillDelayAndRetry(1 second,1,1 second)
Rolled a 4, retrying ... WillDelayAndRetry(1 second,2,2 seconds)
Rolled a 1, retrying ... WillDelayAndRetry(1 second,3,3 seconds)
finished with: 3

Process finished with exit code 0

If we never return true from the isResultOk function, with the policy we have created (constantDelay), we will never terminate and will retry forever.
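One way to avoid retrying forever is to combine the constant delay with a retry limit. The snippet below is a minimal sketch that assumes cats-retry 3.x, where RetryPolicy has a join method; the limit of 5 is an arbitrary value picked for illustration:

```scala
import cats.effect.IO
import retry.{RetryPolicies, RetryPolicy}

import scala.concurrent.duration._

object BoundedPolicy {
  // Gives up after 5 retries; until then waits 1 second between attempts.
  // join gives up as soon as either policy gives up, otherwise it picks the longer delay.
  val policy: RetryPolicy[IO] =
    RetryPolicies.limitRetries[IO](5).join(RetryPolicies.constantDelay[IO](1.second))
}
```

Passing BoundedPolicy.policy to retryingOnFailures instead of the plain constantDelay policy should make the app stop retrying once the limit is reached.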

You can observe the behaviour of our retry functionality through the RetryDetails instance received in our onFailure callback function. RetryDetails tells us how many retries have already been executed, what the cumulative and upcoming delays are, and whether cats-retry is about to give up:

sealed trait RetryDetails {
  def retriesSoFar: Int
  def cumulativeDelay: FiniteDuration
  def givingUp: Boolean
  def upcomingDelay: Option[FiniteDuration]
}
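The log lines in this post show the two concrete shapes of RetryDetails, WillDelayAndRetry and GivingUp. As a small sketch, assuming cats-retry 3.x where both are case classes in the RetryDetails companion object, a callback can pattern match on them to produce friendlier messages (describe is a helper name invented here):

```scala
import cats.effect.IO
import retry.RetryDetails
import retry.RetryDetails.{GivingUp, WillDelayAndRetry}

object RetryLogging {
  // Turns the two RetryDetails cases into a readable message.
  def describe(details: RetryDetails): String = details match {
    case WillDelayAndRetry(nextDelay, retriesSoFar, cumulativeDelay) =>
      s"retry #${retriesSoFar + 1} in $nextDelay (waited $cumulativeDelay so far)"
    case GivingUp(totalRetries, totalDelay) =>
      s"giving up after $totalRetries retries and $totalDelay of waiting"
  }

  // Same signature as the onFailure callback used earlier in this post.
  def onFailure(failedValue: Int, details: RetryDetails): IO[Unit] =
    IO.println(s"Rolled a $failedValue: ${describe(details)}")
}
```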

Let’s change our policy to RetryPolicies.limitRetries(3). With only 3 retries we never reach the value 3 we are looking for, as it is the 5th value in our LoadedDie instance. Running the app with this policy gives a different output:

Rolled a 2, retrying ... WillDelayAndRetry(0 days,0,0 days)
Rolled a 5, retrying ... WillDelayAndRetry(0 days,1,0 days)
Rolled a 4, retrying ... WillDelayAndRetry(0 days,2,0 days)
Rolled a 1, retrying ... GivingUp(3,0 days)
finished with: 1

Process finished with exit code 0

retryingOnSomeErrors

The other way of doing things when handling retries is to check the error which occurred during the main function execution and act on that. The retryingOnSomeErrors combinator allows us to work with MonadError types and only retry on selected errors.

Similarly to retryingOnFailures, we need to provide some callback functions to help cats-retry decide whether it should continue retrying or not. The slight difference is that the decision is based on the error type E of MonadError[M, E], which in our case is simply Throwable.

import cats.effect.{IO, IOApp}
import com.softwaremill.util.LoadedDie
import retry._

import scala.concurrent.duration._

object CatsRetryOnSomeErrors extends IOApp.Simple {

  val loadedDie: LoadedDie = LoadedDie(2, 5, 4, 1, 3, 2, 6)

  // Roll inside IO so the side effect is suspended and repeated on every retry
  def unsafeFunction(): IO[Int] = {
    IO(loadedDie.roll()).flatMap { res =>
      if (res != 4) IO.raiseError(new IllegalArgumentException("roll different than 4"))
      else IO.pure(res)
    }
  }

  val policy: RetryPolicy[IO] = RetryPolicies.constantDelay[IO](1.second)

  // Only IllegalArgumentException is considered worth retrying
  def isIllegalArgument(e: Throwable): IO[Boolean] = e match {
    case _: IllegalArgumentException => IO.pure(true)
    case _ => IO.pure(false)
  }

  def onError(err: Throwable, details: RetryDetails): IO[Unit] = {
    IO(println(s"recovering from ${err.getMessage}"))
  }

  val io: IO[Int] = retryingOnSomeErrors(isWorthRetrying = isIllegalArgument, policy = policy, onError = onError){
    unsafeFunction()
  }

  override def run: IO[Unit] = {
    io.flatMap(r => IO.println(s"finished with: ${r}"))
  }
}

I have modified the unsafeFunction as well to raise errors with IO as our implementation of MonadError.

The retryingOnSomeErrors combinator takes a callback function to check if the error contained within a MonadError is worth retrying. Similarly to the previous example, we provide a callback function to log the errors occurring.

recovering from roll different than 4
recovering from roll different than 4
finished with: 4

Process finished with exit code 0

Number 4 is the 3rd number in our LoadedDie instance, hence we see just 2 log lines, one for each failed call to unsafeFunction that was retried.

retryingOnAllErrors

This is a somewhat simplified version of the retryingOnSomeErrors combinator, as it doesn’t require us to provide a function deciding whether we should continue retrying or not. It will simply retry on all errors raised through our MonadError.
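A minimal sketch of how this could look, assuming the cats-retry 3.x signature retryingOnAllErrors[A](policy, onError)(action). The flakyAction below is a made-up stand-in that fails twice before succeeding, and the policy values are arbitrary:

```scala
import cats.effect.{IO, IOApp}
import retry._

import scala.concurrent.duration._

object CatsRetryOnAllErrors extends IOApp.Simple {

  // Hypothetical flaky action: fails on the first two calls, then succeeds.
  private var calls = 0
  def flakyAction: IO[Int] = IO {
    calls += 1
    if (calls < 3) throw new RuntimeException(s"attempt $calls failed") else calls
  }

  // At most 5 retries, 100 ms apart.
  val policy: RetryPolicy[IO] =
    RetryPolicies.limitRetries[IO](5).join(RetryPolicies.constantDelay[IO](100.millis))

  def onError(err: Throwable, details: RetryDetails): IO[Unit] =
    IO.println(s"recovering from ${err.getMessage}")

  // No isWorthRetrying callback here: every raised error triggers a retry.
  val io: IO[Int] = retryingOnAllErrors[Int](policy = policy, onError = onError)(flakyAction)

  override def run: IO[Unit] =
    io.flatMap(r => IO.println(s"finished with: $r"))
}
```

Compared to the retryingOnSomeErrors example, the only thing that disappears is the predicate; the policy and the logging callback stay the same.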

retryingOnFailuresAndSomeErrors, retryingOnFailuresAndAllErrors

These are combinations of the aforementioned combinators. You can watch both the value produced by our main function and the errors it raises, and retry whenever necessary.

Policies

The interesting aspect of cats-retry is how configurable it is. There are a few policies you can use out of the box to decide whether your action should be retried further or not. Most importantly, you can combine them to create rather complex solutions, or even write a completely custom policy.

The built-in policies are pretty self-explanatory. We have constantDelay and limitRetries, which we have already used, as well as policies that vary the delay between calls, like exponentialBackoff, fibonacciBackoff, or fullJitter.

You can modify defined policies further by transforming them with additional policy transformers, e.g. capDelay, which sets an upper bound on the delay between retries.

Last but not least, you can join policies. There are a number of different ways to join defined policies, each with its own logic for reconciling the delays defined in the joined policies. For more information about this, I would strongly recommend visiting the cats-retry documentation.

Most importantly, you can define your own custom policies that decide whether to retry your monadic functions based not only on delays or the number of attempts so far, but on any logic you can come up with. The sky is the limit.
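As an illustration, here is a short sketch of a capped exponential backoff, assuming the cats-retry 3.x RetryPolicies.capDelay(cap, policy) signature. The 100 ms base delay and the 5 second cap are arbitrary values chosen for the example:

```scala
import cats.effect.IO
import retry.{RetryPolicies, RetryPolicy}

import scala.concurrent.duration._

object CappedBackoff {
  // Exponential backoff starting at 100 ms, never waiting more than 5 seconds between retries.
  val policy: RetryPolicy[IO] =
    RetryPolicies.capDelay[IO](5.seconds, RetryPolicies.exponentialBackoff[IO](100.millis))
}
```

Note that capping the delay does not limit the number of retries, so this policy on its own still retries indefinitely.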
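To give a taste of what a custom policy might look like, here is a sketch built with RetryPolicy.lift, assuming the cats-retry 3.x API where a policy is a function from RetryStatus to PolicyDecision. The rule itself (a linearly growing delay with a 3 second waiting budget) is invented purely for this example:

```scala
import cats.effect.IO
import retry.{PolicyDecision, RetryPolicy, RetryStatus}

import scala.concurrent.duration._

object CustomPolicy {
  // Hypothetical rule: the delay grows by 200 ms with every retry,
  // and we give up once the total time spent waiting exceeds 3 seconds.
  val linearUntilBudgetSpent: RetryPolicy[IO] =
    RetryPolicy.lift[IO] { (status: RetryStatus) =>
      if (status.cumulativeDelay > 3.seconds) PolicyDecision.GiveUp
      else PolicyDecision.DelayAndRetry(200.millis * (status.retriesSoFar + 1).toLong)
    }
}
```

Such a policy can be passed to any of the combinators above exactly like the built-in ones.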

Policies in cats-retry are like shrimp in the famous movie Forrest Gump.

You can barbecue it, boil it, broil it… There are shrimp gumbo, pan fried, deep fried…

When I see a library like cats-retry with so many configuration options to choose from or even providing the ability to build your own, it always makes me smile and definitely makes my work easier. I hope it will make yours too.
