More than just HelloWorld.json in sttp & uPickle

Introduction

Json serialization in Scala is the second most common joke in the Scala community after Monad’s definition. It’s just something that everyone has already done or will eventually do. Unfortunately, Scala’s standard library doesn’t provide a default mechanism to work with Json, and on top of that, there is no “one and only” library on the market to choose from. To make it even funnier, Json serialization in the Scala community is equivalent to the TODO list in other languages.

Recently I had a chance to work on an open source project in which we’re wrapping the OpenAI API in a Scala client, which lets users use the power of ChatGPT and other OpenAI features directly in their Scala code. Naturally, the very first thing we had to agree on was, “Which Json library are we going to use?” We had already used a couple of libraries like circe, jsoniter, or json4s, but we decided to use a library that we had never used before - uPickle.

Two main reasons for that are:

  1. It’s always good to learn something new and check whether this library is the chosen one.
  2. It’s used in Scala Toolkit, and since we’re working on an open source library, our approach is to use tools and solutions that are striving to be a standard.

This article shows uPickle solutions to some non-standard serialization/deserialization scenarios:

  • ADTs (sealed trait and classes hierarchy) with uncommon differences between possible cases.
  • The transformation from Json’s snake_case fields to camelCase fields during deserialization and vice versa for serialization and their integration with sttp.

A Request and a Response walk into a bar

Configuration | Necessary dependencies

In all the examples, we’re using sttp4 and uPickle.

 "com.softwaremill.sttp.client4" %% "core" % "4.0.0-M1"
 "com.softwaremill.sttp.client4" %% "upickle" % "4.0.0-M1"
 "com.lihaoyi" %% "upickle" % "3.1.0"

We’re working with the OpenAI API. To be more precise, let’s say we’re working on the implementation of https://platform.openai.com/docs/api-reference/completions/create as an example.

First things first

We need to model the Request and Response described in the OpenAI documentation.

To match the API specification, we can define the following Scala class for the request body:

case class CompletionsBody(
   model: String,
   prompt: Option[???] = None,
   suffix: Option[String] = None,
   max_tokens: Option[Int] = None,
   temperature: Option[Double] = None,
   top_p: Option[Double] = None,
   n: Option[Int] = None,
   stream: Option[Boolean] = None,
   logprobs: Option[Int] = None,
   echo: Option[Boolean] = None,
   stop: Option[???] = None,
   presence_penalty: Option[Double] = None,
   frequency_penalty: Option[Double] = None,
   best_of: Option[Int] = None,
   logit_bias: Option[Map[String, Float]] = None,
   user: Option[String] = None
)

The only required parameter is model, so we can make the rest of the parameters optional with Scala’s Option and default them to None. This way, users creating an instance of our class provide only the parameters they care about.

First problem “???”

The tricky thing here is how we should model our case class parameters of “prompt” and “stop”. OpenAI’s documentation states that users can provide two valid options: a String or an Array.

We want our request body to look like the below:

Single

{
 "model": "text-davinci-003",
 ...
 "prompt": "single prompt",
 "stop": "single stop"
}

Multiple

{
 "model": "text-davinci-003",
 ...
 "prompt": ["multiple", "prompt"],
 "stop": ["multiple", "stop"]
}

I think the most standard approach would be to make an ADT.

So first, let’s define a sealed trait Stop:

sealed trait Stop

And then two case classes that will extend Stop and store value as a String or sequence of Strings:

case class SingleStop(value: String) extends Stop
case class MultipleStop(values: Seq[String]) extends Stop

The only thing left to do is to provide a uPickle Writer to serialize our classes to Json, and a Reader to deserialize Json to case classes.

Using uPickle with ADTs

To serialize and deserialize a class to Json in uPickle, we can use upickle.default.macroRW, which creates both a Reader (used to deserialize JSON into a case class) and a Writer (used to serialize a case class into JSON).
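As a minimal illustration before the full hierarchy, here is a sketch of a macroRW round trip on a flat case class. It assumes uPickle 3.1.0 on the classpath; MacroRWSketch is a hypothetical wrapper object, and Usage mirrors the response model that appears later in this article.

```scala
import upickle.default._

object MacroRWSketch {
  // Flat case class with snake_case fields, same shape as the Usage model used later
  case class Usage(prompt_tokens: Int, completion_tokens: Int, total_tokens: Int)

  object Usage {
    // macroRW derives both the Reader and the Writer for the case class
    implicit val usageRW: ReadWriter[Usage] = macroRW
  }

  def toJson(usage: Usage): String = write(usage)
  def fromJson(json: String): Usage = read[Usage](json)

  def main(args: Array[String]): Unit = {
    val json = toJson(Usage(2, 10, 12))
    println(json) // {"prompt_tokens":2,"completion_tokens":10,"total_tokens":12}
    println(fromJson(json) == Usage(2, 10, 12)) // true
  }
}
```

Field names are written exactly as they are declared, which is why the models in this part of the article use snake_case.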

import upickle.default._

object CompletionsRequestBody {

 case class CompletionsBody(
     model: String,
     prompt: Option[Prompt] = None,
     stop: Option[Stop] = None,
     ...
 )

 object CompletionsBody {
   implicit val completionBodyRW: ReadWriter[CompletionsBody] = macroRW
 }

 sealed trait Prompt
 object Prompt {
   implicit val promptRW: ReadWriter[Prompt] =
     ReadWriter.merge[Prompt](SinglePrompt.singlePromptRW, MultiplePrompt.multiplePromptRW)

   case class SinglePrompt(value: String) extends Prompt

   object SinglePrompt {
     implicit val singlePromptRW: ReadWriter[SinglePrompt] = macroRW[SinglePrompt]
   }

   case class MultiplePrompt(values: Seq[String]) extends Prompt

   object MultiplePrompt {
     implicit val multiplePromptRW: ReadWriter[MultiplePrompt] = macroRW[MultiplePrompt]
   }
 }
}

sealed trait Stop

object Stop {
 implicit val stopRW: upickle.default.ReadWriter[Stop] =
   ReadWriter.merge(SingleStop.singleStopRW, MultipleStop.multipleStopRW)

 case class SingleStop(value: String) extends Stop

 object SingleStop {
   implicit val singleStopRW: ReadWriter[SingleStop] = macroRW[SingleStop]
 }

 case class MultipleStop(values: Seq[String]) extends Stop

 object MultipleStop {
   implicit val multipleStopRW: ReadWriter[MultipleStop] = macroRW[MultipleStop]
 }
}

To serialize our hierarchy, we can use the merge method in the parent’s companion object, which combines the ReadWriters of its children.

Now let’s build a case class for a response from the API:

import upickle.default._

object CompletionsResponseData {
 case class Choices(
     text: String,
     index: Int,
     logprobs: Option[String],
     finish_reason: String
 )
 object Choices {
   implicit val choicesRW: ReadWriter[Choices] = macroRW[Choices]
 }

 case class Usage(prompt_tokens: Int, completion_tokens: Int, total_tokens: Int)

 object Usage {
  implicit val usageRW: ReadWriter[Usage] = macroRW
 }

 case class CompletionsResponse(
     id: String,
     `object`: String,
     created: Int,
     model: String,
     choices: Seq[Choices],
     usage: Usage
 )
 object CompletionsResponse {
   implicit val completionsResponseRW: ReadWriter[CompletionsResponse] = macroRW[CompletionsResponse]
 }
}

And let’s finish it with the sttp Request definition:

def createCompletion(
   completionsBody: CompletionsBody,
   token: String
): Request[Either[ResponseException[String, Exception], CompletionsResponse]] = {
 import sttp.client4.upicklejson._

 basicRequest.auth
   .bearer(token)
   .post(uri"https://api.openai.com/v1/completions")
   .body(completionsBody)
   .response(asJson[CompletionsResponse])
}

The code compiles without any problem, and since we’re using Scala, we can assume that it 100% works.

But just to be sure, let’s test it.

object Main extends App {

 val backend: SyncBackend = DefaultSyncBackend()

 val body = CompletionsBody(
  model = "text-davinci-003",
  prompt = Some(SinglePrompt("single prompt")),
  stop = Some(SingleStop("single stop"))
 )

 val token = "my-secret-token"

 val response = createCompletion(body, token).send(backend)
}

Since our method evaluates into:

Request[Either[ResponseException[String, Exception], CompletionsResponse]]

we should check what the response variable stores inside.

println(response)

That’s not what I was expecting:

Response(
 Left(sttp.client4.HttpError: statusCode: 400, response: {
 "error": {
 "message": "[{'$type': 'sttp.openai.requests.completions.CompletionsRequestBody.Prompt.SinglePrompt', 'value': 'single prompt'}] is valid under each of {'type': 'array', 'minItems': 1, 'items': {'oneOf': [{'type': 'integer'}, {'type': 'object', 'properties': {'buffer': {'type': 'string', 'description': 'A serialized numpy buffer'}, 'shape': {'type': 'array', 'items': {'type': 'integer'}, 'description': 'Array shape'}, 'dtype': {'type': 'string', 'description': 'Stringified dtype'}, 'token': {'type': 'string'}}}]}, 'example': '[1, 1313, 451, {\"buffer\": \"abcdefgh\", \"shape\": [1024], \"dtype\": \"float16\"}]'}, {'type': 'array', 'minItems': 1, 'maxItems': 2048, 'items': {'oneOf': [{'type': 'string'}, {'type': 'object', 'properties': {'buffer': {'type': 'string', 'description': 'A serialized numpy buffer'}, 'shape': {'type': 'array', 'items': {'type': 'integer'}, 'description': 'Array shape'}, 'dtype': {'type': 'string', 'description': 'Stringified dtype'}, 'token': {'type': 'string'}}}], 'default': '', 'example': 'This is a test.', 'nullable': False}} - 'prompt'",
 "type": "invalid_request_error",
 "param": null,
 "code": null
}
}), ...)

What seems to be the officer, problem?

Let’s get rid of all that noise and focus on the response returned from the API:

{
 "error":{
   "message":"[{'$type': 'sttp.openai.requests.completions.CompletionsRequestBody.Stop.SingleStop', 'value': 'single stop'}] is not valid under any of the given schemas - 'stop'",
   "type":"invalid_request_error",
   "param":null,
   "code":null
 }
}

As we can see, the value itself is not represented as a simple String but as a structure with $type and value subfields:

{'$type': 'sttp.openai.requests.completions.CompletionsRequestBody.Prompt.SinglePrompt', 'value': 'single prompt'}

Indeed, the uPickle documentation shows similar examples and explains that our Prompt and Stop will serialize with the full name of the instance’s class, so we need to go deeper in order to tune the encoder to our contract.
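This tagged output can be reproduced locally without calling the API. The following is a minimal sketch assuming uPickle 3.1.0, as in the dependencies listed earlier; TaggedOutputSketch and render are hypothetical names and the hierarchy is flattened for brevity. The exact tag value depends on the package name and the uPickle version, so no output is shown.

```scala
import upickle.default._

object TaggedOutputSketch {
  sealed trait Prompt
  case class SinglePrompt(value: String) extends Prompt
  case class MultiplePrompt(values: Seq[String]) extends Prompt

  // Derived picklers for the leaves, merged into one pickler for the parent
  implicit val singleRW: ReadWriter[SinglePrompt] = macroRW
  implicit val multipleRW: ReadWriter[MultiplePrompt] = macroRW
  implicit val promptRW: ReadWriter[Prompt] =
    ReadWriter.merge[Prompt](singleRW, multipleRW)

  def render(prompt: Prompt): String = write[Prompt](prompt)

  def main(args: Array[String]): Unit = {
    // Prints a JSON object carrying a "$type" tag next to "value",
    // not the bare string "single prompt" that the API expects
    println(render(SinglePrompt("single prompt")))
  }
}
```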

We can solve this problem by creating our own custom picklers. To do so, we’re going to use a readwriter method that returns the implicit value of ReadWriter of type T:

def readwriter[T: ReadWriter] = implicitly[ReadWriter[T]]

It gives us a bimap function that is used to create a pickler that reads/writes a type V, using the pickler for type T by providing a conversion function between them.

def bimap[V](f: V => T, g: T => V): ReadWriter[V]
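To see bimap in isolation, here is a small sketch on a simpler type than Prompt. UserId and BimapSketch are hypothetical names used only for illustration and are not part of the OpenAI model; the sketch assumes uPickle 3.1.0.

```scala
import upickle.default._

object BimapSketch {
  // Hypothetical wrapper type around a String
  case class UserId(value: String)

  // Reuse the existing String pickler:
  // UserId => String when writing, String => UserId when reading
  implicit val userIdRW: ReadWriter[UserId] =
    readwriter[String].bimap[UserId](_.value, UserId(_))

  def toJson(id: UserId): String = write(id)
  def fromJson(json: String): UserId = read[UserId](json)

  def main(args: Array[String]): Unit = {
    println(toJson(UserId("abc")))   // "abc"
    println(fromJson("\"abc\""))     // UserId(abc)
  }
}
```

The wrapper disappears from the JSON entirely, which is the same effect we want for SinglePrompt and MultiplePrompt.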

So let’s get back to the Prompt hierarchy.
We will be using uJson (uPickle’s backend library) to conveniently construct JSON blobs/structs.

implicit val promptRW: ReadWriter[Prompt] = readwriter[ujson.Value].bimap[Prompt](???, ???)

The ReadWriter for Prompt:

implicit val promptRW: ReadWriter[Prompt] = readwriter[ujson.Value].bimap[Prompt](
 {
   case SinglePrompt(value)    => writeJs(value)
   case MultiplePrompt(values) => writeJs(values)
 },
  jsonValue => read[ujson.Value](jsonValue) match {
   case Str(value) => SinglePrompt(value)
   case Arr(value) => MultiplePrompt(value.map(_.str).toSeq)
   case e => throw new Exception(s"Could not deserialize: $e")
  }
)

Let’s break down those couple of lines step by step.

For the first argument of bimap, we have to provide a way of serialization of our Prompt into ujson.Value or, in other words, how to write Prompt as Json.

We can do it like this:

prompt => prompt match { 
 case SinglePrompt(value) => writeJs(value)
 case MultiplePrompt(values) => writeJs(values)
}

Or make it even shorter:

readwriter[ujson.Value].bimap[Prompt]({
 case SinglePrompt(value)    => writeJs(value)
 case MultiplePrompt(values) => writeJs(values)
}, ???)

For the second argument, we have to provide a way of deserializing a ujson.Value back into a Prompt or, in other words, how to read Json as Prompt.

We can use the read method and pass our ujson.Value into it:

def read[T: Reader](s: ujson.Readable, trace: Boolean = false): T = {
 TraceVisitor.withTrace(trace, reader[T])(s.transform(_))
}

And use pattern matching on the output of that function:

jsonValue => read[ujson.Value](jsonValue) match {
    case Str(value) => ???
    case Obj(value) => ???
    case Arr(value) => ???
    case Num(value) => ???
    case bool: Bool => ???
    case Null => ???
}

That covers all the Json value types supported by uPickle. In our case, only two of them interest us: Str for String values and Arr for a sequence of values. So we can reduce it to:

jsonValue => read[ujson.Value](jsonValue) match {
    case Str(value) => SinglePrompt(value)
    case Arr(value) => MultiplePrompt(value.map(_.str).toSeq)
    case e => throw new Exception(s"Unable to deserialize $e")
}

In order to satisfy the compiler, we must provide a match for all the other cases. In our scenario, let’s just throw an exception.

ReadWriter for Stop is basically the same:

implicit val stopRW: upickle.default.ReadWriter[Stop] = readwriter[ujson.Value].bimap[Stop](
   {
     case SingleStop(value) => writeJs(value)
     case MultipleStop(values) => writeJs(values)
   },
   jsonValue => read[ujson.Value](jsonValue) match {
     case Str(value) => SingleStop(value)
     case Arr(value) => MultipleStop(value.map(_.str).toSeq)
     case e => throw new Exception(s"Could not deserialize: $e")
   }
 )
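Before sending anything to the API, the custom pickler can be checked with a local round trip. This is a sketch under a few assumptions: uPickle 3.1.0, a flattened Prompt hierarchy, and hypothetical names PromptRoundTripSketch, toJson and fromJson. Unlike the code above, it matches on the ujson.Value directly instead of calling read on it first, since bimap already passes a ujson.Value to the second function.

```scala
import upickle.default._

object PromptRoundTripSketch {
  sealed trait Prompt
  case class SinglePrompt(value: String) extends Prompt
  case class MultiplePrompt(values: Seq[String]) extends Prompt

  implicit val promptRW: ReadWriter[Prompt] = readwriter[ujson.Value].bimap[Prompt](
    {
      // Writing: drop the wrapper and emit the bare string or array
      case SinglePrompt(value)    => writeJs(value)
      case MultiplePrompt(values) => writeJs(values)
    },
    {
      // Reading: decide which case class to build from the JSON shape
      case ujson.Str(value)  => SinglePrompt(value)
      case ujson.Arr(values) => MultiplePrompt(values.map(_.str).toSeq)
      case other             => throw new Exception(s"Could not deserialize: $other")
    }
  )

  def toJson(prompt: Prompt): String = write[Prompt](prompt)
  def fromJson(json: String): Prompt = read[Prompt](json)

  def main(args: Array[String]): Unit = {
    println(toJson(SinglePrompt("single prompt")))             // "single prompt"
    println(toJson(MultiplePrompt(Seq("multiple", "prompt")))) // ["multiple","prompt"]
    println(fromJson("\"single prompt\""))                     // SinglePrompt(single prompt)
  }
}
```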

It compiles, so let’s test it again:

object Main extends App {

 val backend: SyncBackend = DefaultSyncBackend()

 val body = CompletionsBody(
  model = "text-davinci-003",
  prompt = Some(SinglePrompt("single prompt")),
  stop = Some(SingleStop("single stop"))
 )

 val token = "my-secret-token"

 val response = createCompletion(body, token).send(backend)
}

Response:

Response(Right(CompletionsResponse(cmpl-798xyvcxGlfyQc8ipnxSNa0EEx40f,text_completion,1682413774,text-davinci-003,List(Choices(What would you like to learn today?,0,null,stop)),Usage(2,10,12))),...)

As we can see, this time, we’ve got it right.

The one configuration to rule them all

When we modeled our case classes, you probably noticed that we used the snake_case convention for naming parameters. And there is a reason for that.

The OpenAI API uses the snake_case convention, so we have two choices:

  • Use snake_case everywhere
  • Transform our case classes’ data from camelCase to snake_case upon creating request’s body and transform snake_case to camelCase upon creating the case class from the response.

Do it the hard way

Camel case is Scala’s standard convention used for variables, so naturally, we want to use it to build our data models. So let’s do it the proper way.

First, we copy the implementation of this conversion directly from the uPickle documentation:

object SnakePickle extends upickle.AttributeTagged { 
 private def camelToSnake(s: String): String =
   s.replaceAll("([A-Z])", "#$1").split('#').map(_.toLowerCase).mkString("_")

 private def snakeToCamel(s: String): String = {
   val res = s.split("_", -1).map(x => s"${x(0).toUpper}${x.drop(1)}").mkString
   s"${s(0).toLower}${res.drop(1)}"
 }

 override def objectAttributeKeyReadMap(s: CharSequence): String =
   snakeToCamel(s.toString)

 override def objectAttributeKeyWriteMap(s: CharSequence): String =
   camelToSnake(s.toString)

 override def objectTypeKeyReadMap(s: CharSequence): String =
   snakeToCamel(s.toString)

 override def objectTypeKeyWriteMap(s: CharSequence): String =
   camelToSnake(s.toString)

 /** This is required in order to parse null values into Scala's Option */
 override implicit def OptionWriter[T: SnakePickle.Writer]: Writer[Option[T]] =
   implicitly[SnakePickle.Writer[T]].comap[Option[T]] {
     case None    => null.asInstanceOf[T]
     case Some(x) => x
   }

 override implicit def OptionReader[T: SnakePickle.Reader]: Reader[Option[T]] =
   new Reader.Delegate[Any, Option[T]](implicitly[SnakePickle.Reader[T]].map(Some(_))) {
     override def visitNull(index: Int) = None
   }
}

Fortunately, we have to do it only once per convention, unlike ReadWriters, which we have to provide for every new type we create.
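The two string helpers inside SnakePickle are plain Scala and can be tried on their own. The sketch below copies them into a hypothetical NamingSketch object (made public so they can be called) and runs them on field names from our models.

```scala
object NamingSketch {
  // Mark every capital letter with '#', split on it, lowercase the parts, join with '_'
  def camelToSnake(s: String): String =
    s.replaceAll("([A-Z])", "#$1").split('#').map(_.toLowerCase).mkString("_")

  // Capitalize every '_'-separated part, then lowercase the very first character again
  def snakeToCamel(s: String): String = {
    val res = s.split("_", -1).map(x => s"${x(0).toUpper}${x.drop(1)}").mkString
    s"${s(0).toLower}${res.drop(1)}"
  }

  def main(args: Array[String]): Unit = {
    println(camelToSnake("maxTokens"))     // max_tokens
    println(camelToSnake("topP"))          // top_p
    println(snakeToCamel("finish_reason")) // finishReason
  }
}
```

Note that every capital letter starts a new segment, so a name like userID would become user_i_d. That is fine for the OpenAI fields used here, but worth keeping in mind for other APIs.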

Now, by using SnakePickle.ReadWriter, we should be able to name the parameters of our case classes in camelCase.

import sttp.openai.json.SnakePickle

case class CompletionsBody(
   model: String,
   maxTokens: Option[Int] = None,
   topP: Option[Double] = None,
   presencePenalty: Option[Double] = None,
   frequencyPenalty: Option[Double] = None,
   bestOf: Option[Int] = None,
   logitBias: Option[Map[String, Float]] = None,
   ...
)

object CompletionsResponseData {
  case class Choices(
     finishReason: String,
     ...
  )

  case class Usage(promptTokens: Int, completionTokens: Int, totalTokens: Int)
}

The rest stays the same.
Now we have to change our upickle.default.ReadWriters into SnakePickle ones.

import sttp.openai.json.SnakePickle

object CompletionsBody {
 implicit val completionBodyRW: SnakePickle.ReadWriter[CompletionsBody] = SnakePickle.macroRW
}

sealed trait Prompt

object Prompt {
 implicit val promptRW: SnakePickle.ReadWriter[Prompt] = SnakePickle.readwriter[ujson.Value].bimap[Prompt](
   {
     case SinglePrompt(value)    => SnakePickle.writeJs(value)
     case MultiplePrompt(values) => SnakePickle.writeJs(values)
   },
   jsonValue =>
     SnakePickle.read[ujson.Value](jsonValue) match {
       case Str(value) => SinglePrompt(value)
       case Arr(value) => MultiplePrompt(value.map(_.str).toSeq)
       case e          => throw new Exception(s"Could not deserialize: $e")
     }
 )

 case class SinglePrompt(value: String) extends Prompt

 case class MultiplePrompt(values: Seq[String]) extends Prompt
}

sealed trait Stop

object Stop {
 implicit val stopRW: SnakePickle.ReadWriter[Stop] =
   SnakePickle.readwriter[ujson.Value].bimap[Stop](
     {
       case SingleStop(value)    => SnakePickle.writeJs(value)
       case MultipleStop(values) => SnakePickle.writeJs(values)
     },
     jsonValue =>
       SnakePickle.read[ujson.Value](jsonValue) match {
         case Str(value) => SingleStop(value)
         case Arr(value) => MultipleStop(value.map(_.str).toSeq)
         case e          => throw new Exception(s"Could not deserialize: $e")
       }
   )

 case class SingleStop(value: String) extends Stop

 case class MultipleStop(values: Seq[String]) extends Stop
}
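The key mapping can be checked locally before wiring it into sttp. The sketch below assumes uPickle 3.1.0 and uses a trimmed-down, hypothetical MiniSnakePickle that overrides only the attribute key mappings (the Option handling and type key mappings of the full SnakePickle are left out), together with the camelCase Usage model.

```scala
object SnakePickleSketch {
  // Trimmed-down variant of SnakePickle: attribute key mappings only
  object MiniSnakePickle extends upickle.AttributeTagged {
    private def camelToSnake(s: String): String =
      s.replaceAll("([A-Z])", "#$1").split('#').map(_.toLowerCase).mkString("_")

    private def snakeToCamel(s: String): String = {
      val res = s.split("_", -1).map(x => s"${x(0).toUpper}${x.drop(1)}").mkString
      s"${s(0).toLower}${res.drop(1)}"
    }

    // JSON key -> Scala field name when reading
    override def objectAttributeKeyReadMap(s: CharSequence): String =
      snakeToCamel(s.toString)

    // Scala field name -> JSON key when writing
    override def objectAttributeKeyWriteMap(s: CharSequence): String =
      camelToSnake(s.toString)
  }

  case class Usage(promptTokens: Int, completionTokens: Int, totalTokens: Int)

  object Usage {
    implicit val usageRW: MiniSnakePickle.ReadWriter[Usage] = MiniSnakePickle.macroRW
  }

  def toJson(usage: Usage): String = MiniSnakePickle.write(usage)
  def fromJson(json: String): Usage = MiniSnakePickle.read[Usage](json)

  def main(args: Array[String]): Unit = {
    val json = toJson(Usage(2, 8, 10))
    println(json) // {"prompt_tokens":2,"completion_tokens":8,"total_tokens":10}
    println(fromJson(json) == Usage(2, 8, 10)) // true
  }
}
```

The picklers must come from the customized API object (here MiniSnakePickle.macroRW, MiniSnakePickle.write and MiniSnakePickle.read); instances derived with upickle.default would ignore the key mappings.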

If we try to compile it, we will receive an error from sttp:

No given instance of type sttp.client4.BodySerializer[
 [error]    |  sttp.openai.requests.completions.CompletionsRequestBody.CompletionsBody
 [error]    |] was found for an implicit parameter of method body in trait PartialRequestExtensions

This problem occurs because, in the sttp version used here, the body serializer provided by the SttpUpickleApi trait is built on upickle.default: it looks for an implicit upickle.default.Writer, while our models now only provide SnakePickle.Writer instances.

For this to work, we have to extend the SttpUpickleApi trait and implement our own serialization and deserialization methods based on SnakePickle. It’s necessary, but fortunately, we again have to do it only once per convention.

import sttp.client4.json.RichResponseAs
import sttp.client4.upicklejson.SttpUpickleApi
import sttp.client4.{asString, BodySerializer, IsOption, JsonInput, ResponseAs, ResponseException, StringBody}
import sttp.model.MediaType

object SttpUpickleApiExtension extends SttpUpickleApi {
 implicit def upickleBodySerializerSnake[B](implicit encoder: SnakePickle.Writer[B]): BodySerializer[B] =
   b => StringBody(SnakePickle.write(b), "utf-8", MediaType.ApplicationJson)

 def asJsonSnake[B: SnakePickle.Reader: IsOption]: ResponseAs[Either[ResponseException[String, DeserializationException], B]] =
   asString.mapWithMetadata(ResponseAs.deserializeRightWithError(deserializeJsonSnake)).showAsJson

 def deserializeJsonSnake[B: SnakePickle.Reader: IsOption]: String => Either[DeserializationException, B] = { (s: String) =>
   try
     Right(SnakePickle.read[B](JsonInput.sanitize[B].apply(s)))
   catch {
     case e: Exception => Left(new DeserializationException(e))
     case t: Throwable =>
       // in ScalaJS, ArrayIndexOutOfBoundsException exceptions are wrapped in org.scalajs.linker.runtime.UndefinedBehaviorError
       t.getCause match {
         case e: ArrayIndexOutOfBoundsException => Left(new DeserializationException(e))
         case _                                 => throw t
       }
   }
 }
}

The provided methods are basically copies of the default SttpUpickleApi methods. We only changed their names, so that we can state explicitly which serialization methods we want to use in our request.

Now the only thing left is to change our createCompletion method to use the SnakePickle-based methods instead.

def createCompletion(
   completionsBody: CompletionsBody,
   token: String
): Request[Either[ResponseException[String, Exception], CompletionsResponse]] = {
 import sttp.openai.json.SttpUpickleApiExtension.{asJsonSnake, upickleBodySerializerSnake}

 basicRequest.auth
   .bearer(token)
   .post(uri"https://api.openai.com/v1/completions")
   .body(completionsBody)
   .response(asJsonSnake[CompletionsResponse])
}

Let’s run it and check the response:

Response(Right(CompletionsResponse(cmpl-799qfST6wbSbOv6Bxq0kWhEQS1Dgo,text_completion,1682417165,text-davinci-003,List(Choices(What did you learn today?,0,None,stop)),Usage(2,8,10))),...)

It works perfectly fine.

Conclusion

Json serialization is still a more or less problematic topic regardless of which library we use. I’m not saying that uPickle is the best one on the market. To be honest, after using it for a while, I’m still not convinced that I would use it in other projects, mostly because of the amount of code you need to write to make it work. But I think it’s good to know, since it’s part of the Scala Toolkit. I highly recommend exploring it yourself and getting out of your circe/json4s/jsoniter comfort zone.

Reviewed by Krzysztof Ciesielski, Adam Warski, Adam Bartosik
