Oh, no! Timeout...

Have you ever struggled with your application timing out its responses?

Various factors can cause timeouts: server misconfiguration, oversized request/response payloads, or a slow, congested network. Your system could also be experiencing high traffic and be unable to process requests in time.

Another common issue is requests that trigger long-running processes. If handling the action takes longer than the configured timeout threshold, the client will receive a timeout error (typically with a 408 or 504 status code). Depending on your app configuration, the process will either be aborted or eventually complete. In the latter scenario, even though the action finished, the client has already received an error response and will never be notified of the success.

Sometimes the solution for timeouts is simply optimizing the backend task: tweaking SQL queries or using more efficient algorithms. But performance tuning is not always trivial, and some processes simply take time to complete.

So what can we do about it?

Photo by Marius Mann from Pexels.

Fine-tuning the configuration

If you can estimate how long it takes to complete the task, the most straightforward remedy might be simply increasing the timeout threshold for that particular request type. In most cases, it's not enough to update the value in a single place: you have to update the client, the server, and any proxies in between (e.g. load balancers).
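
As a rough sketch, raising the threshold in a Node.js stack could look like the following. The two-minute value and the /report route are invented for illustration, and any proxy in between (e.g. nginx) would need a matching change in its own configuration:

```typescript
import express from "express";

const app = express();

app.get("/report", async (_req, res) => {
  // ...a long-running handler...
  res.json({ ok: true });
});

const server = app.listen(3000);
// Server side: give slow requests up to two minutes before Node.js
// tears the connection down.
server.setTimeout(120_000);

// Client side: the timeout must be raised to match, otherwise the
// client gives up before the server does.
const response = await fetch("http://localhost:3000/report", {
  signal: AbortSignal.timeout(120_000),
});
```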

The problem is that we can't always unambiguously determine the right value for the timeout limit. The backend task might require a different amount of time depending on the input data. If we choose a value that's too small, some requests will still time out. On the other hand, setting a value that's too high has many drawbacks. Your application will end up holding more simultaneous HTTP connections, which increases resource consumption and server load. It also increases exposure to denial-of-service attacks like Slowloris, which depend on the attacker's ability to hold open a large number of parallel connections.

If fine-tuning timeouts for synchronous requests proves troublesome, we can always handle tasks asynchronously.

The asynchronous handling of long-lasting tasks

If awaiting the completion of the processing causes timeouts, we could simply stop waiting.
Instead, our application could just let the client know it has successfully registered the request and will start processing soon.

First, the application should persist the intent to start the task and immediately respond with a status code tailored for such occasions: 202 Accepted. A background worker can then periodically fetch new tasks from the store and kick off their processing. When the job succeeds (or fails), the worker should update the stored entry with the result of the processing or the reason for the failure. The crucial part is making sure jobs are not started twice at the same time. We could achieve that by electing a single leader that does the processing or by using some kind of locking mechanism, as in the sketch below.
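
A minimal sketch of that flow might look like this. The tasks table, the db.query helper, and the runJob function are hypothetical stand-ins rather than a specific library's API; the atomic UPDATE plays the role of the locking mechanism, so two workers never claim the same job:

```typescript
import express from "express";
import crypto from "node:crypto";

// Hypothetical SQL client; any driver with a query(sql, params) method fits.
declare const db: {
  query(sql: string, params?: unknown[]): Promise<{ rows: any[] }>;
};
// Hypothetical long-running job.
declare function runJob(input: unknown): Promise<unknown>;

const app = express();
app.use(express.json());

app.post("/tasks", async (req, res) => {
  const id = crypto.randomUUID();
  // Persist the intent first...
  await db.query(
    "INSERT INTO tasks (id, input, status) VALUES ($1, $2, 'pending')",
    [id, JSON.stringify(req.body)],
  );
  // ...then acknowledge immediately with 202 Accepted and a Location header.
  res.status(202).location(`/tasks/${id}`).end();
});

// Background worker: atomically claim one pending task so that no job
// is ever started twice at the same time.
async function workOnce() {
  const { rows } = await db.query(
    `UPDATE tasks SET status = 'running'
     WHERE id = (SELECT id FROM tasks WHERE status = 'pending'
                 ORDER BY created_at LIMIT 1 FOR UPDATE SKIP LOCKED)
     RETURNING id, input`,
  );
  if (rows.length === 0) return; // nothing to do
  const task = rows[0];
  try {
    const result = await runJob(JSON.parse(task.input));
    await db.query(
      "UPDATE tasks SET status = 'done', result = $2 WHERE id = $1",
      [task.id, JSON.stringify(result)],
    );
  } catch (err) {
    await db.query(
      "UPDATE tasks SET status = 'failed', error = $2 WHERE id = $1",
      [task.id, String(err)],
    );
  }
}
setInterval(workOnce, 1_000); // periodically fetch new tasks

app.listen(3000);
```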

The response to the initial request should include a Location header to let the client know where to look for the results when they finally arrive. It usually points to another endpoint of the application. As long as the task is ongoing, that endpoint should keep returning 202 Accepted, optionally with information on the progress in the response body. After the task finishes, the endpoint should return the computed results with the status 200 OK or 201 Created. In case of failure, it could return 404 Not Found or even 410 Gone.
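
Continuing the same hypothetical sketch, the results endpoint could translate the stored status into the response codes described above:

```typescript
app.get("/tasks/:id", async (req, res) => {
  const { rows } = await db.query(
    "SELECT status, result, error FROM tasks WHERE id = $1",
    [req.params.id],
  );
  if (rows.length === 0) return res.status(404).end();

  const task = rows[0];
  switch (task.status) {
    case "pending":
    case "running":
      // Still in progress: keep answering 202, optionally with progress info.
      return res.status(202).json({ status: task.status });
    case "done":
      // Finished: hand out the computed results.
      return res.status(200).json(JSON.parse(task.result));
    case "failed":
      // The task ended and will never produce a result.
      return res.status(410).json({ error: task.error });
  }
});
```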

With this flow, it's the client's responsibility to poll the resource endpoint until the task is completed. We can hint at how long it should wait before the next attempt with the Retry-After header. We can also improve the reactivity of the solution by pushing information from the server side, for instance by exposing a Server-Sent Events endpoint that notifies the client when the job is done.
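
On the client side, the polling loop could look roughly like this (a sketch; the one-second fallback interval is an arbitrary choice):

```typescript
// Poll the Location returned by the initial 202 response until the task
// is finished, honouring the server's Retry-After hint between attempts.
async function awaitTask(location: string): Promise<unknown> {
  while (true) {
    const response = await fetch(location);
    if (response.status !== 202) {
      if (!response.ok) throw new Error(`Task failed: ${response.status}`);
      return response.json(); // 200/201: results are ready
    }
    const retryAfter = Number(response.headers.get("Retry-After") ?? "1");
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  }
}
```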

From a software architecture point of view, this async workflow is an implementation of the inbox pattern exposed via HTTP endpoints.

An example

Diagram: an implementation of the inbox pattern via HTTP endpoints.

Let's take an example. Our application has two endpoints: POST /item, which kicks off the creation of an item, and GET /item/{id}, which returns the status and results of the job.

To make the API more robust, we could make the POST endpoint idempotent. If the client makes another request with the same input data, we simply return the location of the already-started (or even already-created) item; in our case, it's just a URL with the same item id. As long as the task is in progress, attempts to get the results yield 202 Accepted.
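
A sketch of that idempotent POST, reusing the hypothetical db helper from the earlier sketch: the item id is derived from a hash of the input, so repeating a request maps to the same resource instead of enqueuing a duplicate job (a real app would canonicalize the input before hashing):

```typescript
import crypto from "node:crypto";

app.post("/item", async (req, res) => {
  // Derive the id deterministically from the input data.
  const id = crypto
    .createHash("sha256")
    .update(JSON.stringify(req.body))
    .digest("hex");

  const { rows } = await db.query("SELECT 1 FROM items WHERE id = $1", [id]);
  if (rows.length === 0) {
    await db.query(
      "INSERT INTO items (id, input, status) VALUES ($1, $2, 'pending')",
      [id, JSON.stringify(req.body)],
    );
  }
  // New or already known, the answer is the same: here is where to look.
  res.status(202).location(`/item/${id}`).end();
});
```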

After the item is created, the worker process will update the entry in the items table, and the endpoint will start responding with the results and the 201 Created status code.

Summary

My goal with this article was to give you some insight into implementing asynchronous processing with HTTP endpoints. I hope you find it helpful! Please check out my simple example app written in Express.js that uses this flow.
