
From Code To Pixels. About Web Rendering

Jakub Antolak

04 Mar 2024 · 7 minutes read


In 2024, 8 years will have passed since I started working in IT. I began my journey as a junior UI designer in 2016 and then, after some time, I switched to a front-end developer role. I have always been interested in moving things on the screen, especially in the browser. Yet only fairly recently did I systematize my knowledge of how things are actually rendered there.

I knew the essentials, of course. However, there were 2 things I understood intuitively without being fully aware of them: the formal division of how graphical user interfaces can be built, and which of these models browsers implement.

Design Patterns

So, how can we build user interfaces? Quoting a legend, there are basically 2 schools of thought. We can choose the retained or the immediate mode.

Retained Mode

In the retained mode, the system handles managing and storing all of the objects you render. It also provides a number of useful abstractions to speed up the UI creation process, like inputs, buttons, etc. In the browser, HTML is used to give the elements a structure, CSS to describe their look, and JavaScript to define their behavior (this part is optional though). Then, the system takes care of maintaining their state.

Things are pretty straightforward. To render an animated rectangle 5 pixels off the top-left corner, you write a line of HTML and a few lines of CSS:

See the Pen Browser - retained mode by Kuba (@jakub_antolak) on CodePen.

When the state (here, the background color) is updated, you don't need to redraw all the objects. Their state is retained. You can modify their behavior, but the browser still takes care of keeping the position, background color, event handling, etc.
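To make the idea more tangible outside of the browser, here is a minimal TypeScript sketch of a retained-mode system. It is only an illustration, not how a browser implements it: `RetainedScene`, `SceneRect`, and the methods `add`, `update`, `get`, and `flush` are all hypothetical names invented for this example.

```typescript
// Hypothetical retained-mode sketch: the "system" (RetainedScene) stores
// every object and remembers its state between updates.
interface SceneRect {
  x: number;
  y: number;
  width: number;
  height: number;
  background: string;
}

class RetainedScene {
  private objects = new Map<string, SceneRect>();
  private dirty = new Set<string>();

  // Register an object once; the scene keeps it from now on.
  add(id: string, rect: SceneRect): void {
    this.objects.set(id, { ...rect });
    this.dirty.add(id);
  }

  // Change only the properties you care about; everything else is retained.
  update(id: string, changes: Partial<SceneRect>): void {
    const current = this.objects.get(id);
    if (current === undefined) {
      throw new Error(`Unknown object: ${id}`);
    }
    Object.assign(current, changes);
    this.dirty.add(id);
  }

  get(id: string): SceneRect | undefined {
    return this.objects.get(id);
  }

  // Returns the ids that need redrawing and clears the dirty set.
  flush(): string[] {
    const ids = Array.from(this.dirty);
    this.dirty.clear();
    return ids;
  }
}

const scene = new RetainedScene();
scene.add("box", { x: 5, y: 5, width: 100, height: 50, background: "#ff0000" });
scene.add("label", { x: 5, y: 60, width: 100, height: 20, background: "#ffffff" });
scene.flush(); // initial draw of both objects

// Only the background changes; position and size are still stored.
scene.update("box", { background: "#0000ff" });
const toRedraw = scene.flush(); // only "box" is reported
```

The caller never re-describes the rectangle. It changes one property, and the system knows what it already had and what needs to be redrawn.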

Immediate Mode

In the immediate mode, things work a bit differently. It is the developer's task to build a graphical object, set its properties, and to update them explicitly when needed. Usually, it happens inside a render loop. Some functionalities may need to be implemented from scratch. You, as a developer, are the pixel caretaker. There is no opinionated system. You are the one.

[Image: meme generated with ImgFlip]

You might know this approach from the browser, too. Take a look at this:

See the Pen Browser - immediate mode by Kuba (@jakub_antolak) on CodePen.

We tried to implement the same animated rectangle (almost) without HTML and CSS, and our work became more complicated. There is no built-in mechanism to handle the basics, like color conversion or calculating the intermediate colors between the 2 we put into CSS in the previous example. Correct text alignment in relation to the rectangle is also hard to recreate, as nothing does that for us natively. The draw function runs on each frame, and the state is not retained. Our task is to maintain it every single time.
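As an illustration of the kind of work that lands on the developer, here is a rough sketch of the color handling mentioned above: converting a hex color to channels and computing an intermediate color. The helper names (`hexToRgb`, `mixChannel`, `lerpColor`) and the example colors are assumptions made for this sketch, not the code from the Pen. It uses plain linear interpolation per channel.

```typescript
interface Rgb {
  r: number;
  g: number;
  b: number;
}

// Converts "#rrggbb" into numeric channels (0-255).
function hexToRgb(hex: string): Rgb {
  if (!/^#[0-9a-f]{6}$/i.test(hex)) {
    throw new Error(`Expected a #rrggbb color, got: ${hex}`);
  }
  return {
    r: parseInt(hex.slice(1, 3), 16),
    g: parseInt(hex.slice(3, 5), 16),
    b: parseInt(hex.slice(5, 7), 16),
  };
}

// Linear interpolation of a single channel, rounded to an integer.
function mixChannel(from: number, to: number, t: number): number {
  return Math.round(from + (to - from) * t);
}

// Returns the color that lies a fraction t (0..1) of the way from `from` to `to`.
function lerpColor(from: string, to: string, t: number): string {
  const a = hexToRgb(from);
  const b = hexToRgb(to);
  const k = Math.min(1, Math.max(0, t)); // clamp progress to 0..1
  return `rgb(${mixChannel(a.r, b.r, k)}, ${mixChannel(a.g, b.g, k)}, ${mixChannel(a.b, b.b, k)})`;
}

const halfway = lerpColor("#ff0000", "#0000ff", 0.5); // "rgb(128, 0, 128)"
```

In an immediate-mode draw function, something like `lerpColor` would have to be called on every frame with the current animation progress, and the result passed to the drawing API as the fill color.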

That's how graphics APIs operate. In the browser, the canvas element can be seen as an "enclave" of the immediate mode [1].

And if you need even more fine-grained control, you can switch from the 2D context to the WebGL, WebGL2, or WebGPU contexts (if supported).

[Figure: Retained-immediate hierarchy]

Use Cases

In the retained mode, the system serves as a kind of wrapper that makes work faster and more flexible. Pick it when there is no time for the implementation of crucial things like layout management, accessibility features or SEO. It is the best choice for static content – things like forms, paragraphs, inputs or images.

Immediate mode shines where control and performance matter: in dynamically changing environments like games or data visualizations.

[Figure: GUI modes use cases]

HTML, CSS, and Pixels

The browser works in the retained mode, then. But how exactly? Let's take a deeper dive into that. Since most of the Internet prefers Chrome, I chose Blink, its rendering engine, as an example.

The description below is a simplification, but it should be fine for a general overview. I will often quote an excellent source of knowledge: The Life Of A Pixel presentation by Steve Kobes. And if you wish to know more about Blink in general, I recommend reading this document.

Parsing and DOM Construction

The first thing Blink does is parse the HTML to create the DOM tree, which is Chrome's internal representation of a page. The DOM can be modified from JavaScript, which runs in the V8 engine. At the same time, CSS is parsed and the style values are computed for every element. You can inspect them in the "Computed" tab of the developer tools.

[Figure: Parsing and DOM construction scheme]

Rendering Pipeline

Then, it is time for rendering.

  1. Layout (Reflow)
    The first part of it is the layout process (or reflow). It does things like:
  • Geometry computation. It determines the coordinates and sizes of all DOM elements in the document.
  • Element ordering. It places blocks of content in a specific order. Different layout types use different orders, but the most basic one, the flow layout, places them in block or inline direction.
  • Text measuring. Or rather, measuring runs of text. The term "run of text" means a sequence of characters sharing the same style and placed within the same layout block, like a paragraph with a specific font-family, font-size, etc. Text measuring is basically measuring the width and height of the text content.
  • Bounding rect calculation. Finally, the layout calculates multiple bounding rects the element may have, like a border box or the overflow rect.

Layout operates on a different tree, though. It's called the layout tree. There, 1 DOM node usually corresponds to 1 layout object. But that is not always the case. Nodes with display: none or span elements have no layout object assigned (unless the span has some styles that affect the layout).
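The layout step can be sketched in a heavily simplified form. The snippet below assumes a made-up node shape (`SketchNode`, with a pre-known `contentHeight`) and a hypothetical `layoutBlock` function. It only stacks blocks vertically and skips `display: none` nodes, so it is nowhere near what Blink really does, but it shows how geometry is computed and why the layout tree can be smaller than the DOM tree.

```typescript
// Hypothetical, simplified DOM-like node. Real layout has to measure content;
// here the content height is simply given.
interface SketchNode {
  tag: string;
  display: "block" | "none";
  contentHeight: number;
  children: SketchNode[];
}

interface LayoutBox {
  tag: string;
  x: number;
  y: number;
  width: number;
  height: number;
  children: LayoutBox[];
}

// Block flow: children are placed one under another, in document order.
function layoutBlock(node: SketchNode, x: number, y: number, width: number): LayoutBox | null {
  if (node.display === "none") {
    return null; // no layout object is created for this node
  }
  const children: LayoutBox[] = [];
  let cursorY = y + node.contentHeight;
  for (const child of node.children) {
    const box = layoutBlock(child, x, cursorY, width);
    if (box !== null) {
      children.push(box);
      cursorY += box.height;
    }
  }
  return { tag: node.tag, x, y, width, height: cursorY - y, children };
}

const page: SketchNode = {
  tag: "body",
  display: "block",
  contentHeight: 0,
  children: [
    { tag: "h1", display: "block", contentHeight: 40, children: [] },
    { tag: "aside", display: "none", contentHeight: 100, children: [] },
    { tag: "p", display: "block", contentHeight: 60, children: [] },
  ],
};

// The body box ends up 100 units tall with two children:
// h1 at y = 0 and p at y = 40. The hidden aside takes no space.
const layoutRoot = layoutBlock(page, 0, 0, 800);
```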

  2. Paint
    The paint process assigns each layout object a display item, which contains a list of paint operations. Paint operations are instructions that tell the browser exactly how the elements should be drawn.
    Paint uses stacking order, not DOM tree order. This way, the elements stack correctly when they overlap. You can change the stacking order by changing the z-index of an element in CSS.

[Figure: Paint process scheme (simplified)]
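The difference between DOM order and stacking order can be shown with a tiny sketch. It assumes a hypothetical `PaintCandidate` shape and a `paintOrder` function, and it reduces stacking to "lower z-index first, DOM order breaks ties". The real CSS rules (stacking contexts, positioned vs. non-positioned elements, negative z-index, and so on) are considerably more involved.

```typescript
// Hypothetical item: just enough data to decide the order of painting.
interface PaintCandidate {
  id: string;
  zIndex: number;
  domOrder: number; // position of the element in the DOM tree traversal
}

// Items painted later end up on top, so the lowest z-index goes first.
function paintOrder(items: PaintCandidate[]): string[] {
  return items
    .slice() // do not mutate the caller's array
    .sort((a, b) => a.zIndex - b.zIndex || a.domOrder - b.domOrder)
    .map((item) => item.id);
}

const order = paintOrder([
  { id: "header", zIndex: 10, domOrder: 0 },
  { id: "content", zIndex: 0, domOrder: 1 },
  { id: "modal", zIndex: 100, domOrder: 2 },
  { id: "footer", zIndex: 0, domOrder: 3 },
]);
// order: ["content", "footer", "header", "modal"]
```

Although the header comes first in the DOM, it is painted after the content and the footer, so it stays on top of them when they overlap.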

  3. Raster
    Finally, the display items are mapped into bitmaps of color values stored in the GPU memory. The raster process also decodes the image assets you put in your HTML and CSS.

[Figure: Rendering pipeline scheme]
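To give a feeling for what "mapping into bitmaps of color values" means, here is a toy rasterizer. It is a sketch under strong assumptions: the only paint operation is a hypothetical `FillRectOp`, the bitmap is a plain RGBA byte array in regular memory (not GPU memory), and later operations simply overwrite earlier ones without any blending.

```typescript
// Hypothetical paint operation: fill an axis-aligned rectangle with one color.
interface FillRectOp {
  x: number;
  y: number;
  width: number;
  height: number;
  color: [number, number, number, number]; // RGBA, 0-255 each
}

// Turns a list of paint operations into a bitmap of RGBA values.
function rasterize(ops: FillRectOp[], width: number, height: number): Uint8ClampedArray {
  const pixels = new Uint8ClampedArray(width * height * 4);
  for (const op of ops) {
    // Clip the rectangle to the bitmap bounds.
    const xEnd = Math.min(width, op.x + op.width);
    const yEnd = Math.min(height, op.y + op.height);
    for (let y = Math.max(0, op.y); y < yEnd; y++) {
      for (let x = Math.max(0, op.x); x < xEnd; x++) {
        pixels.set(op.color, (y * width + x) * 4);
      }
    }
  }
  return pixels;
}

// A 4x4 bitmap: white background with a 2x2 red square in the middle.
const bitmap = rasterize(
  [
    { x: 0, y: 0, width: 4, height: 4, color: [255, 255, 255, 255] },
    { x: 1, y: 1, width: 2, height: 2, color: [255, 0, 0, 255] },
  ],
  4,
  4,
);
```

The pixel at (1, 1) starts at byte offset (1 * 4 + 1) * 4 = 20 and holds red, while the pixel at (0, 0) keeps the white background.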

Optimizations

The web page remains interactive though, allowing the pipeline to dynamically adjust to changes and user interactions. To handle that, the browser does 2 things.

Animation Frame

Firstly, the renderer produces animation frames where each frame is a complete visual representation of the state at a particular point in time. Or, in other words, a result of the pipeline's execution. If you recall the canvas example a few paragraphs back, you will notice a hint of that in window.requestAnimationFrame. There it is, a render loop. While the browser primarily operates in a retained mode, it seamlessly integrates these immediate mode-like operations, providing both efficiency and control.
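A render loop of this kind can be sketched as follows. The scheduler is injected so that the snippet also runs outside of a browser. `startRenderLoop`, `FrameScheduler`, and the fake frame queue are hypothetical names for this example. In a browser, the scheduler would be a thin wrapper around `window.requestAnimationFrame`.

```typescript
type FrameCallback = (timestampMs: number) => void;
type FrameScheduler = (callback: FrameCallback) => void;

// Calls `draw` once per scheduled frame for as long as it returns true.
function startRenderLoop(
  schedule: FrameScheduler,
  draw: (timestampMs: number) => boolean,
): void {
  const tick: FrameCallback = (timestampMs) => {
    const keepGoing = draw(timestampMs);
    if (keepGoing) {
      schedule(tick); // ask for the next frame
    }
  };
  schedule(tick);
}

// A fake scheduler standing in for the browser: callbacks wait in a queue.
const pending: FrameCallback[] = [];
const fakeSchedule: FrameScheduler = (callback) => {
  pending.push(callback);
};

const drawnAt: number[] = [];
startRenderLoop(fakeSchedule, (timestampMs) => {
  drawnAt.push(timestampMs); // a real draw would clear and repaint the canvas here
  return drawnAt.length < 3; // stop after three frames
});

// Pump the queue, pretending that a frame arrives every 16 ms.
let now = 0;
while (pending.length > 0) {
  const callback = pending.shift();
  if (callback !== undefined) {
    callback(now);
  }
  now += 16;
}
// drawnAt is now [0, 16, 32]
```

In the browser, passing `(callback) => { requestAnimationFrame(callback); }` as the scheduler would tie the loop to the browser's own frame production.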

[Figure: Retained and immediate mode in the browser]

Reflow & Repaint

Secondly, when a change happens, it checks which parts of the pipeline need to be updated, or – in other terms – invalidated. When doing so, it strives to reuse as much of the previous rendering outputs as possible.

But it is not always achievable. Some invalidations may be expensive, especially if they apply to large parts of the DOM. That is why unnecessary reflows and repaints are considered bad. A good practice is to animate properties like transform or opacity, which the browser can usually handle at the compositing stage (often GPU-accelerated) without re-running layout or paint.
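The reasoning behind that advice can be captured in a small lookup sketch. The table below is a deliberately coarse assumption, not Blink's actual invalidation logic: the real outcome depends on the element, its layers, and the browser version. `STAGES_BY_PROPERTY` and `stagesToRerun` are hypothetical names.

```typescript
type Stage = "layout" | "paint" | "composite";

const PIPELINE: Stage[] = ["layout", "paint", "composite"];

// Coarse, assumed mapping from a changed CSS property to the stages it invalidates.
const STAGES_BY_PROPERTY = new Map<string, Stage[]>([
  ["width", ["layout", "paint", "composite"]],
  ["height", ["layout", "paint", "composite"]],
  ["background-color", ["paint", "composite"]],
  ["color", ["paint", "composite"]],
  ["transform", ["composite"]],
  ["opacity", ["composite"]],
]);

// Returns the stages that must run again, in pipeline order.
function stagesToRerun(changedProperties: string[]): Stage[] {
  const needed = new Set<Stage>();
  for (const property of changedProperties) {
    // Unknown property: assume the worst case and rerun everything.
    const stages = STAGES_BY_PROPERTY.get(property) ?? PIPELINE;
    for (const stage of stages) {
      needed.add(stage);
    }
  }
  return PIPELINE.filter((stage) => needed.has(stage));
}

const cheapUpdate = stagesToRerun(["transform", "opacity"]); // ["composite"]
const repaintOnly = stagesToRerun(["background-color"]); // ["paint", "composite"]
const fullUpdate = stagesToRerun(["width", "opacity"]); // ["layout", "paint", "composite"]
```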

Compositing And Tiling

When the reflow or repaint occurs, however, the browser tries to minimize its performance impact. One of the key mechanisms here is compositing, which involves splitting the webpage into multiple layers. These layers are independently rasterized, often with GPU acceleration. The final step of compositing, where they are combined into a single cohesive image, is handled efficiently in a separate thread to ensure smooth rendering and interaction.

An example is scrolling: the content layer moves in response to the user's action, while another layer clips it to the visible area.
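Why this helps can be shown with a toy model. In the sketch below, a "rasterized layer" is just an array of text rows, and compositing a frame means picking the rows that fall inside the clip. All names (`RasterizedLayer`, `rasterizeLayer`, `compositeFrame`, `rasterRuns`) are hypothetical, and real compositing works on GPU textures rather than strings.

```typescript
interface RasterizedLayer {
  rows: string[]; // stand-in for already rasterized pixel rows
}

let rasterRuns = 0; // counts how often the expensive step happens

// The expensive part: done once per layer content change.
function rasterizeLayer(content: string[]): RasterizedLayer {
  rasterRuns += 1;
  return { rows: content.slice() };
}

// The cheap part: position the layer and clip it to the viewport.
function compositeFrame(
  layer: RasterizedLayer,
  scrollOffset: number,
  viewportHeight: number,
): string[] {
  return layer.rows.slice(scrollOffset, scrollOffset + viewportHeight);
}

const contentLayer = rasterizeLayer(["row 0", "row 1", "row 2", "row 3", "row 4", "row 5"]);

const frameAtTop = compositeFrame(contentLayer, 0, 3); // rows 0 to 2
const frameAfterScroll = compositeFrame(contentLayer, 2, 3); // rows 2 to 4
// rasterRuns is still 1: scrolling reused the rasterized layer
```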

[Figure: Compositing scheme]

Moreover, because it might be expensive to rasterize large layers at once, Blink splits them into tiles. This way, the tiles in or near the visible area can be rasterized first, which improves performance.
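The tile selection can be approximated with a bit of arithmetic. The sketch below assumes square tiles of an arbitrary size and a hypothetical `visibleTiles` function. It only finds the tiles that intersect the viewport and ignores any prioritization of nearby tiles.

```typescript
interface Viewport {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface TileIndex {
  column: number;
  row: number;
}

// Lists the tiles of a layer that intersect the viewport.
function visibleTiles(
  layerWidth: number,
  layerHeight: number,
  tileSize: number,
  viewport: Viewport,
): TileIndex[] {
  const lastColumnInLayer = Math.ceil(layerWidth / tileSize) - 1;
  const lastRowInLayer = Math.ceil(layerHeight / tileSize) - 1;

  const firstColumn = Math.floor(Math.max(0, viewport.x) / tileSize);
  const firstRow = Math.floor(Math.max(0, viewport.y) / tileSize);
  const lastColumn = Math.min(
    lastColumnInLayer,
    Math.floor((viewport.x + viewport.width - 1) / tileSize),
  );
  const lastRow = Math.min(
    lastRowInLayer,
    Math.floor((viewport.y + viewport.height - 1) / tileSize),
  );

  const tiles: TileIndex[] = [];
  for (let row = firstRow; row <= lastRow; row++) {
    for (let column = firstColumn; column <= lastColumn; column++) {
      tiles.push({ column, row });
    }
  }
  return tiles;
}

// A tall 1024x4096 layer, 256-unit tiles, and a 1024x600 viewport scrolled to y = 300.
const tilesToRaster = visibleTiles(1024, 4096, 256, { x: 0, y: 300, width: 1024, height: 600 });
// 12 tiles (columns 0-3, rows 1-3) out of the 64 the whole layer consists of
```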

There are plenty of other mechanisms that the browser implements to serve the content swiftly. If you are even more curious, feel free to check the documents linked above.

Summary

On a daily basis, we don't really think about how browsers work. When we click on something on a web page or move a slider, we see things happen instantly. Front-end developers learn the basics necessary to perform their daily tasks, such as using browser APIs or interacting with the DOM. But even they rarely go deeper than required.

Discovering the processes under the hood might not only be insightful, but also useful for choosing correct design patterns, architectures, and making optimizations in new or existing projects. And, if you are geeky enough, it can provide lots of fun as well!

All graphs were created using Excalidraw

Reviewed by: Tomasz Krawczyk, Marek Hückmann


  1. We could also think about SVG as an immediate mode element. It often requires manual updates for interactivity, just like the canvas does. However, all elements inside of it are valid DOM nodes, so the state is retained there as well. 
