Brief notes on how browsers work

Published at 00:00 · 9 min read

Intro

I recently found a great article on web.dev, How browsers work, that explains in detail how browsers work. I'm using this blog post to make notes and as a place where I can quickly refresh my memory. If you work on frontend-related things, I strongly recommend the article: it will help you understand how browsers work behind the scenes and how they interact with HTML, CSS, and JavaScript.

Role

There are many browsers out there today and you have probably heard of most of them: Chrome, Firefox, Edge, Safari and so on. The main function of a browser is to present the resource you request from a server. Usually this is an HTML document (with stylesheets and JavaScript), but with plugins or extensions a browser can also display other formats such as PDF.

Architecture

Figure: Browser’s high-level structure (source: How browsers work)

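Let’s go through each component from top to bottom (then left to right) in the image:

  1. User interface: everything the browser displays apart from the page itself, such as the address bar, back/forward buttons and bookmarks menu.
  2. Browser engine: marshals actions between the user interface and the rendering engine.
  3. Rendering engine: displays the requested content, for example by parsing HTML and CSS and drawing the result on the screen.
  4. Networking: makes network calls such as HTTP requests, behind a platform-independent interface.
  5. JavaScript interpreter: parses and executes JavaScript code.
  6. UI backend: draws basic widgets such as combo boxes and windows, using the operating system's user interface methods underneath.
  7. Data storage: the persistence layer where the browser saves data such as cookies; storage APIs such as localStorage and IndexedDB build on it.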

The rest of the article focuses mainly on the rendering engine.

Rendering engine

The rendering engine chapter covers how the engine displays HTML, CSS and images. Different browsers use different engines: Firefox uses Gecko, Safari uses WebKit, and Chrome, Edge and Opera use Blink, a fork of WebKit.

Main flow

Figure: Rendering engine main flow (source: How browsers work)

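A short summary of the flow:

  1. Parse the HTML document and turn its elements into DOM nodes in a tree called the content tree (DOM tree). Style data, both in external CSS files and in <style> elements, is parsed as well.
  2. Combine the styling information with the DOM tree to build another tree, the render tree, made of rectangles with visual attributes such as color and size.
  3. Layout: give each node the exact coordinates where it should appear on the screen.
  4. Paint: traverse the render tree and draw each node using the UI backend layer.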

Note that different rendering engines may use slightly different terminology for each step in the flow. For example, what WebKit calls the render tree and layout, Gecko calls the frame tree and reflow.

Parsing

A parser translates a document into a parse tree that represents the document's structure, based on the document's syntax rules.

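There are two processes within parsing:

  1. Lexical analysis (tokenization): breaking the input into tokens, the valid building blocks that make up the language's vocabulary.
  2. Syntax analysis: applying the language's syntax rules to those tokens to build the parse tree.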

The following image shows how the expression 2 + 3 - 1 is broken down into a parse tree:

Figure: Parse tree for the expression 2 + 3 - 1
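
Splitting 2 + 3 - 1 into tokens is the job of the lexer (lexical analysis). Below is a minimal, hypothetical lexer in TypeScript for this tiny arithmetic language; the `Token` shape and the `tokenize` name are made up for this example. It only knows integers, `+` and `-`, and discards whitespace because whitespace is not part of the vocabulary.

```typescript
type Token =
  | { kind: "INTEGER"; value: number }
  | { kind: "PLUS" }
  | { kind: "MINUS" };

const isDigit = (ch: string): boolean => ch >= "0" && ch <= "9";

// Lexical analysis: turn the input string into a list of tokens.
function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < input.length) {
    const ch = input.charAt(i);
    if (ch === " " || ch === "\t" || ch === "\n") {
      i++; // whitespace is skipped, it never becomes a token
    } else if (ch === "+") {
      tokens.push({ kind: "PLUS" });
      i++;
    } else if (ch === "-") {
      tokens.push({ kind: "MINUS" });
      i++;
    } else if (isDigit(ch)) {
      // Consume a run of digits as a single INTEGER token.
      const start = i;
      while (i < input.length && isDigit(input.charAt(i))) i++;
      tokens.push({ kind: "INTEGER", value: Number(input.slice(start, i)) });
    } else {
      throw new Error(`Unexpected character '${ch}' at position ${i}`);
    }
  }
  return tokens;
}

console.log(tokenize("2 + 3 - 1"));
```

Any character outside the vocabulary (for example `*`) makes the lexer throw, which is the lexical counterpart of a syntax error.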

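Parsing is an iterative process: the parser asks the lexer for a new token and tries to match it against one of the grammar's syntax rules (a grammar consists of a vocabulary and syntax rules). If a rule matches, a node for the token is added to the parse tree and the parser asks for the next token. If no rule matches, the parser stores the token internally and keeps asking for tokens until it finds a rule that matches all the stored tokens; if no such rule is found, it raises an exception.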
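
A simplified version of this loop can be written as a small recursive-descent style parser over the tokens for 2 + 3 - 1. It is a sketch under assumptions made for this example: the grammar is `expression := term ((PLUS | MINUS) term)*` with `term := INTEGER`, operators are left-associative, `nextToken` stands in for the lexer, and the parser raises an error as soon as a token does not fit instead of buffering unmatched tokens.

```typescript
type Token =
  | { kind: "INTEGER"; value: number }
  | { kind: "PLUS" }
  | { kind: "MINUS" };

type ParseNode =
  | { type: "Integer"; value: number }
  | { type: "BinaryOp"; op: "+" | "-"; left: ParseNode; right: ParseNode };

// The parser pulls tokens one at a time; nextToken returns undefined at end of input.
function parse(nextToken: () => Token | undefined): ParseNode {
  // term := INTEGER
  const parseTerm = (): ParseNode => {
    const tok = nextToken();
    if (tok === undefined || tok.kind !== "INTEGER") {
      throw new Error("Syntax error: expected an integer");
    }
    return { type: "Integer", value: tok.value };
  };

  // expression := term ((PLUS | MINUS) term)*, grown into a left-leaning tree
  let tree = parseTerm();
  for (let tok = nextToken(); tok !== undefined; tok = nextToken()) {
    if (tok.kind === "INTEGER") {
      throw new Error("Syntax error: expected + or -");
    }
    const op = tok.kind === "PLUS" ? "+" : "-";
    tree = { type: "BinaryOp", op, left: tree, right: parseTerm() };
  }
  return tree;
}

// Walk the tree to check its shape: (2 + 3) - 1.
function evaluate(node: ParseNode): number {
  if (node.type === "Integer") return node.value;
  const left = evaluate(node.left);
  const right = evaluate(node.right);
  return node.op === "+" ? left + right : left - right;
}

// The tokens a lexer would produce for "2 + 3 - 1".
const tokens: Token[] = [
  { kind: "INTEGER", value: 2 },
  { kind: "PLUS" },
  { kind: "INTEGER", value: 3 },
  { kind: "MINUS" },
  { kind: "INTEGER", value: 1 },
];
let pos = 0;
const tree = parse(() => tokens[pos++]);
console.log(JSON.stringify(tree), evaluate(tree));
```

The root of the resulting tree is the `-` node, whose left child is the `2 + 3` subtree, matching the left-to-right grouping of the expression.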

For more details on parsing, refer to the parsing section of the original article.

HTML parser

An HTML parser is responsible for parsing HTML markup into a parse tree. Refer to the W3C HTML specification for HTML’s vocabulary and syntax rules.

The conventional parsers discussed above work for CSS and JavaScript but not for HTML, because HTML's grammar is not a context-free grammar (CFG). HTML is forgiving: the browser still renders the document even if you omit some opening or closing tags. Given the number of HTML pages in the world and how much they rely on this flexibility, it would be extremely difficult and inconvenient to come up with a CFG for HTML to use in browsers. This also means that HTML cannot be parsed with regular top-down or bottom-up parsers. See the corresponding section of the original article for more details on how browsers parse HTML into a DOM tree.
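
To get a feel for this error tolerance, here is a toy tree builder in TypeScript. It is only an illustration under simplifying assumptions (no attributes, no doctype, a handful of repair rules) and not the HTML tokenization and tree construction algorithm: an open `<p>` is implicitly closed when another `<p>` starts, an end tag with no matching open element is ignored, an end tag closes anything still open inside it, and elements left open at the end of the input are closed implicitly.

```typescript
type HtmlToken =
  | { type: "start"; tag: string }
  | { type: "end"; tag: string }
  | { type: "text"; text: string };

interface DomNode {
  tag: string; // "#text" for text nodes
  text?: string;
  children: DomNode[];
}

// Very rough tokenizer: start tags, end tags and text runs; attributes are ignored.
function tokenizeHtml(html: string): HtmlToken[] {
  const tokens: HtmlToken[] = [];
  const re = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    if (m[3] !== undefined) tokens.push({ type: "text", text: m[3] });
    else if (m[1] === "/") tokens.push({ type: "end", tag: m[2].toLowerCase() });
    else tokens.push({ type: "start", tag: m[2].toLowerCase() });
  }
  return tokens;
}

// Builds a tree under a synthetic <body>, repairing mistakes instead of failing.
function buildTree(tokens: HtmlToken[]): DomNode {
  const body: DomNode = { tag: "body", children: [] };
  const open: DomNode[] = [body]; // stack of open elements
  const current = (): DomNode => open[open.length - 1];
  const openIndexOf = (tag: string): number => open.map((n) => n.tag).lastIndexOf(tag);

  for (const tok of tokens) {
    if (tok.type === "text") {
      current().children.push({ tag: "#text", text: tok.text, children: [] });
    } else if (tok.type === "start") {
      // A <p> cannot contain another <p>: close the open one first.
      const openP = openIndexOf("p");
      if (tok.tag === "p" && openP > 0) open.length = openP;
      const el: DomNode = { tag: tok.tag, children: [] };
      current().children.push(el);
      open.push(el);
    } else {
      // Close the matching element and anything still open inside it;
      // an end tag with no matching open element is simply ignored.
      const idx = openIndexOf(tok.tag);
      if (idx > 0) open.length = idx;
    }
  }
  return body; // whatever is still open is closed implicitly
}

function serialize(n: DomNode): string {
  if (n.tag === "#text") return n.text ?? "";
  return `<${n.tag}>${n.children.map(serialize).join("")}</${n.tag}>`;
}

// Missing </p> tags and a stray </div>:
console.log(serialize(buildTree(tokenizeHtml("<p>one<p>two <b>bold</p></div>"))));
// Unclosed elements at the end of the input:
console.log(serialize(buildTree(tokenizeHtml("<div><span>hi"))));
```

A grammar-driven parser would reject both inputs; the tree builder instead produces `<body><p>one</p><p>two <b>bold</b></p></body>` and `<body><div><span>hi</span></div></body>`.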

CSS parser

Unlike HTML, CSS has a context-free grammar defined in its specification, so it can be parsed with the conventional parsers described above.
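
Because the grammar is context-free, a small hand-written parser goes a long way. The sketch below is a hypothetical recursive-descent style parser for a tiny subset of CSS: rulesets of the form `selector { property: value; ... }`, with no comments, at-rules, strings or escapes. Real engines follow the tokenizer and parsing rules of the CSS Syntax specification and handle far more.

```typescript
interface Declaration {
  property: string;
  value: string;
}

interface Ruleset {
  selector: string;
  declarations: Declaration[];
}

// Grammar for the subset handled here:
//   stylesheet  := ruleset*
//   ruleset     := selector '{' (declaration (';' declaration)* ';'?)? '}'
//   declaration := property ':' value
function parseCss(src: string): Ruleset[] {
  let pos = 0;
  const skipWhitespace = (): void => {
    while (pos < src.length && /\s/.test(src.charAt(pos))) pos++;
  };
  const expect = (ch: string): void => {
    skipWhitespace();
    if (src.charAt(pos) !== ch) throw new Error(`Expected '${ch}' at offset ${pos}`);
    pos++;
  };
  // Reads up to (not including) any of the stop characters and trims the result.
  const readUntil = (stops: string): string => {
    const start = pos;
    while (pos < src.length && !stops.includes(src.charAt(pos))) pos++;
    return src.slice(start, pos).trim();
  };

  const rules: Ruleset[] = [];
  skipWhitespace();
  while (pos < src.length) {
    const selector = readUntil("{");
    expect("{");
    const declarations: Declaration[] = [];
    skipWhitespace();
    while (src.charAt(pos) !== "}") {
      if (pos >= src.length) throw new Error("Unclosed block: expected '}'");
      const property = readUntil(":;}");
      expect(":");
      const value = readUntil(";}");
      declarations.push({ property, value });
      skipWhitespace();
      if (src.charAt(pos) === ";") {
        pos++;
        skipWhitespace();
      }
    }
    pos++; // consume '}'
    rules.push({ selector, declarations });
    skipWhitespace();
  }
  return rules;
}

console.log(JSON.stringify(parseCss("div, p { color: red; margin: 0 }  .x{display:none;}"), null, 2));
```

Each rule in the grammar comment maps onto one loop or helper, which is what makes hand-written parsers practical for context-free languages.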

Order of processing scripts and stylesheets

A script (a <script> tag) is executed as soon as the parser reaches it; if it is an external script, it is fetched from the network first. Parsing halts until the script has been executed. There are two ways to prevent this blocking behaviour:

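  1. Add the defer attribute: the script is fetched without blocking parsing and executed after the document has been parsed.
  2. Add the async attribute: the script is fetched without blocking parsing and executed as soon as it is available, possibly before parsing has finished.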
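
The difference between the three cases is mostly about when each script runs. The toy timing model below is hypothetical (made-up millisecond costs, zero execution time, no caching or speculative fetching): a normal script stops the parser until it has been fetched and run, an `async` script runs as soon as it has been fetched, and `defer` scripts run after parsing has finished, in document order.

```typescript
type Item =
  | { kind: "html"; parseMs: number }
  | { kind: "script"; name: string; mode: "blocking" | "defer" | "async"; downloadMs: number };

interface Execution {
  name: string;
  at: number; // ms since parsing started
}

function simulate(items: Item[]): { executions: Execution[]; parsingDoneAt: number } {
  let t = 0; // the parser's clock
  const executions: Execution[] = [];
  const deferred: { name: string; readyAt: number }[] = [];

  for (const item of items) {
    if (item.kind === "html") {
      t += item.parseMs;
    } else if (item.mode === "blocking") {
      t += item.downloadMs; // parsing halts while the script is fetched...
      executions.push({ name: item.name, at: t }); // ...and then executed
    } else if (item.mode === "async") {
      // Fetched in parallel with parsing; runs as soon as it arrives.
      executions.push({ name: item.name, at: t + item.downloadMs });
    } else {
      // Fetched in parallel with parsing; queued to run after parsing.
      deferred.push({ name: item.name, readyAt: t + item.downloadMs });
    }
  }

  const parsingDoneAt = t;
  let clock = parsingDoneAt;
  for (const d of deferred) {
    // Keep document order even if a later deferred script arrives first.
    clock = Math.max(clock, d.readyAt);
    executions.push({ name: d.name, at: clock });
  }
  executions.sort((a, b) => a.at - b.at);
  return { executions, parsingDoneAt };
}

const example: Item[] = [
  { kind: "html", parseMs: 10 },
  { kind: "script", name: "a.js", mode: "async", downloadMs: 45 },
  { kind: "html", parseMs: 10 },
  { kind: "script", name: "b.js", mode: "blocking", downloadMs: 20 },
  { kind: "html", parseMs: 10 },
  { kind: "script", name: "c.js", mode: "defer", downloadMs: 5 },
  { kind: "html", parseMs: 10 },
];
console.log(simulate(example));
```

In this example b.js stops the parser from 20 ms to 40 ms, a.js runs at 55 ms while parsing is still going on, and c.js, although downloaded by 55 ms, waits until parsing finishes at 60 ms.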

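WebKit and Firefox both perform an optimisation called speculative parsing: while a script is being executed, another thread parses the rest of the document to find external resources (scripts, stylesheets and images) that need to be loaded, and starts fetching them in parallel to improve overall speed. The speculative parser only looks for references to external resources and does not modify the DOM tree; that is left to the main parser.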
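
Here is a toy version of the idea behind speculative parsing (a component doing this is often called a preload scanner): scan markup that the main parser has not reached yet for external resources and report them so they can be fetched early, without creating any DOM nodes. The regular expressions are a deliberate simplification for this example; real browsers use a proper tokenizer.

```typescript
interface Resource {
  type: "script" | "stylesheet" | "image";
  url: string;
}

// Reads a quoted attribute value from a tag's attribute text, if present.
function attr(attrs: string, name: string): string | undefined {
  const m = new RegExp(`(?:^|\\s)${name}\\s*=\\s*["']([^"']*)["']`, "i").exec(attrs);
  return m === null ? undefined : m[1];
}

// Finds external resources in not-yet-parsed markup. It only collects URLs
// and never builds DOM nodes, which is left to the main parser.
function scanForResources(html: string): Resource[] {
  const found: Resource[] = [];
  const tagRe = /<(script|link|img)\b([^>]*)>/gi;
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(html)) !== null) {
    const tag = m[1].toLowerCase();
    const attrs = m[2];
    const src = attr(attrs, "src");
    const href = attr(attrs, "href");
    if (tag === "script" && src !== undefined) {
      found.push({ type: "script", url: src });
    } else if (tag === "img" && src !== undefined) {
      found.push({ type: "image", url: src });
    } else if (tag === "link" && href !== undefined && attr(attrs, "rel")?.toLowerCase() === "stylesheet") {
      found.push({ type: "stylesheet", url: href });
    }
  }
  return found;
}

// Markup the main parser has not reached yet, e.g. because it is blocked on a script.
const remaining = `
  <link rel="stylesheet" href="/main.css">
  <p>Hello</p>
  <script src="/app.js"></script>
  <img src="/logo.png" alt="">`;
console.log(scanForResources(remaining));
```

The three URLs can be requested right away, so by the time the main parser reaches those tags the files may already be downloaded.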

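Scripts can read and modify styles, so a script that asks for style information while the document is being parsed could get wrong answers if the stylesheets have not been loaded and parsed yet. Different browsers use different strategies:

  1. Firefox blocks all scripts while there is a stylesheet that is still being loaded and parsed.
  2. WebKit blocks scripts only when they try to access certain style properties that may be affected by stylesheets that have not loaded yet.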

Render tree

The render tree is constructed while the DOM tree is being built. It is the visual representation of the document: it lets the browser paint the elements in the correct order and calculate the position and size of each element depending on the page size. As a consequence, changes to elements can force the browser to recompute styles and layout and to repaint parts of the page. Browsers try to do this incrementally, only for the parts of the tree affected by a change, but frequent DOM updates can still be slow and CPU-intensive.

Frameworks

Many notable frontend libraries and frameworks have been introduced to tackle this problem by avoiding unnecessary updates to the DOM.
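
One common approach, popularised by virtual-DOM libraries such as React, is to describe the UI as plain objects, compare the new description with the previous one, and apply only the differences to the real DOM. The diff below is a toy illustration of that idea with a made-up `VNode` shape and patch format; it is not the algorithm of any particular library (React, for example, also relies on keys and other heuristics).

```typescript
interface VNode {
  tag: string;
  props: Record<string, string>;
  children: VNode[];
}

// path = child indexes from the root to the node a patch applies to
type Patch =
  | { op: "replace"; path: number[]; node: VNode }
  | { op: "setProp"; path: number[]; key: string; value: string }
  | { op: "removeProp"; path: number[]; key: string }
  | { op: "append"; path: number[]; node: VNode }
  | { op: "remove"; path: number[] };

const hasOwn = (obj: object, key: string): boolean => Object.prototype.hasOwnProperty.call(obj, key);

function diff(oldNode: VNode, newNode: VNode, path: number[] = [], patches: Patch[] = []): Patch[] {
  if (oldNode.tag !== newNode.tag) {
    // Different element type: replace the whole subtree.
    patches.push({ op: "replace", path, node: newNode });
    return patches;
  }
  for (const [key, value] of Object.entries(newNode.props)) {
    if (!hasOwn(oldNode.props, key) || oldNode.props[key] !== value) {
      patches.push({ op: "setProp", path, key, value });
    }
  }
  for (const key of Object.keys(oldNode.props)) {
    if (!hasOwn(newNode.props, key)) patches.push({ op: "removeProp", path, key });
  }
  // Children are compared by position only.
  const common = Math.min(oldNode.children.length, newNode.children.length);
  for (let i = 0; i < common; i++) {
    diff(oldNode.children[i], newNode.children[i], [...path, i], patches);
  }
  for (let i = common; i < newNode.children.length; i++) {
    patches.push({ op: "append", path, node: newNode.children[i] });
  }
  for (let i = oldNode.children.length - 1; i >= common; i--) {
    patches.push({ op: "remove", path: [...path, i] });
  }
  return patches;
}

const h = (tag: string, props: Record<string, string> = {}, children: VNode[] = []): VNode => ({ tag, props, children });

const before = h("ul", { id: "todo" }, [h("li", { class: "done" }), h("li", { class: "open" })]);
const after = h("ul", { id: "todo" }, [h("li", { class: "done" }), h("li", { class: "done" }), h("li", { class: "open" })]);
console.log(diff(before, after));
```

Only two patches come out (change the class of the second item, append a third) instead of rebuilding the whole list. Because children are compared by position, inserting an item in the middle shows up as a changed prop plus an append; this is the kind of case that keys are meant to improve.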


The relationship between the render tree and the DOM tree is not one-to-one, e.g. elements with display: none will not appear in the render tree.
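
A minimal sketch of building the render tree from the DOM tree while skipping nodes that are not rendered. It assumes a made-up `DomElement` shape whose `display` value has already been computed, and treats a small hard-coded set of tags (such as `head` and `script`) as non-visual; real engines derive this from computed styles. Note that, unlike `display: none`, an element with `visibility: hidden` still gets a renderer because it takes up space.

```typescript
interface DomElement {
  tag: string;
  display: string; // assumed to be the already-computed CSS display value
  children: DomElement[];
}

interface RenderNode {
  tag: string;
  children: RenderNode[];
}

// Hypothetical list of elements that never produce a box.
const NON_VISUAL = new Set(["head", "title", "meta", "link", "script", "style"]);

// Returns null for nodes that produce no renderer; their whole subtree is skipped.
function buildRenderTree(node: DomElement): RenderNode | null {
  if (NON_VISUAL.has(node.tag) || node.display === "none") return null;
  const children: RenderNode[] = [];
  for (const child of node.children) {
    const rendered = buildRenderTree(child);
    if (rendered !== null) children.push(rendered);
  }
  return { tag: node.tag, children };
}

const el = (tag: string, children: DomElement[] = [], display = "block"): DomElement => ({ tag, display, children });

const show = (n: RenderNode): string =>
  n.children.length > 0 ? `${n.tag}(${n.children.map(show).join(", ")})` : n.tag;

const dom = el("html", [
  el("head", [el("title")]),
  el("body", [el("p"), el("div", [el("span", [], "inline")], "none"), el("p")]),
]);
const renderTree = buildRenderTree(dom);
console.log(renderTree === null ? "(nothing rendered)" : show(renderTree));
```

The DOM tree has nine elements but the render tree ends up with only four: `html(body(p, p))`.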

Each node in the render tree, called a frame in Firefox and a renderer in WebKit, knows how to lay out and paint itself and its children. Each renderer represents a rectangular area, usually corresponding to a node's CSS box. In WebKit, RenderObject is the base class of the renderers.
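
As a small illustration of a renderer's rectangle, the sketch below computes an element's border box from a simplified, hypothetical style object, assuming the renderer's rectangle is the border box. With the default `box-sizing: content-box`, the box width is the content width plus left and right padding and borders, and margins lie outside it. For an element without transforms, the border box is also what `getBoundingClientRect()` reports.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical, simplified computed style (px, same value on all four sides).
interface BoxStyle {
  width: number; // content width (box-sizing: content-box)
  height: number; // content height
  padding: number;
  border: number;
}

// Border box = content + padding + border on both sides; margins sit outside it.
function borderBox(style: BoxStyle, x: number, y: number): Rect {
  const extra = 2 * (style.padding + style.border);
  return { x, y, width: style.width + extra, height: style.height + extra };
}

// The content box sits inside the border and the padding.
function contentBox(style: BoxStyle, x: number, y: number): Rect {
  const inset = style.padding + style.border;
  return { x: x + inset, y: y + inset, width: style.width, height: style.height };
}

const exampleStyle: BoxStyle = { width: 200, height: 100, padding: 10, border: 2 };
console.log(borderBox(exampleStyle, 0, 0)); // { x: 0, y: 0, width: 224, height: 124 }
console.log(contentBox(exampleStyle, 0, 0)); // { x: 12, y: 12, width: 200, height: 100 }
```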

Style computation is a big topic; refer to the corresponding sections of the original article for details.

Browser developers have put a lot of effort into style data sharing, style computation and applying rules. Firefox uses two extra trees for style computation: the rule tree, which holds the matched rules, and the style context tree, whose contexts are divided into structs that each hold the styles for one category of properties, such as border or color. A node that does not specify anything for a category can share that struct with its parent (for inherited properties) or use the defaults, so it only stores the styles that differ. This saves memory and makes it easy to compute a node's style by following the path from the bottom of the tree up to the top.
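
The struct-sharing idea can be illustrated with a simplified sketch (types and names made up for this example, not Gecko's code). Style data is split into structs by category; an inherited category such as color reuses the parent's struct object when the element declares nothing for it, while a non-inherited ("reset") category such as border falls back to one shared default struct. In both cases no new object is allocated for undeclared categories.

```typescript
// One struct per category of properties.
interface ColorStruct {
  color: string; // color is an inherited property
}
interface BorderStruct {
  borderWidth: number; // border properties are not inherited ("reset")
  borderColor: string;
}
interface StyleContext {
  color: ColorStruct;
  border: BorderStruct;
}
// Properties an element declares itself (hypothetical, after cascading).
interface Declared {
  color?: string;
  borderWidth?: number;
  borderColor?: string;
}

const DEFAULT_COLOR: ColorStruct = { color: "black" };
const DEFAULT_BORDER: BorderStruct = { borderWidth: 0, borderColor: "black" };

function computeStyle(declared: Declared, parent: StyleContext | null): StyleContext {
  // Inherited category: share the parent's struct when nothing is declared.
  const color =
    declared.color === undefined
      ? (parent?.color ?? DEFAULT_COLOR)
      : { color: declared.color };
  // Reset category: share the default struct when nothing is declared.
  const border =
    declared.borderWidth === undefined && declared.borderColor === undefined
      ? DEFAULT_BORDER
      : {
          borderWidth: declared.borderWidth ?? DEFAULT_BORDER.borderWidth,
          borderColor: declared.borderColor ?? DEFAULT_BORDER.borderColor,
        };
  return { color, border };
}

const body = computeStyle({ color: "navy" }, null);
const div = computeStyle({ borderWidth: 1 }, body);
const span = computeStyle({}, div);

console.log(span.color === body.color); // true: the same ColorStruct object, shared from body
console.log(span.border === DEFAULT_BORDER); // true: border is not inherited from the div
console.log(div.border); // { borderWidth: 1, borderColor: 'black' }
```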

Layout

Layout (called reflow in Firefox) is the calculation of a renderer's position and size; a renderer has neither when it is first created and added to the tree.

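HTML uses a flow-based layout model, which means that most of the time the geometry can be computed in a single pass: elements later in the flow typically do not affect the geometry of earlier ones. There are exceptions; HTML tables, for example, may require more than one pass.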

Layout can proceed left-to-right and top-to-bottom through the document. The coordinate system is relative to the root frame, with the origin at the top-left corner: x grows to the right and y grows downward (you might find this familiar if you have used D3.js before).

Layout is a recursive process: it starts at the root renderer, which corresponds to the <html> element, and continues through the tree, computing the geometry of each renderer that needs it.
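
A minimal sketch of this layout process for block boxes only, under assumptions made for this example: every box stretches to its parent's content width, children are stacked top to bottom, leaf boxes have a fixed height standing in for their content, and padding is the only box property. Inline layout, margins, floats and tables are ignored. Coordinates are relative to the root, with the origin at the top-left corner.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Box {
  name: string;
  padding: number;
  leafHeight: number; // content height used when the box has no children
  children: Box[];
  rect?: Rect; // filled in by layout
}

// Lays out `box` at (x, y) with the given width and returns its height.
function layout(box: Box, x: number, y: number, width: number): number {
  const p = box.padding;
  let contentHeight = box.leafHeight;
  if (box.children.length > 0) {
    let cursorY = y + p; // children flow top-to-bottom inside the padding
    for (const child of box.children) {
      cursorY += layout(child, x + p, cursorY, width - 2 * p);
    }
    contentHeight = cursorY - (y + p);
  }
  const height = contentHeight + 2 * p;
  box.rect = { x, y, width, height };
  return height;
}

const box = (name: string, padding: number, leafHeight: number, children: Box[] = []): Box =>
  ({ name, padding, leafHeight, children });

const heading = box("h1", 0, 32);
const para = box("p", 0, 20);
const root = box("html", 0, 0, [box("body", 8, 0, [heading, para])]);

layout(root, 0, 0, 800); // start at the root renderer with an 800px-wide viewport
console.log(heading.rect); // { x: 8, y: 8, width: 784, height: 32 }
console.log(para.rect); // { x: 8, y: 40, width: 784, height: 20 }
console.log(root.rect); // { x: 0, y: 0, width: 800, height: 68 }
```

A parent's height depends on its children, so it is only known after the recursive calls return; this is why a single pass from the root is enough for this kind of flow.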

Painting

In the painting stage, the browser traverses the render tree and calls each renderer's paint() method to display its content on the screen. Painting uses the UI infrastructure component.
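
A sketch of painting as a render tree traversal. The `Renderer` class and `RecordingBackend` are hypothetical stand-ins: the backend only records drawing commands instead of talking to the operating system, and each renderer paints its own background and text before recursing into its children. Real engines follow the more detailed painting order defined by CSS, for example background, then border, then children, then outline.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Stand-in for the UI backend: records drawing commands instead of drawing.
class RecordingBackend {
  readonly commands: string[] = [];
  fillRect(r: Rect, color: string): void {
    this.commands.push(`fillRect ${r.x},${r.y} ${r.width}x${r.height} ${color}`);
  }
  drawText(text: string, x: number, y: number): void {
    this.commands.push(`drawText "${text}" at ${x},${y}`);
  }
}

class Renderer {
  rect: Rect; // position and size computed by layout
  background: string | null;
  text: string | null;
  children: Renderer[];

  constructor(rect: Rect, background: string | null, text: string | null, children: Renderer[] = []) {
    this.rect = rect;
    this.background = background;
    this.text = text;
    this.children = children;
  }

  // Paint this renderer first, then its children on top of it.
  paint(backend: RecordingBackend): void {
    if (this.background !== null) backend.fillRect(this.rect, this.background);
    if (this.text !== null) backend.drawText(this.text, this.rect.x, this.rect.y);
    for (const child of this.children) child.paint(backend);
  }
}

const page = new Renderer({ x: 0, y: 0, width: 800, height: 68 }, "white", null, [
  new Renderer({ x: 8, y: 8, width: 784, height: 32 }, null, "Title"),
  new Renderer({ x: 8, y: 40, width: 784, height: 20 }, "lightyellow", "Hello"),
]);
const backend = new RecordingBackend();
page.paint(backend); // traverse from the root of the render tree
console.log(backend.commands.join("\n"));
```

Because parents paint before their children, the page background ends up underneath everything else.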

