How (& Why) Search Engines Render Pages
There is an interesting twist in how we think about indexing – and that is rendering.
When we think about ranking pages we generally think about indexing. This is to say, we generally think about the point in time when a search engine has:
- Discovered a page through sitemaps or crawling and then visited it for indexing.
- Gathered all the content on the page.
- Started ranking the page for queries.
Arguably, this is the most important stage in the process, given that it is the trigger for rankings, but it's not the final stage of the discovery process, and I would suggest that its weight will decline over time while the final stage, rendering, gains traction.
What Is the Difference Between Indexing & Rendering?
Essentially, the difference between indexing and rendering can be illustrated with two screenshots of the same content: one showing the page as it appears during indexing (the raw HTML) and one showing it as it appears during rendering (the page loaded in Chrome).
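To make the comparison concrete with a minimal, hypothetical page: the HTML a search engine indexes can contain almost none of the content the rendered page actually shows.

```html
<!-- What indexing sees: a nearly empty HTML document. -->
<!DOCTYPE html>
<html>
  <head><title>Widget Reviews</title></head>
  <body>
    <div id="app"></div>
    <script>
      // What rendering reveals: content injected client-side by JavaScript.
      // Until the page is rendered, none of this text exists for the engine.
      document.getElementById('app').innerHTML =
        '<h1>The Ten Best Widgets</h1>' +
        '<p>Our full review of this year\'s widgets.</p>';
    </script>
  </body>
</html>
```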
Why Does This Matter?
Now, you may be asking yourself why this matters.
If you are, I'll assume you don't have a JavaScript-driven site. Even if that's true, rendering matters more than you might think: the fact that search engines were rendering pages well before the recent push toward JavaScript-heavy websites is a good indication of how much weight they give it.
Essentially the reason that it matters is that rendering provides the truth.
With the code, a search engine can understand what a page is about and roughly what’s going on.
With rendering, they can understand the user experience and far more about what content should take priority.
- Is content hidden behind a click?
- Does an ad fill the page?
- Is content that appears towards the bottom of the code actually displayed towards the top or in the navigation? (There's a sketch of this below.)
- Is a page slow to load?
All these questions and many more are answered during rendering, and are important to properly understand a page and how it should be ranked.
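As a hypothetical illustration of that third question, a block of content can sit at the very bottom of the HTML and still be displayed at the top of the page; only rendering, which applies the CSS, reveals where it actually ends up.

```html
<!-- In the source order, the important content comes last... -->
<style>
  .page { display: flex; flex-direction: column; }
  .hero { order: -1; } /* ...but the CSS moves it to the top when rendered. */
</style>
<div class="page">
  <div class="extras">Dozens of boilerplate links and widgets.</div>
  <div class="hero">The content that actually matters to the user.</div>
</div>
```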
When Does Rendering Occur?
Rendering occurs after indexing. How long after is not set in stone, but according to Gary Illyes from Google it can take several weeks.
https://twitter.com/rustybrick/status/1052582660634079234
When I asked John Mueller of Google if this timeline was still accurate today the response was:
We've been working on improving the latency for over a week now.
— 🍌 John 🍌 (@JohnMu) September 2, 2019
So, it’s something that they’re actively working on.
Bing operates differently of course, but according to their Web Ranking & Quality Project Manager, Frédéric Dubut, the timeline is roughly the same.
I would say the same – sometimes it's days, it can be weeks and in extreme cases it can also be never. Ultimately it's a trade-off between the cost of rendering the page and the value we find in rendering it.
— Frédéric Dubut (@CoperniX) September 3, 2019
So, the short answer is “after indexing” and the timeline is variable, essentially meaning that the search engines will understand the content and context of a page prior to gaining a full understanding of how it is to be prioritized.
This is not to say that they are completely ignorant until rendering.
There are some solid rules and understandings that the engines have all gained over the years that allow them to make quick assumptions about:
- What elements do.
- Where they’re positioned.
- How important they are meant to be to the user.
But it isn’t until the pages are rendered that the engines know whether their assumptions are correct and can fully understand a page and its form.
The Problem with Rendering
In essence, the search engines send a crawler to the site that will render the page as a browser would.
Given its popularity, we will use Google as the example here.
Googlebot has a Web Rendering Service (WRS) component. Thankfully, this component was updated in May of 2019.
Until then, the Web Rendering Service was using Chrome version 41. While this was great for compatibility it was a nightmare for sites that relied on modern features like those in modern JavaScript.
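As a hypothetical example of the kind of code that caused trouble, the snippet below uses fetch and async/await, neither of which Chrome 41 supported, so a page that depended on it to load its content would have come up empty in the old renderer.

```javascript
// Loads the article body client-side. Any evergreen browser handles this,
// but Chrome 41 supported neither fetch() nor async/await, so the old
// Web Rendering Service would have seen an empty page here.
async function loadArticle() {
  const response = await fetch('/api/article/123'); // hypothetical endpoint
  const article = await response.json();
  document.getElementById('content').textContent = article.body;
}
loadArticle();
```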
In May 2019, the Web Rendering Service was upgraded to evergreen, meaning that it uses the most current version of Chrome for rendering (within a couple weeks at any rate).
Essentially, now when your page is rendered by Googlebot, it’s rendered more-or-less how you would see it in your browser.
Great, right? Now the only testing you need to do is open a browser, and if it works there, it’s great for Google, right? Right?
You can probably guess the answer. Wrong.
And Bing isn’t much different (though they do seem to be a bit better at rendering, which is interesting).
If you have a basic site with predictable HTML and little-to-no dynamic content, then there really isn’t anything you need to worry about and there probably wasn’t with the old Web Rendering Service setup either.
But for those with dynamic content served via JavaScript, there is a very big caveat, and it’s rooted in the gap between indexing and rendering.
Namely, until the page is rendered, the engine doesn’t know what’s on it. With a simple HTML site, the engine might be missing a bit of context but it still has the content; with a site built on something like JavaScript that relies on rendering, the engine won’t know what content is on the page until the Web Rendering Service has done its job.
Suddenly those “weeks” are pretty impactful. This is also why the engines are working to reduce the latency.
Until they do, JavaScript developers will need to rely on pre-rendering (creating a static version of each page for the engines), which is not at all ideal.
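Pre-rendering takes different forms, but a common pattern, sketched below with assumed file paths and an assumed bot list rather than any specific product, is to detect crawler user-agents at the server and hand them a static HTML snapshot while regular visitors get the JavaScript app.

```javascript
// Minimal Express sketch of user-agent-based pre-rendering.
// The paths and bot list are illustrative assumptions; real setups usually
// generate the snapshots with a build step or a dedicated rendering service.
const express = require('express');
const path = require('path');
const app = express();

const BOT_PATTERN = /googlebot|bingbot|baiduspider|yandexbot/i;

app.get('*', (req, res) => {
  if (BOT_PATTERN.test(req.get('user-agent') || '')) {
    // Crawlers get a pre-built static snapshot of the requested page.
    res.sendFile(path.join(__dirname, 'snapshots', `${req.path}.html`));
  } else {
    // Regular users get the normal JavaScript application shell.
    res.sendFile(path.join(__dirname, 'dist', 'index.html'));
  }
});

app.listen(3000);
```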
What Does a Web Rendering Service Do?
I wanted to quickly answer a question that I found myself not quite wrapping my brain around until I realized I was thinking about it entirely wrong. You are welcome to laugh at me for the obviousness of the hiccup in my brain.
First, let’s consider where a Web Rendering Service gets its instructions, and how.
Here’s basically the life-cycle of rendering:
- A page is discovered via sitemap, crawler, etc.
- The page is added to the list of pages to be crawled on a site when the crawl budget is available.
- The page content is crawled and indexed.
- The page is added to the list of pages to be rendered on a site when the rendering budget is available.
- The page is rendered.
So, a critical and unspoken element of the process is the rendering queue. Googlebot may get to a page weeks before rendering it and until then some content (JavaScript sites) or context (all sites) may be missing.
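A rough way to picture that two-queue model (purely a conceptual sketch with stand-in steps, not how any engine actually implements it):

```javascript
// Conceptual sketch only: a crawl queue feeding a separate render queue.
// The "crawl" and "render" steps below are stand-in stubs, not real engine logic.
const crawlQueue = ['https://example.com/page-a', 'https://example.com/page-b'];
const renderQueue = [];
const index = new Map();

function crawl(url) {
  // Stub: in reality this fetches the raw HTML and indexes content/context from the source.
  index.set(url, { content: 'raw HTML content', rendered: false });
  renderQueue.push(url); // rendering is deferred until rendering budget is available
}

function render(url) {
  // Stub: in reality a headless browser executes JS, applies CSS, and builds the layout.
  index.set(url, { content: 'fully rendered content', rendered: true });
}

// Crawling runs well ahead of rendering; the gap can be days or weeks.
while (crawlQueue.length) crawl(crawlQueue.shift());
while (renderQueue.length) render(renderQueue.shift());
console.log(index);
```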
When a page hits the top of the queue for rendering, the engine will send what is referred to as a headless browser to it.
This is the step I had difficulty with. A headless browser is a browser without a graphical user interface.
For some reason, I had a difficult time wrapping my brain around how that worked. Like, how is Google to know what’s there if it’s not graphically displayed?
The obvious answer is of course:
“The bot doesn’t have eyes either so … um … yeah.”
Once over that mental hiccup, I came to terms with it as a “browser light” that renders the page so the search engine can understand what appears where, and how, on a page, even though it has no eyes to see it.
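To make “headless browser” a little more concrete, here’s a minimal sketch using Puppeteer (this is not what Google’s WRS actually runs, just an illustration of the idea): it loads a page, executes its JavaScript, and hands back the fully rendered HTML without ever drawing anything on a screen.

```javascript
// Minimal headless-rendering sketch using Puppeteer.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Load the page and wait for network activity to settle so that
  // JavaScript-injected content has a chance to appear.
  await page.goto('https://example.com', { waitUntil: 'networkidle0' });

  // The rendered DOM after scripts have run: this is the "truth"
  // that indexing the raw HTML alone can't see.
  const renderedHtml = await page.content();
  console.log(`${renderedHtml.length} characters of rendered HTML`);

  await browser.close();
})();
```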
When all goes well, the rendered version will appear the same to Googlebot as it does in a graphical browser, and if it doesn’t, it’s likely because the page relies on an unsupported feature like a user permission request.
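As a hypothetical example of that kind of feature, a page that only loads its main content after a geolocation prompt is answered will look very different to a renderer that declines (or never surfaces) permission requests.

```javascript
// Content gated behind a permission prompt: a headless renderer that
// declines the prompt will only ever see the fallback branch.
function showNearbyStores(coords) {  // hypothetical: the full, location-aware experience
  document.body.textContent = `Stores near ${coords.latitude}, ${coords.longitude}`;
}
function showGenericStoreList() {    // hypothetical fallback: what a bot is likely to get
  document.body.textContent = 'Browse all store locations';
}

navigator.geolocation.getCurrentPosition(
  (position) => showNearbyStores(position.coords),
  () => showGenericStoreList()
);
```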
All In All…
I suspect that we will see the latency between indexing and rendering shrink dramatically, especially on sites that rely on it.
This won’t have a dramatic impact on most sites but for those that need to be rendered to be understood … the world may open up.
Though more likely, a new set of problems and hiccups will unfold.
Because from my experience, we can count on the indexing skills of the engines, but the rendering side still has a long way to go in bridging the gap between what the search engines see and what a user’s browser does.
Image Credits
Featured Image: Paulo Bobita
All screenshots taken by author
Horseman Image: Adobe Stock, edited by author