Senior SEO Consultant

There are a lot of articles out there about optimising your website for Google RankBrain, and some agencies have even dedicated money to “artificial intelligence optimisation” departments. Don’t get me wrong: understanding AI is one thing, but optimising for it is another thing altogether.

Can you optimise for Google RankBrain, Machine Learning, and Artificial Intelligence?

TL;DR: The answer is no, but as with all of SEO – here comes the but… Whilst we can’t directly optimise for RankBrain and Google’s AI algorithms, we can understand how they work and make sure our practices bear this in mind (as long-term future-proofing of your SEO campaign).

How RankBrain Works

RankBrain is an AI-based system Google began using in earnest in 2016 to understand how webpages are related to concepts and topics. It allows Google to more effectively (and frequently) return relevant webpages even if they don’t contain the exact words used in a search query. This is achieved by understanding how the page and its content are related to other words and concepts.

RankBrain requires a database of relationships, and vectors of known relationships between similar queries, to pull back a best guess. Inference occurs when a query is not understood, but the results returned are still based on that data. In a 2015 Bloomberg article, Greg Corrado was quoted as saying:

If RankBrain sees a word or phrase it isn’t familiar with, the machine can make a guess as to what words or phrases might have a similar meaning and filter the result accordingly, making it more effective at handling never-before-seen search queries.

RankBrain was introduced to the world via a Bloomberg article in October 2015, despite being live since April 2015. In the article, RankBrain was described as:

RankBrain uses artificial intelligence to embed vast amounts of written language into mathematical entities — called vectors — that the computer can understand. If RankBrain sees a word or phrase it isn’t familiar with, the machine can make a guess as to what words or phrases might have a similar meaning and filter the result accordingly, making it more effective at handling never-before-seen search queries.

Also, as explained on The Next Web:

“RankBrain converts the textual contents of search queries into ‘word vectors,’ also known as ‘distributed representations,’ each of which has a unique coordinate address in mathematical space. Vectors close to each other in this space correspond to linguistic similarity.”
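
To make the idea of “vectors close to each other” concrete, here is a minimal Python sketch. It is not how RankBrain is implemented (Google has not published that); the three-dimensional vectors, the word list, and the function names `cosine_similarity` and `most_similar` are all invented for illustration. Real word vectors are learned from large amounts of text and have hundreds of dimensions.

```python
import math

# Hypothetical 3-dimensional word vectors, invented purely for illustration.
# Real systems learn much larger vectors from huge text corpora.
TOY_VECTORS = {
    "contrails": [0.9, 0.1, 0.0],
    "vapour": [0.8, 0.2, 0.1],
    "football": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors=TOY_VECTORS):
    """Return the other word whose vector points in the most similar direction."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


print(most_similar("contrails"))
```

With these made-up numbers, “contrails” sits much closer to “vapour” than to “football”, which is the kind of relationship the quotes above describe.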

RankBrain has since been called Google’s third most important ranking signal (behind content and links), but this is where a lot of confusion entered the mix, as it’s not really a ranking signal in the traditional sense. How we talk about ranking signals and ranking factors can often lead to misconceptions and myths that wrongly influence strategies. When we describe RankBrain as a ranking signal, we need to remember Google’s top-line on what RankBrain actually is:

RankBrain lets us understand queries better.

We also know that RankBrain is available in a number of core languages outside of English, including Russian and Hindi (Hindi was confirmed by Gary Illyes on Twitter a couple of years ago; I can’t find the tweet to source the reference… I was in Rome at the time).

How this is related to neural matching

Neural matching is an AI-based system Google began using in 2018 primarily to understand how words are related to concepts. It’s like a super-synonym system. Synonyms are words that are closely related to other words.

Neural matching can help in instances where people are searching for an answer, by returning results containing the answer (and not just the question). For example:

 User Query                                     Can also return relevant results for…
 why does the tv look funny?                    what is the soap opera effect, soap opera effect
 what are the lines in the sky left by planes?  what are contrails, contrails or vapor trails, contrails conspiracy theories
 who scored in the 1966 world cup final?        geoff hurst, martin peters, helmut haller, wolfgang weber

How RankBrain DOESN’T Work

There are a lot of theories and blog posts out there about how to optimise for Google RankBrain, Machine Learning and Artificial Intelligence. Some of these theories may correlate with real-world results, but the relationship is not causal. These myths are:

  • Dwell time on the result itself
  • CTR (click through rate from the SERP)
  • Bounce rate
  • Pogo-sticking

The first two were directly referred to as “made up crap” by Googler Gary Illyes in a Reddit AMA in February 2019:

RankBrain is a PR-sexy machine learning ranking component that uses historical search data to predict what would a user most likely click on for a previously unseen query. It is a really cool piece of engineering that saved our butts countless times whenever traditional algos were like, e.g. “oh look a “not” in the query string! let’s ignore the hell out of it!”, but it’s generally just relying on (sometimes) months old data about what happened on the results page itself, not on the landing page. Dwell time, CTR, whatever Fishkin’s new theory is, those are generally made up crap. Search is much more simple than people think.

And the other two are there by inference, as they relate heavily to the first two (in terms of theory). This also pretty much means that Brian Dean’s “Definitive” Guide to RankBrain is wrong. Whilst the theory and arguments are presented well, the guide, which has almost 9,000 shares, is fundamentally flawed.

The notion we can reverse engineer RankBrain

When RankBrain was launched, it was said to affect ~15% of queries; however, Google did go on record and specify that it’s not limited to any particular set of queries.

This means as SEOs, we can’t reverse engineer it to see how it applies (or is weighted) to certain verticals, topics, or search intents.

Is RankBrain a Natural Language Processor? (NLP)

It’s also important to highlight that RankBrain is not an NLP (Natural Language Processor). Being an NLP is, at present, an unobtainable level of AI.

This is where a computer has the ability to understand and deconstruct complete sentences (and search queries), and determine the intent of the sentence/query based on the sentence structure and linguistics.

So rather than looking at search queries and words, and attempting to parse them and understand semantics (like a traditional NLP), it instead converts them into numerical forms and plots them on a multi-dimensional chart (think multiple axes).

Toy illustration of the three principles of unsupervised MT. A) Two monolingual datasets. B) Initialization. C) Language modelling. D) Back-translation (Lample et al., 2018).

While RankBrain is a step closer toward that ultimate goal, RankBrain can’t infer meaning from your searches based on language alone.
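
The difference between plotting a query as numbers and actually parsing its sentence structure can be shown with a deliberately crude Python sketch. This is an assumption-laden toy, not RankBrain’s method: the two-dimensional vectors are made up, and averaging word vectors (the hypothetical `query_to_vector` function) is just one simple way to turn a query into a single point.

```python
# Hypothetical 2-dimensional word vectors, made up for illustration only.
TOY_VECTORS = {
    "dog": [1.0, 0.0],
    "bites": [0.5, 0.5],
    "man": [0.0, 1.0],
}


def query_to_vector(query, vectors=TOY_VECTORS):
    """Average the vectors of the known words in a query; unknown words are skipped."""
    known = [vectors[w] for w in query.lower().split() if w in vectors]
    if not known:
        return None  # nothing recognisable to plot
    dims = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dims)]


a = query_to_vector("dog bites man")
b = query_to_vector("man bites dog")

# Both queries land on the same point: the word order, and therefore
# who bit whom, is invisible to this simple numerical representation.
print(a, b)
```

In this toy, both queries end up at the same coordinates even though they mean opposite things, which is why a position on a chart is not the same as understanding the language itself.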

Optimising for RankBrain

Content, links, and technical are in essence all variables we have a degree of control over – but RankBrain works as an independent algorithm, learning from the datasets we give it (i.e. the content we publish on the internet), so in order to optimise for RankBrain you need to:

  • Make sure your content is optimised for high levels of user value through both correct keyword usage and matching intent
  • Make sure your content is structured correctly and presents itself in a meaningful way for users
  • Make sure your content is accurate

And being honest, if you’re not already doing this and you need to set up your own AI optimisation division, what the hell are you doing anyway?!

It’s also fair to say in this instance, this piece of content published on Neil Patel’s blog actually summarises how RankBrain works pretty well, and explains datasets and machine learning in a digestible format.

  • Alejandro Hernandez

    So based on our understanding, could a secondary effect of RankBrain be better intent alignment for ambiguous queries that have multiple meanings but align within these vectors?

    • Dan Taylor

      Pretty much, yes. Because of this it’s pretty much impossible to optimise for as it’s taking learning from a big dataset (the entire internet) and learning what is and isn’t associated/appropriate with what.