Wednesday, January 15, 2025
Understanding salience for better keyword classification
Semantic search and entity optimisation are not new notions in SEO. However, recent Google updates, patents and studies have brought them to the forefront of popular conversation.
Throughout this post I’m going to explore how to research entities, what they mean and how they can be used to create better website structures. By website structures, I mean the ecosystem as a whole: content, template-level information architecture, website architecture and internal linking.
Why is this important to Google?
Failing to provide relevant and useful search results would cost Google the trust of users around the world and, consequently, revenue.
So it makes sense that Google is, to a point, prioritizing content salience or relevance over trust and authority. For a page to rank highly, Google must first be satisfied that its content clears a salience threshold before other factors come into play. In other words, your content needs to match a number of user search intents.
Back in the good old days of search engine optimization, we were obsessed with keyword density and used it as a yardstick for salience.
However, that turned out to be very ineffective, as it encouraged keyword stuffing and paid no attention to conveying useful information.
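To make the old yardstick concrete, here is a minimal sketch of how keyword density is usually worked out: occurrences of the keyword divided by the total word count. The function name and the two sample sentences are my own illustrations, and the sketch only handles single-word keywords.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Share of the words in `text` that match `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# Two made-up snippets: one stuffed with the keyword, one written naturally.
stuffed = "cheap holidays cheap holidays book cheap holidays now"
natural = "compare package holidays and book a flexible break in the sun"

print(keyword_density(stuffed, "cheap"))
print(keyword_density(natural, "cheap"))
```

The stuffed snippet scores far higher on this metric while telling the reader nothing useful, which is exactly why density was such a poor proxy for salience.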
Over the years, several SEO tools emerged that allowed content writers to analyze keywords based on their search frequency, using the Google AdWords API and other analytics software.
Eventually, people became obsessed with stuffing their content with the most-searched keywords, focusing on keyword density and ditching keyword relevancy entirely, and Google retaliated by taking away the AdWords API.
Google has put a great deal of effort into convincing us to forget about keywords and focus on providing good-quality content that people will love. It is, however, interesting to note that Google’s efforts to turn unstructured data into structured data are in danger of bringing SEOs back.
What is salience?
Google’s Natural Language API is a powerful tool that brings together a lot of Google’s algorithms, so let’s have a look at what it’s all about. It’s one of a handful of tools that allow you to compare the content you have with the content currently ranking on Google, to test whether your content is more relevant or salient.
You can also use Google image search tags to identify entities. However, when performing this exercise at scale, I prefer to blend multiple data sources, and through the API (0 to 5k units a month are free) you can automate a lot of the analysis.
With Google’s NLP, you can derive insights from unstructured text and determine the salience of content, but how does this work?
Google’s NLP AI is programmed to analyze content by splitting it into what we call “entities”. An entity is a phrase in the text that represents something such as a person, an organization or a location.
An example of the Google Natural Language API using a paragraph from my Travel SEO Guide
Salience scores
These numbers (salience scores) indicate the importance or centrality of each entity to the entire piece of content.
In the example above, the most salient entity is <holiday website>. The other entities are then listed in order of salience within the text, and each entity is assigned a score between 0 and 1.
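If you want to pull these scores programmatically rather than through the demo, a minimal sketch might look like the one below. It assumes the google-cloud-language Python client is installed and that credentials are already configured. The helper names (rank_by_salience, analyze_entities) are my own, and the sample scores are made up for illustration, not real API output.

```python
from typing import List, Tuple

Entity = Tuple[str, str, float]  # (name, type, salience)


def rank_by_salience(entities: List[Entity]) -> List[Entity]:
    """Sort (name, type, salience) tuples so the most salient entity comes first."""
    return sorted(entities, key=lambda entity: entity[2], reverse=True)


def analyze_entities(text: str) -> List[Entity]:
    """Ask the Natural Language API for the entities in `text`.

    Assumes google-cloud-language is installed and credentials are configured.
    """
    # Imported here so the rest of the sketch runs without the client library.
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )
    response = client.analyze_entities(request={"document": document})
    return rank_by_salience(
        [
            (e.name, language_v1.Entity.Type(e.type_).name, e.salience)
            for e in response.entities
        ]
    )


# Made-up scores for illustration only, not real API output.
sample = [
    ("flights", "OTHER", 0.12),
    ("holiday website", "ORGANIZATION", 0.61),
    ("Spain", "LOCATION", 0.27),
]

for name, entity_type, salience in rank_by_salience(sample):
    print(f"{salience:.2f}  {entity_type:<12}  {name}")
```

Calling analyze_entities on a paragraph of your own copy returns the same kind of list, sorted with the most central entity at the top.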
Determining the salience or relevance of a particular content piece
For many years now, Google has collected and organized vast amounts of information from the internet, and it understands how difficult it is to file and categorize unstructured content.
For instance, if you walk into a library and ask for a book on the topic “stars”, does the librarian direct you to the Astronomy, Astrology, or Autobiography section?
A single topic can relate to a number of different categories and fields. Without more specific information about your request, the librarian could end up bewildered and give you a result you don’t want.
Now replace the library and librarian with Google and the search bar.
When analyzing unstructured web pages, Google’s NLP breaks every piece of content down into smaller components.
It’s logical that a website is about many things, and even a single web page is about many things too. In fact, a single sentence can be about many things.
Meanwhile, website owners want their websites optimized to increase performance (traffic and conversions), so whilst this is all good, how do you turn it into something actionable?
Taking advantage of NLP salience as a measurable search factor
In March 2016, Barry Schwartz covered a story in Search Engine Land in which Google’s Andrey Lipattsev revealed that links, content and RankBrain are the three most prominent signals in Google’s algorithm.
From the SEL article, we can infer that salience is a measurable factor in determining content relevancy, and therefore a page’s likelihood of performing within competitive SERPs (assuming all other factors are equal, which they’re not).
Salience can, however, be an asset in determining how well optimized your content is in terms of user value, related entities, and its potential to satisfy secondary intents as well as the primary search intent. It can also help inform content structures and architectures for supporting content.
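One way to make that actionable is a simple gap check: take the entity salience scores for your page and for a page that already ranks, then list the entities where the ranking page scores noticeably higher. The sketch below is only an illustration of that idea. The function name, the 0.05 threshold and all of the scores are assumptions of mine, not anything Google publishes.

```python
from typing import Dict, List, Tuple


def salience_gaps(
    own: Dict[str, float],
    competitor: Dict[str, float],
    min_gap: float = 0.05,  # arbitrary cut-off chosen for this example
) -> List[Tuple[str, float]]:
    """Entities where the competitor's salience beats ours by at least `min_gap`.

    Entities missing from our page are treated as having a salience of 0.
    """
    gaps = []
    for entity, competitor_score in competitor.items():
        difference = competitor_score - own.get(entity, 0.0)
        if difference >= min_gap:
            gaps.append((entity, round(difference, 3)))
    # Biggest gap first, so the most under-served entity is at the top.
    return sorted(gaps, key=lambda item: item[1], reverse=True)


# Made-up scores for illustration only.
own_page = {"holiday": 0.40, "flights": 0.10, "hotel": 0.05}
ranking_page = {
    "holiday": 0.35,
    "flights": 0.25,
    "travel insurance": 0.12,
    "hotel": 0.06,
}

for entity, gap in salience_gaps(own_page, ranking_page):
    print(f"{entity}: {gap:+.2f}")
```

Entities that appear in this list are candidates for supporting content or stronger coverage on the page itself. Treat the output as a prompt for editorial judgment rather than a target to hit.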