One of the greatest issues facing Google in the endeavour to accurately collect and categorise content on the internet is that there is no uniformity across the web.
We all run different platforms, different stacks and display information in different ways aesthetically and within plain code (divs, tabs…). Google have noted this in patents and highlighted that there is no single organisational structure.
Patent US9449105B1, granted to Google in September 2016, discusses using context vectors to better understand how words are being used.
To summarise, the patent describes the limitations of a flat search engine model – basing indexing purely upon content:
- Using conventional keyword searching will lead to results being returned where the word is in use, regardless of its use or context
- Depending on the number of domains “in the universe”, only the homepage of the result may be returned due to crawl/network capacity
- The list of results may not be returned in a meaningful or weighted manner, unless it uses keyword usage as a factor (i.e. density, placement on the page, use in headers)
- Information quality control can be lacking
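To make the first limitation concrete, here is a minimal sketch of a flat keyword match. The URLs and page text are hypothetical, made up purely for illustration, and this is not how Google or the patent implements search. It only shows that a plain word match returns every page containing the word, whatever the word means on that page.

```python
# Toy corpus: hypothetical URLs and snippets, invented for illustration only.
PAGES = {
    "/used-cars": "buy a used car from a local dealer",
    "/rail-freight": "each railroad car carries forty tonnes of freight",
    "/toy-shop": "wooden toy car for toddlers",
}


def keyword_search(query, pages):
    """Return every URL whose text contains the query word, in corpus order.

    There is no weighting and no notion of context: a page either
    contains the word or it does not.
    """
    word = query.lower()
    return [url for url, text in pages.items() if word in text.lower().split()]


results = keyword_search("car", PAGES)
print(results)
```

All three pages come back for “car”, even though only one of them is about the kind of car a person would drive, and nothing in the result order reflects which is the better match.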
Understanding context vectors and entity relationships is one thing. The next phase is how to implement this in a website's architecture in a meaningful way, and by meaningful I mean a way that brings value both to users and to the client website in terms of relevant traffic, topical relevancy (for the domain), and rankings. By architecture, I’m referring to two types:
- Actual site architecture (the sitemap, URL structures, internal linking)
- Information architecture (on-page, internal linking)
When it comes down to content, there are a lot of models out there, and plenty of people talking about building content pillars, but my favoured approach is to assess the domain as an eco-system rather than as a set of silos. It’s the content eco-system that I’m going to focus on throughout the rest of this post.
This approach goes by other names, and it extends beyond a blog or news section. It is also not the same thing as pieces of “cornerstone content”, pillar pages, or topical clusters.
At the Minnesota Search Summit in June 2019, Kevin Indig referred to this content architecture type as Microsites 2.0.
This approach is also an evolution of an older approach, sometimes referred to as “length is strength”, in which all content relating to a topic is forced onto a single page in a bid to make a single “authoritative page”.
I also strongly recommend reading Kevin’s blog post on internal linking structures.
In the above, I mention “the universe” – this is directly referred to within the patent and is represented by the patent artwork, specifically figure 2:
Paraphrasing the patent, figure 2 demonstrates a universe of communication that, at an atomic level, consists of individual words. Words and search phrases may be terms that have an independent meaning, may be combined in expressions that have meaning, or may be combined as words to constitute a term.
For example, car is a word. Vehicle is a word. Truck is a word. Each of these has meaning. Nevertheless, a car may be an automobile driven by an individual. Likewise, a car may be a railroad car operated only by a railroad. Thus, different words have different contexts which give the individual words meaning.
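The car example can be sketched in a few lines of code. This is a simplified illustration of the general idea of a context vector, not the method described in the patent: it counts the words that appear near a target word, then compares two such counts with cosine similarity. The sentences, the stopword list, the window size, and the function names are all assumptions chosen for the example.

```python
from collections import Counter
import math

# A tiny, hand-picked stopword list; an assumption for this example only.
STOPWORDS = {"the", "a", "to", "on", "she", "he"}


def context_vector(target, sentence, window=3):
    """Count the non-stopwords within `window` positions of `target`."""
    tokens = [t for t in sentence.lower().split() if t not in STOPWORDS]
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start, end = max(0, i - window), i + window + 1
            counts.update(t for t in tokens[start:end] if t != target)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


road_a = context_vector("car", "she drove the car along the motorway")
road_b = context_vector("car", "he drove the car to the garage")
rail = context_vector("car", "the railroad car carried freight on the track")

print(cosine(road_a, road_b))  # the two road sentences share "drove"
print(cosine(road_a, rail))    # the road and rail sentences share nothing
```

The two road sentences score higher against each other than either does against the railroad sentence, because the words surrounding “car” overlap. The word itself is identical in all three; it is the surrounding context that separates the meanings.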