Google to protect the anchor text signal from the influence of spam sites

In a Google SEO office hours session, Google’s Duy Nguyen of the search quality team answered a question about links on spam sites and how trust relates to them.

It was interesting that the Googler said Google protects the anchor text signal. That is not something that is commonly discussed.

Building trust with Google is an important aspect for many publishers and SEOs.

There’s the idea that “trust” helps a website get indexed and ranked properly.

It’s also well known that there is no “trust” metric, which confuses some in the search community.

How can an algorithm trust something if it isn’t measuring anything?

Googlers don’t actually answer this question, but there are patents and research papers that give an idea.

Google does not trust links from spam websites

The person who submitted a question to the SEO office hours asked:

“If a domain is penalized, does it affect the outbound links?”

Googler Duy Nguyen replied:

“I assume that by ‘penalize’ you mean that the domain has been downgraded by our spam algorithms or manual actions.

Generally yes, we do not trust links from sites that we know are spam.

This helps us maintain the quality of our anchor signals.”

Trust and links

Googlers do speak of trust, and it is clear that they mean their algorithms trusting or not trusting something.

In this case, it’s about not counting links on spam sites and, in particular, not counting the anchor text signal from those links.

The SEO community talks about “building trust,” but in this case, it’s really about not building spam.

How does Google determine a website is spam?

Not every website is penalized or given a manual action. Some websites aren’t even indexed; keeping them out of the index is the job of Google’s Spam Brain, an AI platform that analyzes web pages at various points, starting with crawl time.

The Spam Brain platform works as:

  • Indexing gatekeeper
    Spam Brain blocks websites at crawl time, including content discovered via Search Console and sitemaps.
  • Hunter of indexed spam
    Spam Brain also catches spam that has already been indexed, at the time sites are considered for ranking.

The Spam Brain platform works by training an AI with Google’s knowledge of spam.

Google commented on how Spam Brain works:

“By combining our in-depth knowledge of spam with AI, we were able last year to develop our own anti-spam AI, which is incredibly effective at spotting both known and emerging spam trends.”

We don’t know what “knowledge of spam” Google is referring to, but there are various patents and research papers about it.

Those who want to delve deeper into this topic should read an article I wrote about the concept of link distance ranking algorithms, a method of ranking links.

I’ve also published a comprehensive article on several research papers describing link-related algorithms that may describe what the Penguin algorithm is.

Although many of the patents and research papers date from the last decade or so, little else has been published by search engines and university researchers since then.

The importance of these patents and research papers is that they may end up in Google’s algorithm in another form, such as for training an AI like Spam Brain.

The patent discussed in the link distance ranking article describes how the process assigns ranking scores to pages based on the distances between a set of trusted “seed pages” and the pages they link to. The seed sites are like starting points for calculating which sites are normal and which are not (e.g. spam).

The intuition is that the further a site is from a seed site, the more likely the site can be considered spam. This part, dealing with determining spam by link distance, is covered in research papers cited in the Penguin article I referenced earlier.

The patent (Producing a ranking for pages using distances in a web-link graph) explains:

“The system then assigns lengths to the links based on properties of the links and properties of the pages attached to the links.

The system next calculates the shortest distances from the set of seed pages to each page in the set of pages based on the lengths of the links between the pages.

Next, based on the calculated shortest distances, the system determines a ranking score for each page in the set of pages.”
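The three steps in that quote can be illustrated with a small sketch. This is not Google’s implementation; it is a minimal example that assumes link lengths are already given as numbers, uses a standard multi-source Dijkstra search for the shortest distances, and invents a scoring formula (`1 / (1 + distance)`) purely for illustration, since the quoted passage does not say how distances become scores. The site names and the function names `seed_distances` and `ranking_scores` are hypothetical.

```python
import heapq


def seed_distances(links, seeds):
    """Shortest distance from the nearest seed to every reachable page.

    links: dict mapping a page to a list of (target_page, link_length) pairs.
    seeds: iterable of trusted seed pages, each at distance 0.
    """
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in dist]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for target, length in links.get(page, []):
            candidate = d + length
            if candidate < dist.get(target, float("inf")):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist


def ranking_scores(links, seeds):
    """Assumed scoring: closer to a seed means a higher score.

    Pages that no seed can reach get a score of 0.0.
    """
    dist = seed_distances(links, seeds)
    pages = set(links) | {t for outs in links.values() for t, _ in outs}
    return {p: 1.0 / (1.0 + dist[p]) if p in dist else 0.0 for p in pages}


# Hypothetical link graph: shorter lengths stand for more trustworthy links.
links = {
    "seed.example": [("news.example", 1.0), ("blog.example", 2.0)],
    "news.example": [("blog.example", 0.5), ("shop.example", 1.0)],
    "blog.example": [("spam.example", 5.0)],
    "island.example": [("spam.example", 1.0)],
}
scores = ranking_scores(links, ["seed.example"])
```

In this toy graph, `spam.example` is reachable only through a long link and ends up with a low score, while `island.example` has no path from the seed at all and scores zero.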

Reduced link graph

The same patent also mentions what is known as a reduced link graph.

But it’s not just a patent that discusses reduced link graphs. Reduced link graphs have also been researched outside of Google.

A link graph is like a map of the Internet made by mapping links.

In a reduced link graph, the low-quality links and associated pages are removed.

What remains is a so-called reduced link graph.

Here is a quote from the Google patent cited above:

“Reduced link graph

Note that the links participating in the k shortest paths from the seeds to the pages constitute a subgraph that includes all the links that are “flow” ranked from the seeds.

Although this subgraph contains far fewer links than the original link graph, the k shortest paths from the seeds to each page in this subgraph have the same lengths as the paths in the original link graph.

… In addition, the rank flow to each page can be traced back to the nearest k seeds through the paths in this subgraph.”
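To make the idea in that quote concrete, here is a simplified sketch that is not taken from the patent. It assumes k = 1 (only the single nearest seed matters), treats link lengths as given numbers, and keeps a link only if it lies on some shortest path from a seed. The function name `reduced_link_graph` and the example sites are hypothetical.

```python
import heapq
import math


def reduced_link_graph(links, seeds):
    """Keep only links that lie on a shortest path from the nearest seed.

    links: dict mapping a page to a list of (target_page, link_length) pairs.
    Returns (reduced_links, distances). Simplification: k = 1.
    """
    # Multi-source Dijkstra from all seeds at once.
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in dist]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry
        for target, length in links.get(page, []):
            candidate = d + length
            if candidate < dist.get(target, float("inf")):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))

    # A link (page -> target) is on a shortest path exactly when following it
    # reproduces the target's shortest distance.
    reduced = {}
    for page, outs in links.items():
        if page not in dist:
            continue  # pages no seed can reach drop out entirely
        kept = [(t, l) for t, l in outs if math.isclose(dist[page] + l, dist[t])]
        if kept:
            reduced[page] = kept
    return reduced, dist


# Hypothetical link graph for illustration.
links = {
    "seed.example": [("news.example", 1.0), ("blog.example", 2.0)],
    "news.example": [("blog.example", 0.5), ("shop.example", 1.0)],
    "blog.example": [("spam.example", 5.0)],
    "island.example": [("spam.example", 1.0)],
}
reduced, dist = reduced_link_graph(links, ["seed.example"])
```

Here the direct link from `seed.example` to `blog.example` is dropped because the route through `news.example` is shorter, and `island.example` disappears because no seed reaches it. The shortest distances from the seed are unchanged in the smaller graph, which is the property the quote describes.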

Google does not trust links from penalized sites

It’s fairly obvious that Google doesn’t trust links from penalized websites.

But sometimes you can’t tell whether a site has been penalized or flagged as spam by Spam Brain.

It’s a good idea to research whether a site may be untrusted before attempting to get a link from it.

In my opinion, third-party metrics should not be used for such business decisions, because the calculations used to create the scores are hidden.

If a site already links to potentially spammy sites that themselves have inbound links from likely paid sources such as PBNs (private blog networks), it is likely a spam site.

Featured image from Shutterstock/Krakenimages.com
