Google On Protecting Anchor Text Signal From Spam Site Influence


In a Google SEO office hours session, Google’s Duy Nguyen of the search quality team answered a question about links on spam sites and the role that trust plays.

It was interesting how the Googler said they were protecting the anchor text signal. It’s not something that’s commonly discussed.

Building trust with Google is an important consideration for many publishers and SEOs.

There’s an idea that “trust” will help get a site indexed and properly ranked.

It’s also known that there is no “trust” metric, which sometimes causes confusion in the search community.

How can an algorithm trust if it’s not measuring something?

Googlers don’t really answer that question, but there are patents and research papers that give an idea.

Google Doesn’t Trust Links From Spam Sites

The person who submitted a question to the SEO office hours asked:

“If a domain gets penalized does it affect the links that are outbound from it?”

The Googler, Duy Nguyen, answered:

“I assume by ‘penalize’ you mean that the domain was demoted by our spam algorithms or manual actions.

In general, yes, we don’t trust links from sites we know are spam.

This helps us maintain the quality of our anchor signals.”

Trust and Links

Googlers talk about trust and it’s clear that they’re talking about their algorithms trusting something or not trusting something.

In this case, the point is not simply that links on spam sites go uncounted. Specifically, it’s about not counting the anchor text signal from those links.
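To make the idea concrete, here is a minimal Python sketch of what leaving spam sites out of an anchor text signal could look like. It is an illustration only, not Google’s method: the function name `collect_anchor_signals`, the `spam_domains` set, and the example domains are all hypothetical, and the sketch assumes the spam classification has already been done elsewhere.

```python
from collections import Counter, defaultdict

def collect_anchor_signals(links, spam_domains):
    """Group anchor text by target URL, skipping links from known-spam domains.

    links is an iterable of (source_domain, target_url, anchor_text) tuples.
    spam_domains is a hypothetical, precomputed set of untrusted domains.
    """
    anchors = defaultdict(Counter)
    for source_domain, target_url, anchor_text in links:
        if source_domain in spam_domains:
            continue  # untrusted source: its anchor text is not counted
        anchors[target_url][anchor_text.strip().lower()] += 1
    return anchors

# Made-up example data.
links = [
    ("news.example", "https://shop.example/", "Handmade Boots"),
    ("blog.example", "https://shop.example/", "handmade boots"),
    ("spam.example", "https://shop.example/", "cheap pills"),
]
signals = collect_anchor_signals(links, spam_domains={"spam.example"})
print(dict(signals["https://shop.example/"]))  # {'handmade boots': 2}
```

The anchor text from the flagged domain never reaches the aggregated counts, which is one way of reading “maintain the quality of our anchor signals.”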

The SEO community talks about “building trust” but in this case, it’s really about not building spam.

How Does Google Determine a Site is Spam?

Not every site is penalized or receives a manual action. Some sites aren’t even indexed and that’s the job of Google’s Spam Brain, an AI platform that analyzes webpages at different points, beginning at crawl time.

The Spam Brain platform functions as:

  • Indexing Gatekeeper
    Spam Brain blocks sites at crawl time, including content that’s discovered through Search Console and sitemaps.
  • Hunts Down Indexed Spam
    Spam Brain also catches spam that’s been indexed at the point when sites are considered for ranking.

The way the Spam Brain platform works is that it trains an AI on the knowledge Google has about spam.

Google commented on how Spam Brain works:

“By combining our deep knowledge of spam with AI, last year we were able to build our very own spam-fighting AI that is incredibly effective at catching both known and new spam trends.”

We don’t know what “knowledge of spam” Google is talking about, but there are various patents and research papers about it.

Those who want to take a deep dive on this topic may consider reading an article I wrote about the concept of link distance ranking algorithms, a method for ranking links.

I also published a comprehensive article about multiple research papers that describe link-related algorithms that may describe what the Penguin algorithm is.

Although many of the patents and research papers were published within the last ten or so years, there hasn’t really been anything else published by search engines and university researchers since.

The importance of those patents and research papers is that it’s possible that they can make it into Google’s algorithm in a different form, such as for training an AI like Spam Brain.

The patent discussed in the link distance ranking article describes how the method assigns ranking scores for pages based on the distances between a set of trusted “seed sites” and the pages that they link to. The seed sites are like starting points for calculating which sites are normal and which sites are not (i.e. spam).

The intuition is that the further a site is from a seed site, the likelier it is that the site can be considered spammy. This part, about determining spamminess through link distance, is discussed in research papers cited in the Penguin article I referenced earlier.

The patent (Producing a Ranking for Pages Using Distances in a Web-link Graph) explains:

“The system then assigns lengths to the links based on properties of the links and properties of the pages attached to the links.

The system next computes shortest distances from the set of seed pages to each page in the set of pages based on the lengths of the links between the pages.

Next, the system determines a ranking score for each page in the set of pages based on the computed shortest distances.”
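The three steps in that quote can be sketched in Python. This is a simplified illustration under stated assumptions, not the patented implementation: the link lengths are hand-assigned rather than derived from link and page properties, the graph is made up, and the scoring formula in `ranking_scores` is a hypothetical choice that only preserves the idea that pages closer to a seed score higher.

```python
import heapq

def seed_distances(graph, seeds):
    """Multi-source Dijkstra: shortest distance from any seed to each page.

    graph maps page -> list of (linked_page, link_length) pairs.
    Pages that no seed can reach are left out of the result.
    """
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for target, length in graph.get(page, []):
            candidate = d + length
            if candidate < dist.get(target, float("inf")):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist

def ranking_scores(dist):
    # Hypothetical scoring: the closer a page is to a seed, the higher its score.
    return {page: 1.0 / (1.0 + d) for page, d in dist.items()}

# Made-up link graph with hand-assigned link lengths (an assumption).
graph = {
    "seed": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 1.0), ("c", 5.0)],
    "b": [("c", 1.0)],
    "c": [],
    "island": [("c", 1.0)],  # links out, but no seed links to it
}
dist = seed_distances(graph, ["seed"])
scores = ranking_scores(dist)
print(dist)    # {'seed': 0.0, 'a': 1.0, 'b': 2.0, 'c': 3.0}
print(scores)  # "island" is absent because no path from a seed reaches it
```

In this toy graph, the page that no seed can reach receives no distance at all, which mirrors the intuition that pages far from (or disconnected from) trusted seeds are less likely to be treated as normal.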

Reduced Link Graph

The same patent also mentions what’s known as a reduced link graph.

But it’s not just one patent that discusses reduced link graphs. Reduced link graphs were researched outside of Google, too.

A link graph is like a map of the Internet that is created by mapping the links between pages.

In a reduced link graph the low quality links and associated sites are removed.

What’s left is what’s called a reduced link graph.

Here’s a quote from the above cited Google patent:

“A Reduced Link-Graph

Note that the links participating in the k shortest paths from the seeds to the pages constitute a sub-graph that includes all the links that are “flow” ranked from the seeds.

Although this sub-graph includes much less links than the original link-graph, the k shortest paths from the seeds to each page in this sub-graph have the same lengths as the paths in the original graph.

…Furthermore, the rank flow to each page can be backtracked to the nearest k seeds through the paths in this sub-graph.”
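A rough Python sketch of that sub-graph idea follows. It is a deliberate simplification and an assumption on my part: it only handles the shortest path from the nearest seed (the k = 1 case) rather than the k shortest paths the patent describes, and the graph and link lengths are invented for illustration. A link is kept only if it lies on a shortest path from a seed.

```python
import heapq
import math

def seed_distances(graph, seeds):
    """Shortest distance from any seed to each reachable page (Dijkstra)."""
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry
        for target, length in graph.get(page, []):
            candidate = d + length
            if candidate < dist.get(target, float("inf")):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist

def reduced_link_graph(graph, seeds):
    """Keep only links that lie on a shortest path from a seed (k = 1 case)."""
    dist = seed_distances(graph, seeds)
    kept = []
    for page, links in graph.items():
        if page not in dist:
            continue  # page is unreachable from every seed, drop its links
        for target, length in links:
            # A link is on a shortest path when it accounts exactly for
            # the target's shortest distance.
            if math.isclose(dist[page] + length, dist[target]):
                kept.append((page, target))
    return kept

# Made-up link graph with hand-assigned link lengths (an assumption).
graph = {
    "seed": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 1.0), ("c", 5.0)],
    "b": [("c", 1.0)],
    "c": [],
    "island": [("c", 1.0)],
}
kept = reduced_link_graph(graph, ["seed"])
print(kept)  # [('seed', 'a'), ('a', 'b'), ('b', 'c')]
```

Three of the six links survive, yet every reachable page keeps the same shortest distance from the seed as in the full graph, which matches the property the patent quote describes.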

Google Doesn’t Trust Links from Penalized Sites

It’s kind of obvious that Google doesn’t trust links from penalized websites.

But sometimes one doesn’t know if a site is penalized or flagged as spam by Spam Brain.

Before going through the effort of trying to get a link from a site, it’s a good idea to research whether that site might not be trusted.

In my opinion, third party metrics should not be used for making business decisions like this because the calculations used to produce a score are hidden.

If a site is already linking to possibly spammy sites that themselves have inbound links from possibly paid sources like PBNs (private blog networks), then it’s probably a spam site.

Featured image by Shutterstock/
