Key Insights from Google's Search Algorithm Leak

Johannes C.

Find out how the recent leak of Google’s internal documents affects SEO strategies: All key takeaways and the most important implications for optimizing your content in light of the leak.


It’s been one week since news about leaked API documents providing insights into Google’s search algorithm surfaced, and the SEO community has been buzzing like hungry bees trying to extract every bit of valuable information. Google confirmed the authenticity of the documents but also stressed that they are “out-of-context, outdated, or incomplete.” Nonetheless, it is worthwhile for everyone in SEO to examine what the leaked data implies. I’ve taken the time to go through all the details and in this article, I’ll share with you the key takeaways and potential implications for SEO, along with a theory about the overall dynamics of searching and finding on the internet supported by this leak.

But first, let’s quickly sum up how the Google leak came about and why it seems to contradict earlier statements by Google about the workings of its algorithm.

Google Leak: What has happened?

Although Google has published and regularly updates guidelines on how to rank on their search engine, the true workings of its algorithm remain a black box, and the entire SEO industry revolves around trying to decipher what’s going on inside. This is why everyone is so excited about this leak, which involved over 2,500 pages of API documentation. It revealed various aspects of Google’s data collection practices and internal systems related to search rankings and was first reported by SEO experts Rand Fishkin and Mike King on May 27, 2024. However, the documents were initially made public on GitHub two months prior – on March 27 – and removed on May 7, but had already been captured by archive bots. The leaked data included details about ranking factors, data storage practices, and user interaction metrics, although it did not specify the weight or usage of these factors in the actual ranking algorithm.

The documents also contradict several public statements made by Google representatives over the years, particularly regarding the use of clickstream data and the existence of systems like "NavBoost" and "Glue," which employ user click data to influence search rankings.

Key Takeaways for SEO

The list of implications from the Google leak is extensive. While much of it is speculative due to Google's lack of commentary, I’ve distilled the most critical takeaways for SEO to help you optimize your content effectively:

  • There are over 14,000 ranking features detailed in the API documentation.
  • Crucial ranking factors are content quality, link quality, link diversity, and user interactions.
  • Click data (good clicks, bad clicks, long clicks) can influence search rankings.
  • Building a strong, recognizable brand is critical for SEO success.
  • Chrome browser clickstreams are used to gather extensive interaction data.
  • Google has systems to demote pages based on various spam signals and poor user experiences.
  • Google measures siteAuthority, contradicting previous denials of using domain authority-like metrics.
  • Domain registration information and site age can impact ranking, with new sites potentially being sandboxed.
  • Font size and prominence of terms and links (e.g. bold text) matter.
  • Homepage PageRank and trust levels influence how new pages are initially treated.
  • Whitelists exist for queries related to topics such as Covid-19 and elections.
  • Google uses site embeddings to measure topical relevance and consistency.
  • Metrics like content originality scores and keyword stuffing scores impact rankings.
  • Token limits exist, emphasizing the importance of placing key information early.
  • The documents contain no character-count measure for metadata, which undercuts common SEO myths about optimal lengths.
  • High-quality, user-focused content and gaining diverse, authoritative links remain key strategies.
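To make the click-data takeaway more concrete, here is a minimal Python sketch of how clicks might be bucketed by dwell time. The leaked documents name click categories (good, bad, long) but do not publish thresholds or weights, so `BAD_CLICK_SECONDS`, `LONG_CLICK_SECONDS`, the `Click` class, and `classify_click` are all hypothetical and only illustrate the general idea.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical thresholds: the leak names click types but gives no cutoffs.
BAD_CLICK_SECONDS = 10    # user bounced back to the results quickly
LONG_CLICK_SECONDS = 120  # user stayed on the page for a long time


@dataclass
class Click:
    url: str
    dwell_seconds: float  # time before the user returned to the results page


def classify_click(click: Click) -> str:
    """Bucket a click as 'bad', 'good', or 'long' by dwell time (assumed rule)."""
    if click.dwell_seconds < BAD_CLICK_SECONDS:
        return "bad"
    if click.dwell_seconds >= LONG_CLICK_SECONDS:
        return "long"
    return "good"


def summarize(clicks: list[Click]) -> dict[str, Counter]:
    """Count click types per URL."""
    summary: dict[str, Counter] = {}
    for click in clicks:
        summary.setdefault(click.url, Counter())[classify_click(click)] += 1
    return summary


clicks = [
    Click("example.com/a", 3),
    Click("example.com/a", 45),
    Click("example.com/a", 300),
    Click("example.com/b", 5),
]
summary = summarize(clicks)
print(summary)
```

A page that collects mostly "bad" clicks under this toy rule would be one that users leave almost immediately, which is the kind of signal the leak suggests can count against a result.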

There’s a lot to unpack here. Before we dig deeper into SEO recommendations, let’s focus on one crucial takeaway: “Building a strong, recognizable brand is critical for SEO success.” This has been theorized extensively and, in my opinion, clearly benefits mainly companies specializing in affiliate advertising, which isn't necessarily good for the broader market.

Why does the Rolling Stone write about Air Purifiers for Pets?

Yes, building a brand is a good way to achieve SEO success, but you know what works even better? Buying a brand. Because, apparently, as soon as a brand is established, the owners will be rewarded by the Google algorithm, no matter what. Let’s look at the story of HouseFresh to illustrate this point:

HouseFresh is a site that specializes in product testing. They physically buy, test, and review every product featured on their site. As clearly stated on their site, they earn revenue from affiliate deals. Thus, their success largely depends on how well their content performs on Google. They strictly follow Google’s E-E-A-T guidelines (Experience, Expertise, Authoritativeness, Trustworthiness) for product reviews, so one might expect them to perform well. However, HouseFresh struggles to gain visibility and is consistently outranked by content on big media brands' pages, sometimes written by people who have obviously never seen the product. In two highly recommended blog articles, they describe their struggle for visibility on Google and unveil the questionable tactics of their competitors, backed up by convincing data.

In a nutshell, digital media companies buy and accumulate failing print publications with big brand names (e.g., Rolling Stone, Forbes, Popular Science). They then close print, fire most or all of the journalistic staff, and replace them with content writers who produce marketing content far off the brand’s original topics. Worse, this marketing content is usually disguised as an independent product test, though the testing seems superficial at best and, according to HouseFresh’s analysis, tends to recommend the most expensive product over the best one (since higher-priced products yield higher affiliate commissions).

The huge boost that brand names receive for all their content already seems unfair enough: why should Rolling Stone even rank for non-music-related product reviews? Moreover, it incentivizes the complete destruction and commercialization of what used to be journalism. However, to completely overrun their competitors, the corporate conglomerates behind some of the biggest magazine brands of all time (often controlling many unrelated brands simultaneously) have developed an SEO tactic called ‘keyword swarming’: identifying valuable keywords or topics where small sites have established a presence and systematically publishing a large volume of content on those topics across multiple sites owned by the same parent company. The goal is to overwhelm and push down the rankings of smaller sites' content by increasing the perceived authority and relevance of their own content in the eyes of search engines. In short, the strategy aims to drown out competitors by leveraging the parent company's extensive publishing capacity and network of sites.
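If you suspect keyword swarming on one of your own keywords, one simple check is to group the results on a search page by parent company. The following Python sketch shows the idea; the domains, the `PARENT_OF` mapping, and the `swarm_report` function are hypothetical placeholders, and you would have to supply the ownership data and the result list yourself.

```python
from collections import defaultdict

# Hypothetical ownership mapping, for illustration only.
# Domains without a known parent are simply left out.
PARENT_OF = {
    "site-a.example": "MediaGroup X",
    "site-b.example": "MediaGroup X",
    "site-c.example": "MediaGroup X",
}


def swarm_report(serp, parent_of, min_sites=2):
    """Return parents whose sites occupy at least `min_sites` distinct
    domains on one results page, with the positions they hold."""
    by_parent = defaultdict(list)
    for position, domain in enumerate(serp, start=1):
        parent = parent_of.get(domain)
        if parent is not None:
            by_parent[parent].append((position, domain))
    return {
        parent: hits
        for parent, hits in by_parent.items()
        if len({domain for _, domain in hits}) >= min_sites
    }


# Example results page for one keyword, in ranking order (made-up data).
serp = [
    "site-a.example",
    "indie-reviews.example",
    "site-b.example",
    "site-c.example",
]
report = swarm_report(serp, PARENT_OF)
print(report)
```

Running a check like this across many keywords and over time would show whether the same parent company repeatedly fills a results page with several of its brands.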

Keyword swarming is allegedly employed by Dotdash Meredith, a digital media company that unites dozens of brands like People, Treehugger, InStyle, Lifewire, Investopedia, Southern Living, Liquor.com, and many more. Consumer society has conditioned a fair part of the population to trust brand names, and the strategies employed by these big online publishers apparently aim to abuse this trust by selling overpriced and sometimes nonfunctional products while outranking competitors who provide genuinely helpful reviews. We can only hope that Google will find a way to tie the boost for brands closer to their core business and will not continue to reward a cookie brand that writes about construction work, so to speak.

Implications for SEO

The case of HouseFresh shows that affiliate marketing is not dead yet but threatens to drown in a sea of shiny spam. Regarding the Google leak and what you can do to strengthen your site's rankings, the biggest takeaway is closely related to the story you just heard: build your brand. The leak, along with countless real-life examples, suggests that a strong brand is the best way to secure good and stable rankings for your content. This means that having a brand presence across social media networks and fostering a growing community of people who share certain interests around your content and brand are crucial. Strengthening your brand (even for small and local businesses) establishes a recognizable identity and trust with your audience and fosters customer loyalty and long-term success.

In terms of content creation, the key implications can be summarized as follows:

  1. Produce appealing, original content and follow the E-E-A-T guidelines.
  2. Use bold text and other forms of visual emphasis wisely.
  3. In information management, stick to the iceberg technique.

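The token limits mentioned in the takeaways suggest checking whether your key terms appear early in a page. Here is a minimal Python sketch of such a check. The leak does not state a concrete limit in this article, so `TOKEN_LIMIT` is a made-up value, the word-based `tokenize` function is a simplification of whatever tokenization Google actually uses, and the check only handles single-word terms.

```python
import re

TOKEN_LIMIT = 50  # hypothetical value; the actual limit is not given here


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def terms_within_limit(text, key_terms, limit=TOKEN_LIMIT):
    """Map each single-word key term to True if it occurs among the
    first `limit` tokens of the text."""
    head = set(tokenize(text)[:limit])
    return {term: term.lower() in head for term in key_terms}


# Made-up page text: the warranty information is buried after a lot of filler.
page = "Air purifier review: we tested ten models. " + "filler " * 60 + "warranty"
result = terms_within_limit(page, ["purifier", "warranty"])
print(result)
```

A term that only shows up late in the text, as "warranty" does in this example, is a candidate for moving closer to the top of the page.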
In conclusion, while the recent leak sheds some light on why certain content outranks its competitors, Google’s algorithm remains a black box that can make or break a business. SEO experts must continue to test, tinker, and interpret data as much as possible to stay ahead of the game. For all we know, the next update to the search algorithm could, once more, change everything. And who knows – maybe even the monopoly of Google’s search engine will start to crumble one day, and we’ll wonder about the inner workings of DuckDuckGo or some other competitor. For now, I hope you found this article insightful and would love to hear your opinion in the comments!

Chris Hüneke 5 Expert in AI-powered Digital Marketing

Nice read!

You basically say it (link quality + user interactions), but I'd like to add a little something, if that's alright:

A backlink that actually drives traffic is counted.
A backlink that does not drive any traffic is not.

This way Google tries to filter and disavow spammy links automatically, because most spam sites don't have any traffic.

Sounds smart and logical at first - but is it?

Isn't it easy to manipulate this through traffic bots or even microworkers?

Just like CTR is faked to manipulate rankings in the long term, traffic can be generated on spam links to make Google take them seriously.

And this can be used for good and for bad.

Unfortunately, from what I see, this is mostly abused for negative SEO attacks,

where spam links that Google normally ignores are fed fake traffic until they damage the reputation and visibility of a website.

References:

https://hexdocs.pm/google_api_content_warehouse/0.4.0/GoogleApi.ContentWarehouse.V1.Model.AnchorsAnchor.html

sourceType (type: integer(), default: nil) - is to record the quality of the anchor's source page and is correlated with but not identical to the index tier of the source page.

https://hexdocs.pm/google_api_content_warehouse/0.4.0/GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefDocumentMetadata.html

totalClicks (type: number(), default: nil) - The total clicks on this document, taken from navboost data.
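As a rough illustration of the idea above (a link only counts if it comes from a page with real traffic), here is a minimal Python sketch. The field names mirror `sourceType` and `totalClicks` from the referenced documentation, but the `Backlink` class, the `counted_links` function, the filtering rule, and the `min_clicks` threshold are assumptions and not documented Google behavior. Note also that the documentation describes `totalClicks` as clicks on a document, so using it as a proxy for link traffic is an interpretation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backlink:
    source_url: str
    source_type: Optional[int]     # mirrors sourceType (default nil in the docs)
    total_clicks: Optional[float]  # mirrors totalClicks of the source document


def counted_links(links, min_clicks=1.0):
    """Keep links whose source page has at least `min_clicks` clicks.
    This rule is an assumption used to illustrate the hypothesis."""
    return [
        link
        for link in links
        if link.total_clicks is not None and link.total_clicks >= min_clicks
    ]


# Made-up example data.
links = [
    Backlink("https://news.example/article", 1, 250.0),
    Backlink("https://spam.example/page", 3, 0.0),
    Backlink("https://unknown.example/page", None, None),
]
kept = counted_links(links)
print([link.source_url for link in kept])
```

The sketch also shows the weakness discussed above: anyone who can push fake clicks onto the spam page would move it past the threshold.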

I'd love to hear your opinion on this.

Kind Regards,

emiilyyjohnson2 0 Newbie Poster

Thanks for sharing such valuable information with us. It's a goldmine if you dig into it deeply.
