Search engines like Google want to ensure that the quality of the web pages that show up in their search results is high, and that search engine users will be satisfied with them. If their document (web page) analysis shows strong signals of poor-quality content, it can negatively impact the rankings and visibility of those web pages online.
There are many different ways to analyze content quality, and it's hard to know exactly which signals the search engines use. In this article, I'll explore what we do know about what Google is doing in this area, as well as other concepts you can use to think about your own web page quality.
What We Know About How Google Evaluates Content Quality
So, what are the signals and related search updates that apply to document analysis? Besides content quality, a brief list would include reading level, keyword usage (stuffing), content “sameness,” ad density, page speed, and finally, language analysis factors.
In the end, for search engines, document analysis is about improving the search user’s experience. Let’s break it down.
The Panda Algorithm
When “content quality” is mentioned, nearly everyone involved in SEO immediately thinks of the Google search ranking algorithm update, Panda.
Since its introduction in February 2011, Panda went through several iterations before finally being folded into Google’s core search ranking algorithm nearly five years later, in January 2016.
Google’s Panda algorithm originally targeted “shallow or low-quality content” represented by the many content farms of the day, as well as websites that “scraped” (copied) others’ content or had “low levels of original content.”
Of course, Panda still takes aim at such obvious web spam by demoting search rankings, but the algorithm is much more sophisticated today. Now that Panda is "baked" into Google's overall search ranking algorithm, it has become an integral part of weighing content quality factors via document analysis.
And one obvious tip-off indicating low-quality content in document analysis is poor editorial quality. Let’s look at some other factors that might go into web page quality.
Manual Review by Google’s Quality Raters
In 2001, the first known version of Google's "Search Quality Rating Guidelines" was leaked on the web (some speculate intentionally). Google's search evaluation team uses this internal guide to manually review and rate the quality of web pages.
The purpose of the search evaluation team and the manual is to determine what quality looks like, and in turn, use those benchmarks as a feedback loop for Google engineers responsible for developing ranking algorithms.
For example, if an engineer has an idea for tweaking the algorithm, the search evaluation team can help determine if that change has actually improved the search results.
Since that first leak of the Search Quality Rating Guidelines, several newer versions also surfaced on the web, until Google finally made the full manual officially available to the public in 2015, with several updates since.
Keyword Stuffing and “Vanilla” Content
Keyword stuffing is now so passé that it seems barely worth a mention. A relic of the AltaVista era, it refers to repeating the same keyword over and over on a web page in an attempt to achieve higher rankings for that term.
The hangover from this tired practice is the notion of "keyword density": the number of occurrences of a given keyword (or phrase) on a web page, expressed as a percentage of that page's total word count. This, too, is an out-of-date concept.
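To make the arithmetic concrete, here's a minimal sketch of how keyword density could be computed. The keyword_density function, its simple tokenization and the sample text are my own illustration, not anything Google has published.

```python
import re

def keyword_density(text, phrase):
    """Rough keyword density: percentage of the page's words taken up by the phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    phrase_words = phrase.lower().split()
    occurrences = sum(
        1 for i in range(len(words) - len(phrase_words) + 1)
        if words[i:i + len(phrase_words)] == phrase_words
    )
    # Each occurrence accounts for len(phrase_words) of the total word count.
    return 100.0 * occurrences * len(phrase_words) / len(words)

page_text = "French toast recipe: the best French toast starts with thick bread"
print(round(keyword_density(page_text, "french toast"), 1))  # 36.4 -- 4 of 11 words
```

A density that high, with no synonyms or related phrasing anywhere on the page, is exactly the kind of pattern that reads as written for engines rather than people.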
Keyword stuffing/keyword density is an activity that’s geared towards the search engines rather than the end user. The search engines have long discouraged this practice, as it does not enhance the user experience.
Keyword stuffing, or excessive keyword density, especially when combined with little to no use of synonyms for the key phrase, is a strong indicator of low-quality content.
Similar in effect is content “sameness” or “vanilla” content that doesn’t stand out from other pages on the web. This happens when a piece of content is similar to others for the same search query, and doesn’t bring any new, original, useful or expert views to the table.
For example, writing an article on how to make French toast does not bring anything new to the web. A quick search turns up roughly 625,000 such pages already, so it's unlikely that your web page on how to make French toast will bring anything unique to the table.
A web page like this may have difficulty ranking, as there is likely nothing new in the content, unless you’re bringing fresh opinions on the subject that add value, or that are from an expert author users trust.
On the other side of the spectrum, a site offering content that effectively answers questions on a given topic will likely fare well.
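We don't know how Google quantifies "sameness," but you can get a rough diagnostic by comparing your draft against pages that already rank, for example with TF-IDF cosine similarity. This sketch uses scikit-learn; the sample texts are hypothetical, and a high score only suggests your page may add little that's new. It is a writer's sanity check, not Google's method.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def sameness_scores(my_page, competing_pages):
    """Cosine similarity between my page and each competitor (0 = unrelated, 1 = near-identical)."""
    docs = [my_page] + competing_pages
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    return cosine_similarity(tfidf[0:1], tfidf[1:])[0].tolist()

# Hypothetical texts: the first competitor covers the same ground, the second does not.
my_draft = "Dip the bread in egg and milk, then fry it until golden brown."
competitors = [
    "Whisk eggs with milk, dip the bread, and fry until golden brown.",
    "A guide to sourdough starters, hydration ratios and proofing times.",
]
print(sameness_scores(my_draft, competitors))  # first score high, second score low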
Ad Density, Offensiveness and the Page Layout Algorithm
Search engines have learned that sites burdened by too many ads make for a poor user experience. In January 2012, Google announced its "page layout algorithm," warning that it would penalize sites with pages that are laden with ads "above the fold" and that push the actual content too far down the page.
Trying to figure out if your site is impacted? Try measuring what percentage of the above-the-fold area of a page is occupied by ads. Too high a density might be taken as a negative signal.
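Google hasn't published a threshold, but a back-of-the-envelope check is to estimate how much of the above-the-fold viewport your ad units cover. The viewport size, the ad dimensions and the function below are hypothetical illustrations, not Google numbers.

```python
def above_fold_ad_density(ad_boxes, viewport_width=1366, fold_height=768):
    """Rough percentage of the above-the-fold area covered by ads.

    `ad_boxes` is a list of (width, height) pixel sizes for ad units rendered
    above the fold. Overlapping ads are double-counted; this is only a
    back-of-the-envelope check, not Google's method.
    """
    fold_area = viewport_width * fold_height
    ad_area = sum(w * min(h, fold_height) for w, h in ad_boxes)
    return 100.0 * ad_area / fold_area

# Hypothetical page: a 728x90 leaderboard plus two 300x250 rectangles above the fold.
print(round(above_fold_ad_density([(728, 90), (300, 250), (300, 250)]), 1))  # ~20.5
```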
In its page layout blog post, Google stressed that the algorithm would only affect sites whose pages carry an abnormally high number of ads above the fold relative to web pages as a whole.
Google also has a patent on detecting annoying or otherwise offensive ads and pages, which patent specialist Bill Slawski discussed in my interview with him shortly after Google rolled out its page layout algorithm. You can learn more about how Google rejects annoying ads and pages from Slawski’s in-depth article.
Page Speed and the Mobile Connection
A website's page speed is important to end users, and therefore important to Google. After all, Google doesn't want sites that offer a poor user experience showing up in its results.
In April 2010, Google announced that a website’s page load time is considered a ranking factor. At the time, Google said it affected only about 1 percent of search results. However, as a rule, websites with slow-loading pages can experience higher bounce rates and lower conversions.
Fast-forward to 2015 and the rise of mobile search: there’s a need for speed.
As part of Google’s evolving mobile-friendly initiative, Googler Gary Illyes revealed that Google would factor in mobile page speed for ranking mobile sites.
That announcement was followed by Googler John Mueller’s recommendation that mobile page loading speed should be less than two to three seconds.
And this Google Developers document recommends that above-the-fold content render in one second or less on mobile.
However, with all this, Google still does not use page speed as a significant ranking factor. Unless your site is very slow, it's likely not impacted at all. But page speed certainly shapes the user's impression of your site in dramatic ways.
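For real measurement, tools like Google's PageSpeed Insights are the right answer. As a crude stand-in, the sketch below simply times the initial HTML response with Python's requests library; it ignores rendering and asset loading entirely, so treat the number as a floor rather than your true page speed. The URL is a placeholder.

```python
import time
import requests

def time_html_response(url, runs=3):
    """Average seconds to fetch the raw HTML (not full render time)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        requests.get(url, timeout=10)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

# Placeholder URL; full render time will be noticeably higher than this number.
print(f"{time_html_response('https://example.com'):.2f}s average HTML response")
```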
Language Analysis and User Intent
Language analysis can also help search engines analyze web pages to decipher user intent—and this is becoming increasingly important to how search rankings work today.
With language analysis, search engines like Google strive to understand the world more like people do.
Enter RankBrain. RankBrain is a machine learning system that analyzes language across the web to help Google better understand what words, phrases and blocks of content mean.
It then uses that knowledge to better understand user search queries and the intent behind them, and picks the best existing Google algorithms to apply to a web page so it can deliver a search result that best matches that intent.
One way it does this is by noticing patterns in language to determine the best results.
Consider, for example, the traditional search engine approach to ignoring stop words like “the” and negative words like “without” in a search query. Now think about how big of an impact ignoring those words could have on the search results for a query like:
“Can you get 100% score on Super Mario without walkthrough?”
Dropping the word “without” will likely result in the search engine giving the user walkthroughs to help them get a 100% score, which per the query is exactly what they don’t want.
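To see how much meaning a naive stop-word filter throws away, here's a toy sketch. The stop-word list is illustrative only; this is not how Google processes queries, just a demonstration of the problem that RankBrain-style language analysis helps solve.

```python
# Words a naive engine might drop from queries (illustrative list only).
NAIVE_STOP_WORDS = {"can", "you", "get", "on", "the", "a", "without"}

def naive_query_terms(query):
    """Strip 'unimportant' words the old-fashioned way."""
    return [w for w in query.lower().split() if w not in NAIVE_STOP_WORDS]

query = "Can you get 100% score on Super Mario without walkthrough?"
print(naive_query_terms(query))
# ['100%', 'score', 'super', 'mario', 'walkthrough?'] -- 'without' is gone,
# so results skew toward walkthroughs, the opposite of the user's intent.
```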
This critical piece in Google’s approach to document analysis helps match the best web pages for any given query, and this is a good thing.
Document Analysis and Reading Level
Reading level refers to the estimated "readability" of web content. A well-known, standardized way of gauging this is the Flesch-Kincaid readability tests.
For the purposes of document analysis, the Flesch Reading Ease Test measures the average words per sentence and the average syllables per word.
The Flesch test is not a measurement of the education level required to read the content (that is the concern of the Kincaid side of the tests). Rather, it is more appropriately thought of as a measurement of the mental effort required to read a given sentence.
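For reference, the Flesch Reading Ease score combines average sentence length with average syllables per word: 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words). Here's a minimal sketch with a deliberately crude syllable counter, so treat its output as approximate rather than expecting it to match what a tool like Word reports.

```python
import re

def count_syllables(word):
    """Very crude syllable estimate: runs of vowels, minus a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    """206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[a-zA-Z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

sample = "Search engines want pages that people actually enjoy reading. Short sentences help."
print(round(flesch_reading_ease(sample), 1))  # higher score = easier to read
```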
Should you want to check the reading level of a document, you can use Microsoft Word's readability statistics, which appear after you run a spelling and grammar check on the content.
Obviously, there is no “optimal” reading level for all web pages. That will be determined by the nature of the website and the sophistication of the content it publishes.
For example, you’d write content appropriate for a younger reading level for a website targeted to children, versus content on the latest advances in artificial intelligence algorithms written for techies.
It’s not clear that Google is using this type of analysis in any manner. However, you could see how knowing the reading level of a document might help them better match a document up with a query, depending on the sophistication level evident in the query.
For example, a query like “how to calculate the payoff time of a mortgage” suggests that the user wants to understand the algorithm used, and a more technically focused article, which may have a higher grade reading level, might best serve that query.
In contrast, a query like “mortgage payoff calculator” suggests that the user simply wants to get the answer to the question, so the reading level of the best content for that user might be quite a bit lower.
Whether Google uses this type of analysis or not, it’s useful to know what your audience wants in terms of complexity, and to design your content to meet those expectations.
Summary
At the end of the day, search engines are concerned with delivering relevant, fresh content to their users in the results. Understanding how document analysis works and the many factors at play can help website publishers not only improve their web pages, but also their search rankings.
Comments

A great post with all the useful info in a single place. Thank you, Eric, for sharing this article.
All the above points are familiar to me from SEO, but Document Analysis and Reading Level are new to me. I'm going to apply this now and check the rankings, because my site has been dropping continuously for the past few weeks. One more question: why didn't you mention AMP?
We didn't include AMP here because it's not a ranking factor. There are many reasons to implement AMP, but driving rankings is not one of them.
Eric, thanks for your valuable reply. That's a great take on web page analysis.
Hi Eric,
You have written this article in a very presentable and easy-to-digest manner. I'm going to read it again and again just to learn how to write like this.
And yes, Google is definitely putting a lot of emphasis on user experience, because I experienced a ranking drop on one of my sites that had lots of ads. However, I made the necessary changes and the rankings have come back.
Thanks a lot for this comprehensive post.
Is this readability score a similar metric to the one used in the Yoast plugin? It often forces me to shorten my sentences, and I feel that it dumbs down the content.
Not sure what Yoast uses, but you should always treat a readability score as a suggestion, not a command. The score is going to be geared toward the broadest “lowest common denominator” audience, but that may not be your audience. If your target audience is more educated, and/or expects deeper-level content, by all means don’t dumb it down!
Thanks for this information. I like it, and it gives me more knowledge about SEO and search rankings. Best work!
The more Google tweaks its algorithms, the more it dilutes its own purpose and provides inferior search results. People look for information! User experience, the age of a website, and many other ranking factors do not matter when people are looking for information. In my niche there is a site that ranks for every keyword, even ones that have little to do with the topic, but it ranks anyway. I used to be able to type in sentences and come up with an article I may have quoted in one of my books, but no more. The fact is that a lot of good information is becoming invisible on the net, and the idea that Google provides the best search results is becoming a myth. I am about to write an article about this, because in my niche and the topic I write about, I need to inform my audience that when they use Google search they will not get the best information or results, due to all the decisions Google builds into its algorithms. Google has put too much thought and too many decisions into the process instead of just basing it on content. With Google, the content itself must jump through their hoops.
Hi Erika – I think that a lot of what you refer to is due to Google’s continual need to prevent people from gaming the system to earn rankings they don’t deserve. It’s an extremely tough problem to solve.
I understand that, but what Google has done is go way out of bounds. For instance, backlinks are a large part of their ranking system. Since when is sharing an article indicative of its worth? I am a scholar and I conduct a good amount of research, and I probably share one in twenty of the articles I consult. I only provide links to certain types of information, and if I am writing a news article I will cite my sources. My readers share far less. On Facebook, many will share articles that are funny or cute. How about Google's ranking factors such as the age of the site, the length of articles, whether the site is mobile friendly, the speed of the website, whether the site publishes daily, and measuring how long people stay on the site (user experience)? None of this has to do with content, and if Google went by content alone, it would automatically knock out the sites that are ad heavy, the spam sites, and those overloaded with keywords, and no one would have the ability to game the system. The problem is that no one has questioned Google; they have done as they pleased, and in my niche they have butchered the results on their first pages. IBM acquired Blekko, which was a search engine based on content. It will be interesting to see what Watson comes up with. I have written to IBM to see what is in the works, and my hope is that Google search will be replaced by a true content-based search engine. At the moment, sites with great information can literally become invisible on the web if their niche falls under popular search terms, because of Google's algorithms, which are not content focused.