
Google Penguin 4.0: Living With A Kinder, Gentler Penguin


Has our beloved (or more often, feared) Penguin come in from the cold?
On September 23, 2016, after a wait of more than two years, Google finally released the much-anticipated Penguin 4.0 update. Google had promised that this update would be different from all previous updates, and it looks like that promise was kept. Not only that, but 4.0 would be “the update to end all updates,” at least as far as public announcements are concerned.
But before we dig into how much of a sea change this new Penguin might be, let’s review what the previous updates were like, if only so we can evaluate how much better 4.0 might be. (If you prefer, skip straight to my analysis of Penguin 4.0 below.)

Penguin in the Past

Penguin was first launched in April 2012, with the primary mission of combating link spam at scale. In the early days of the new web economy, links had become valuable commodities, and were bought, sold, and traded almost with impunity. Such link schemes threatened to undermine Google’s entire business, which still considers links to be a strong ranking factor. Anything that threatens that assumption can negatively impact the value of Google’s search results for users.
Penguin was designed as an algorithm that would allow Google to detect and penalize sites that appeared to be engaging in manipulative link schemes at a much larger scale than ever before. As such, it needed to be updated and improved over time. Here is a list of the major known Penguin updates before 4.0:

  1. (1.0) April 24, 2012
  2. May 25, 2012
  3. October 5, 2012
  4. (2.0) May 22, 2013
  5. October 4, 2013
  6. (3.0) October to December 2014

What was old Penguin like?

All of the Penguin updates up until 4.0 took the same basic approach, and shared certain characteristics.
For one thing, old Penguin evaluated a site’s link profile as a whole, and if too many bad links were found, the entire site received a ranking penalty. For that reason, getting hit with Penguin almost always meant a huge loss of traffic.
Google Penguin 1-3 penalized entire sites
To make matters worse for site owners hit by Penguin, updates were very infrequent, and only at the time of an update could a site that had cleaned up its links experience any recovery. This made the nearly two-year stretch after the Penguin 3.0 update almost unbearable for many site owners.
But Google told us that time was necessary, because they were cooking up a whole new Penguin. And indeed they were!

A New Penguin in Town

So what’s different about Penguin 4.0? Plenty! Google had promised that 4.0 would be “real time” and that once it was out, there would never again be an announced update. I’ll discuss the significance of both those statements below.
First, Penguin 4.0 adjusts the weighting of links to a site “on the fly.” That is, as Google’s bots crawl the web and discover links, the Penguin algorithm evaluates them and then stores away its judgment. That provides a reservoir of data from which Penguin can draw to make snap judgments on a web page.
Penguin adjusts link weights on the fly as links are discovered
Second, Penguin is now more “granular.” Google’s Gary Illyes said, “Penguin now devalues spam by adjusting ranking based on spam signals, rather than affecting ranking of the whole site.”
This means that when Penguin discounts links now, it only impacts the ranking of the particular page, or pages, on a site that stood to benefit from the spammy link(s), rather than the whole site.
Third, Penguin is now “real time,” but what does that mean? It does not mean that ranking changes happen to pages instantaneously when new links connect to them. Other ranking changes based on links don’t work that way, and neither does Penguin 4.0. Here’s how it works:
Just as they always have, Google’s “spiders” crawl the web, discovering new links as they do. Google updates its database of link signals, but no ranking changes are applied yet.
Google crawls the web and discovers new links
When Google recrawls your page, the link signals from any links to that page discovered since the previous crawl are applied, and the page’s ranking weights are recalculated. This is when Penguin has its effect (if any is needed). If any suspicious links were picked up from crawls of other sites, Penguin has already devalued them, and the consequences of that may now affect the page’s ranking. Bear in mind that under the new Penguin, “consequences” simply means those links carry less (or no) positive weight.
Google applies ranking signals of new links when it recrawls a page
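Here is a purely conceptual sketch, in Python, of the two-phase flow just described. To be clear, this is only my illustration of the idea of storing link judgments at discovery time and applying them at recrawl time; it bears no relation to Google’s actual systems, and every name and data structure in it is invented.

    # Conceptual illustration only: record a judgment for each discovered link,
    # then apply the stored judgments when the target page is recrawled.
    link_signal_store = {}  # target page URL -> list of (source URL, link value)

    def on_link_discovered(source_url, target_url, looks_spammy):
        """Phase 1: as the crawler finds a link, judge it and file the result
        away. No rankings change at this point."""
        value = 0.0 if looks_spammy else 1.0  # spammy links are simply devalued
        link_signal_store.setdefault(target_url, []).append((source_url, value))

    def on_page_recrawled(target_url, base_score):
        """Phase 2: when the target page is next recrawled, the stored judgments
        are applied and its ranking weight is recalculated."""
        signals = link_signal_store.get(target_url, [])
        return base_score + sum(value for _, value in signals)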

How Is Penguin 4.0 Real Time?

So in what way is Penguin 4.0 real-time if ranking changes are only applied at the time of a recrawl? In at least two ways:

  1. Penguin 4.0 is “more real-time” than previous Penguins because sites don’t have to wait until a major update or refresh of the algorithm to see effects (positive or negative). This is good news for sites with Penguin-penalized pages, as any successful remediation only has to wait until the next crawl to take effect (not up to two years, as before!).
  2. Penguin 4.0 is continually updated “on the fly.” Changes to the algorithm are now rolled out without the need for a full, announced update, and they will be seamless and largely invisible to us.

From here on out, updates to the Penguin algorithm will:

  • Address new link types
  • Adjust ranking weights
  • Improve the process of collecting link signals

So just as there is a slight delay between any new links to a page and the application of ranking signals from those links, there will be a slight lag between the rollout of any changes to the Penguin algorithm and their effect on a particular site.
How Penguin 4.0 algo changes are applied

Speculation: Disavow Files May Help Train Penguin

Allow me to indulge in a bit of speculation for a moment. It should be obvious that it took a lot of data analysis to bring Penguin to the point where it could be trusted as a “real time” part of the ranking algorithm. In my Virtual Keynote interview with Google Webmaster Trends Analyst Gary Illyes, Gary told me that the reason Penguin 4.0 was taking so long to release was a priority on “getting it right.” They had to be sure they had built the algorithm to be as accurate as possible. But where did the data come from to provide the training set for the update?
One possibility is that Google used, and continues to use, data from links disavowed by webmasters attempting to recover from a link-based penalty. Many believe that those links in aggregate provide a richly detailed portrait of what constitutes spammy links.
Google may use data from disavow files to train Penguin
In my view, Google does NOT use disavow files in this manner. As a publisher, I have received so many idiotic link removal requests over the years. At one point, even Matt Cutts reported getting link removal requests for his own blog! Disavow files are probably even worse. As a result, I have no faith that disavow files would have usable data for assessing link quality. Garbage-in, garbage-out, as they say.

How to Respond to Penguin 4.0

So what should you as a site owner do about Penguin 4.0? Nothing.
OK, I’m being a bit facetious there, but some of the major differences with Penguin 4.0 really do change the game. For one thing, since 4.0 simply discounts bad links to your site rather than penalizing the whole site, there is in principle no huge downside. It will be as if those links never existed in the first place.
Also, there is no way to file for a reconsideration of a Penguin penalty. Here is what Gary Illyes had to say:

Absolutely no reconsideration request can help you with Penguin. Reconsideration requests are for manual actions, and you can only file one if you have an incident filed internally. Penguin doesn’t create an incident; it never did and I’m very certain it never will. Reconsideration requests do not and will not help with Penguin.

So what can you really do?

  • Earn and/or build better links. Since Penguin now simply devalues bad links instead of putting your page or site in jail, getting better quality links to the page should improve its rankings, just as always.
  • Prune bad links regularly. 
    • Use tools such as Bing Webmaster Tools, Open Site Explorer, Majestic, Ahrefs, and Google Search Console to get your link profile.
    • Build a list of all the backlinks to your site.
    • Categorize the link sources:
      • Blogs
      • Multi-link pages
      • Rich anchor text
      • Comment links
      • Multi-link sites (e.g., directories)
    • Analyze and identify potential bad links.
    • Submit the bad links to Google’s Disavow Tool. (A minimal scripting sketch of this workflow follows this list.)
  • Don’t forget about manual penalties. A large number of bad links or other practices that go against Google’s Webmaster Guidelines could result in a manual action against your site. If that happens, you’ll have a lot of hard work ahead of you to clean up the problems, and your site won’t have any hope of getting restored in rankings until your reconsideration request is approved. This is the best reason to continue to perform the regular link auditing recommended in the step above!
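To make the pruning workflow above concrete, here is a minimal sketch in Python. It assumes you have merged your backlink exports into a CSV with source_url and anchor_text columns; those column names, and the crude spam heuristics, are invented for illustration and are no substitute for real analysis. The output follows Google’s documented disavow file format (one domain: entry or full URL per line, with # marking comments), and anything the script flags should still be reviewed by hand before you upload it.

    # Illustrative sketch: flag obviously suspicious linking domains from a
    # merged backlink export and write them out in disavow-file format.
    import csv
    from urllib.parse import urlparse

    SUSPECT_TLDS = (".xyz", ".top")        # example heuristics only
    SUSPECT_ANCHORS = ("casino", "cheap")  # example heuristics only

    def build_disavow(backlinks_csv, disavow_txt):
        bad_domains = set()
        with open(backlinks_csv, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                domain = urlparse(row["source_url"]).netloc.lower()
                anchor = row.get("anchor_text", "").lower()
                if domain.endswith(SUSPECT_TLDS) or any(w in anchor for w in SUSPECT_ANCHORS):
                    bad_domains.add(domain)
        with open(disavow_txt, "w", encoding="utf-8") as out:
            out.write("# Candidate domains - review by hand before uploading\n")
            for domain in sorted(bad_domains):
                out.write("domain:" + domain + "\n")

    # Usage: build_disavow("backlinks.csv", "disavow.txt")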

One more thing: What about negative SEO?
Ever since Google first introduced Penguin 1.0, webmasters have worried about the possibility of “negative SEO.” Negative SEO is the practice of intentionally pointing lots of bad links at a competitor’s site in the hope of triggering a penalty for them. Does this actually work? Once again, Gary Illyes:

So the thing about negative SEO is that, to this date, I haven’t seen a single…well, not just me, but also the ranking team hasn’t seen a single case where it was really negative SEO. It was more about clients not revealing details to the SEO who was doing the cleanup, for example.

Whether or not negative SEO ever was really a “thing,” it seems to me that Penguin 4.0 largely (if not completely) removes the SEO-related incentive for it. Any bad links you build to your competitor would simply be discounted. In other words, they do nothing, and the competitor’s page is no worse off than it was before.

Conclusion

Penguin 4.0 really is a bird of a different feather. For most webmasters, it will fade into the misty seas of the overall ranking factors, and it won’t be worth any anxiety. On the other hand, wise site owners will follow the positive steps I outlined above, but then those should be part of any good SEO practice anyway.


Thoughts on “Google Penguin 4.0: Living With A Kinder, Gentler Penguin”

  1. Nice overview, Eric. I would be interested to hear of any manual link penalties since Penguin 4.0, as it seems strange to me to have to keep updating the disavow file if links are simply devalued. It may be best practice, and maybe it still helps with trust signals, but it could also be a waste of time in reality.

  2. Hi Matt – for many people you might be right, but I still think that checking your backlink profile from time to time and seeing if anything bad has started to materialize there is a good idea. Then, if you see too many bad links showing up, make updates to your disavow file with those links.

  3. One thing about negative SEO that the new Penguin does not fix (and can’t fix either): the building of links to non-canonicalized “fake URLs” that actually respond with an HTTP 200.
    That’s a very possible way to do negative SEO, by creating humongous amounts of duplicate content on the site.

  4. Agreed, and another reason to monitor the backlink profile of your site. AND, also a reason to make sure that you have strict rules for what will resolve into a web page and what won’t. A common instance of this is not enforcing case in your URLs. E.g., your URL is http://www.yourdomain.com/aboutus.html and someone links to http://www.yourdomain.com/AboutUs.html. This is solvable with a single line of code in your .htaccess file. There are many other ways this crops up.

  5. Hi Eric. This is solvable for SEO guys, but most websites have a lot of technical SEO issues, such as case-sensitive URLs, incorrect HTTP response codes (for non-existent pages), etc. All of these problems can be exploited by competitors.
    And I agree with Thomas that developers often don’t know about current SEO trends, which is why a lot of clients’ websites have ranking problems.

  6. Hi Eric,
    Many of our articles have numerous inbound links from spammy-looking .xyz domains. Many of the websites look similar. An SEO technician I know says I probably don’t need to remove them. What do you think?
    I hope you are doing well. I enjoy your posts.
    Brian

  7. Hard to be sure without looking at them, but if they are coming from a relatively small number of domains, I’d go ahead and disavow those domains. Not much work to get rid of the risk.

  8. Thanks for the excellent run through Eric. When doing analysis into a lot of sites I’ve seen a strong correlation between traffic increases in January of 2016 and sites that were previously hit by Penguin in April of 2012. I’ve noticed that Google referred to an update around this time as a ‘core algo update’, but denied it was Penguin related. If it wasn’t Penguin related, then it sure was coincidental.

  9. I don’t think I fully grasp the case-sensitivity issue. Wouldn’t you want /aboutus.html and /AboutUs.html to go directly to the same file, if someone mistypes a URL for example, instead of giving them a 404?
    What’s the correct solution here?

  10. Apache (Linux) based web servers treat /aboutus.html and /AboutUs.html as different file names, so they DON’T return the same file. You can solve this by implementing an instruction in your .htaccess file that makes the case of the URLs NOT matter. That is what I was getting at in my comment, so I agree with you! I.e., you do want them to go to the same file, but in Apache, you have to write a rule in .htaccess to make that the way it works. The rule basically redirects all variants of uppercase letters to lowercase letters, and that fixes the problem.
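If your site isn’t on Apache, or you’d rather enforce this at the application layer, the same idea can be sketched in Python. This is a minimal illustration under that assumption (a WSGI application), not a drop-in solution, and the names are my own.

    # Minimal sketch: 301-redirect any path containing uppercase letters to its
    # lowercase form, so /AboutUs.html and /aboutus.html resolve to one URL.
    def lowercase_redirect(app):
        def middleware(environ, start_response):
            path = environ.get("PATH_INFO", "")
            if path != path.lower():
                query = environ.get("QUERY_STRING", "")
                location = path.lower() + ("?" + query if query else "")
                start_response("301 Moved Permanently",
                               [("Location", location), ("Content-Length", "0")])
                return [b""]
            return app(environ, start_response)
        return middleware

    # Usage: wrap your existing WSGI app, e.g. app = lowercase_redirect(app)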


Eric Enge

Eric Enge is part of the Digital Marketing practice at Perficient. He designs studies and produces industry-related research to help prove, debunk, or evolve assumptions about digital marketing practices and their value. Eric is a writer, blogger, researcher, teacher, and keynote speaker and panelist at major industry conferences. Partnering with several other experts, Eric served as the lead author of The Art of SEO.
