When implemented tactically and properly, NoIndex tags can be a great boost for your site. But using them with a broad stroke on a massive scale can create more problems than it solves. On today’s episode of Here’s Why, Mark and Eric explain why you must be disciplined with your NoIndex implementation.
Don’t miss a single episode of Here’s Why. Click the subscribe button below to be notified via email each time a new video is published.
Links Mentioned:
- SEO Nightmare: When NoIndex Goes Bad
- All of our “Here’s Why” Videos
- Subscribe to our YouTube Channel
Full Transcript:
Mark: This is your website.
Eric: And this is your website with a rogue no-index tag.
Mark: In this episode of “Here’s Why with Mark and Eric”, Eric Enge will explain to you why no-index tags aren’t always the best quick fix for certain major site problems.
Mark: So Eric, what kind of sites might have a problem that would tempt them to use the no-index tag on a massive scale?
Eric: Well sometimes sites with huge page counts, we’re talking in the millions or hundreds of millions, run into big problems that place them in danger of getting a Google Penalty. Maybe they’re trying to maximize opportunities to rank for long-tail terms. Or they’re trying to add lots of refinements to improve the user experience.
Sometimes it’s a matter of a tagging system that’s out of control, or the site just has bugs that create lots of pages unintentionally. Here’s the problem: any of those can leave your site with so many pages that are very similar to each other that it’s in danger of a Google Panda or manual penalty for thin content.
Mark: So couldn’t you just no-index all the pages that are virtually duplicates?
Eric: That might seem like the most direct solution, but it can cause more problems than it solves.
Mark: Well, how so?
Eric: Well, for one thing, it can dilute your PageRank focus. When this is handled properly, the links in a page’s product list point to pages that are very closely related and highly relevant to the page the links are on, and worth indexing, as shown here. Now, chances are every page of your site is going to have some links that point to your home page, your “about us” page, your “contact us” page, privacy policy, and other less topically focused pages like those.
Don’t get me wrong, those are an important part of your site’s structure, so having them there is a good thing. The problem starts when some of the links in the product list point to pages that are not worthy of being indexed. You can solve the penalty-related problems with the no-index tag, but you end up wasting some of that PageRank. Here’s an example page to illustrate the problem. In this example, 20% of the topically relevant links that should point to key money pages are pointing at a no-index page instead. That PageRank is basically completely wasted.
Why? Let’s take a look at what happens on a no-index page. Some of the PageRank is consumed by the no-index page itself, and even though the no-index page will pass the rest of that PageRank to other pages via its links, the great majority of those links go to pages which are not your money pages, as shown in the previous example. Another problem is that you may be passing your PageRank into never-never land. Wasting PageRank that should flow to key money pages is bad enough, but it’s not the only problem. On very large sites, you can have a situation where Google does NOT crawl your entire site, as shown here.
As shown in this image, Google reaches a point where the crawling stops. It has simply decided that there are too many pages on the site for it to go any further. Yet the pages at the bottom of the tree, where the crawling stops, are still passing PageRank to other pages that Google has not crawled and will not crawl. That PageRank is effectively passed into never-never land and is wasted as well.
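To put rough numbers on that, here’s a minimal sketch of the even-split link equity model I’m describing. The page data is entirely hypothetical and real PageRank flow is more complicated, but it shows how quickly links to no-index pages eat into what your money pages receive:

```python
# Rough, hypothetical illustration: assume each page splits its link equity
# evenly across its outgoing links. Every link pointing at a no-index page
# sends that share somewhere it can never earn a ranking. URLs are made up.

def wasted_share(outlinks, noindex_urls):
    """Fraction of a page's link equity flowing to no-index targets,
    assuming an even split across all outlinks (a simplification)."""
    if not outlinks:
        return 0.0
    wasted = sum(1 for url in outlinks if url in noindex_urls)
    return wasted / len(outlinks)

# Hypothetical category page: 10 links, 2 of which point at no-index pages.
outlinks = [f"/product-{i}" for i in range(8)] + [
    "/filter?color=red",
    "/filter?color=blue",
]
noindex = {"/filter?color=red", "/filter?color=blue"}

print(f"{wasted_share(outlinks, noindex):.0%} of this page's link equity "
      "flows to pages that will never rank")
# -> 20% of this page's link equity flows to pages that will never rank
```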
A third problem with just using the no-index tag to solve these problems is that it can chew up your crawl bandwidth. Google still crawls pages that have the no-index tag on them. If a large percentage of the pages on your site are no-indexed, Google will spend time crawling those pages instead of crawling pages that it might actually rank for you.
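If you want to spot-check which of the URLs eating your crawl budget are actually no-indexed, here’s a minimal audit sketch. The URL is a placeholder, and a real audit would pull its sample from your log files or XML sitemaps:

```python
# Spot-check whether a URL carries a noindex signal, either in the
# X-Robots-Tag response header or in a meta robots tag in the HTML.
# The regex is a simplification (it assumes name= appears before content=).
import re
import urllib.request

def is_noindexed(url):
    """Return True if the response carries a noindex signal."""
    with urllib.request.urlopen(url) as resp:
        header = resp.headers.get("X-Robots-Tag", "") or ""
        html = resp.read().decode("utf-8", errors="ignore")
    if "noindex" in header.lower():
        return True
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        html,
        re.IGNORECASE,
    )
    return bool(meta and "noindex" in meta.group(1).lower())

# Example usage (placeholder URL):
# print(is_noindexed("https://www.example.com/some-thin-page"))
```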
Mark: So if the no-index tag is not the best solution for the types of site problems you listed, what should a webmaster in those situations do?
Eric: Well sometimes you just have to bite the bullet. You’re going to have to dig into your site architecture and clean up the mess by hand, but doing that work can bring you enormous benefits. A significant development effort will be necessary in order to fix the problems and straighten out the whole situation. However, my experience is that the rewards for doing this justify the effort.
Mark: Well, thanks Eric for your valuable advice. Want to know more about this topic? Check out the link below to Eric’s in-depth article about rogue no-index tags, and also be sure to subscribe to our videos using the link at the end of this episode. Join us again next time for another episode of “Here’s Why with Mark and Eric.”
Awesome info, Eric. Did you mean to say that we should either point the non-existent page to some page which is relevant to it, or manually dig through the server logs of a website and remove those pages from the website itself? I was just confused by your words in the video. Can you suggest more here, please? Thanks
Nice tips Eric!
I’ve seen many SEOs using noindex as a panacea to resolve duplicate content issues, without realising that the pages still need to be crawled, hence wasting valuable crawl budget.
What’s your take on blocking low value pages from within the robots.txt file?
It’s a trade-off between crawl budget (preserved by robots.txt) and PageRank conservation (NoIndex is better for that). I also think that NoIndex is faster at getting pages out of the index, as robots.txt might not cause the page to leave the index at all.
I agree. Robots.txt isn’t as efficient as meta noindex in terms of removing pages from the index.
In my experience low value pages rarely accrue backlinks, hence why I don’t mind losing a bit of PageRank in order to make big crawl budget savings.
There is also a combination of disallow and noindex that can be applied directly in the robots.txt file, but I haven’t tested it. Do you have any experience with this?
Here’s a read on this: https://www.deepcrawl.com/knowledge/best-practice/robots-txt-noindex-the-best-kept-secret-in-seo/
We are in the middle of doing a test of the robots.txt NoIndex directive to see how well it works. Should have something to report in a couple of weeks!
Note that NoIndex is not only about preserving PageRank from external links, but also from internal links.
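For anyone who wants to experiment with the crawl-budget side of that trade-off, here’s a minimal sketch using Python’s standard robotparser module. The rules and paths are hypothetical, and keep in mind that a Disallow rule by itself won’t remove a URL that is already indexed:

```python
# A Disallow rule stops a compliant crawler from fetching the page at all,
# which is where the crawl budget savings come from. It does not by itself
# deindex a URL. The robots.txt rules and paths below are hypothetical.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /filter/
Disallow: /tag/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for path in ["/filter/color-red", "/tag/widgets", "/products/blue-widget"]:
    allowed = rp.can_fetch("Googlebot", "https://www.example.com" + path)
    print(f"{path}: {'crawlable' if allowed else 'blocked by robots.txt'}")
```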
Hi Eric, my site is not that big, I have about 194 posts. I manually added my sitemap files to Webmaster Tools a while back and 600 pages were indexed, not including the images sitemap. I put a noindex on all my tags, and that dropped my indexed pages down to 230. I seem to have cleaned up the duplicate content that the tags were generating. Am I doing the right thing?
Hi Erik, there is no way to be 100% sure without doing an actual audit of your site, but it sounds like you’re going in the right direction. That said, check and make sure your home page and your blog category pages are in the index, as well as the majority of your articles. If that’s the case, you should be in pretty good shape.