Let’s talk about the cost of duplicate content. At first blush, it seems like a relatively minor issue. In principle, a search engine wants to include only one copy of a page in its index. So if you have multiple pages with the same content, the search engine picks just one of them, and the other copies are ignored.
So far it does not sound too bad, does it? However, there are other, less obvious consequences to duplicate content. For example, it can’t be good that crawlers come to your site and spend time on pages that they will never index. In fact, it’s our understanding that crawlers come to your site with a budget of how many pages they will crawl. So if they spend that budget on pages that will not be indexed, they are not crawling pages that will be. This could result in fewer pages of your site getting indexed.
In addition, there are tons of ways to end up with unintentional duplicate content. Here are just a few:
- Running an affiliate program
- Syndicating content
- Failure to 301 redirect from the non-www version of your site to the www version, or vice versa (see the redirect sketch after this list)
- Code implementations that cause sub-domain pages to automatically mirror your main site’s content
- Code implementations that allow different URL paths to render the same content
- Pages with “different content” that is not different enough to count as unique; this can happen with database-driven sites
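For the www/non-www case, the standard fix is a server-level 301 (permanent) redirect. Here is a minimal sketch for Apache’s mod_rewrite in an .htaccess file; example.com is a placeholder for your own domain, and it assumes the www version is the one you want indexed:

```apache
# Permanently redirect any request on the bare domain to the www host.
# example.com is a placeholder; substitute your own domain.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Because the 301 tells search engines the move is permanent, both versions of every URL are consolidated onto one set of addresses instead of competing with each other as duplicates.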
Beyond that list, I am sure that there are many more ways to create duplicate content, and each of these scenarios has its own problems. But one problem common to nearly all of them is link dilution. Your site has a finite amount of PageRank to spread around, and links that point to pages that will never be indexed waste their share of it. Less PageRank flows into the pages that are indexed, which will likely result in lower rankings for those pages. The simplified sketch below shows the arithmetic.
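To make the dilution concrete, here is a deliberately simplified model (real PageRank computation is more involved): assume a page’s link equity splits evenly across its outbound links, with the numbers below chosen purely for illustration:

```python
# Simplified illustration: a page's link equity divides evenly among its
# outbound links, so equity sent to never-indexed duplicates is wasted.
page_equity = 1.0        # equity this page can pass along (illustrative unit)
outbound_links = 10      # total links on the page
links_to_duplicates = 4  # links pointing at duplicate, never-indexed URLs

equity_per_link = page_equity / outbound_links
wasted = links_to_duplicates * equity_per_link
retained = page_equity - wasted

print(f"Wasted on duplicates: {wasted:.0%}")      # 40%
print(f"Reaching indexed pages: {retained:.0%}")  # 60%
```

In this toy example, 40% of the page’s link equity goes nowhere; eliminating the duplicate URLs would send that equity to pages that can actually rank.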
So the bottom line is potentially fewer pages indexed and lower rankings for the pages that are indexed. That sounds like an extremely high cost to me. You can read more about the problems with, and solutions for, duplicate content here.