(UPDATE 31 July 2014: We’re resharing this today as one of our “Throwback Thursday” posts on social media because the content here is still valid and helpful. Please note that although the concepts in the infographic remain accurate, some of the numbers quoted are undoubtedly much larger now.)
Building a search engine is a very complex task. I often find myself explaining to people why search engines can’t understand their site. They seem fixated on the belief that if a human can understand it, a search engine should too. The short answer is that with an infinite amount of time the search engine could, but the scale of the Internet makes it oh so VERY hard.
The infographic below tries to give you some sense of the scale of the problem. Please note that a few of the numbers, such as the total number of web pages or the average web page size, are hard to truly pin down, but I pulled them from the best sources I could. For example, Matt Cutts mentioned to me at Pubcon 2012 that Google knew of more than 100 trillion web pages, and I bet that has grown a lot since then.
Regardless, the message is the same. The web is a really complex place!
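To put a rough number on that scale, here is a back-of-the-envelope sketch. The 100 trillion page count is the figure quoted above. The 1 millisecond of processing per page and the 30-day deadline are purely assumed values for illustration, not numbers from Google or from the infographic, and the function names are hypothetical.

```python
import math

# Figure quoted above: Google knew of more than 100 trillion pages (Pubcon 2012).
KNOWN_PAGES = 100 * 10**12

# ASSUMPTION (hypothetical): one machine spends 1 ms analyzing each page.
SECONDS_PER_PAGE = 0.001

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def machine_years(pages: int, seconds_per_page: float) -> float:
    """Years a single machine needs to process every page one after another."""
    return pages * seconds_per_page / SECONDS_PER_YEAR


def machines_needed(pages: int, seconds_per_page: float, deadline_days: int) -> int:
    """Machines required to finish one full pass within the deadline,
    assuming the work splits perfectly across machines."""
    total_seconds = pages * seconds_per_page
    return math.ceil(total_seconds / (deadline_days * SECONDS_PER_DAY))


if __name__ == "__main__":
    years = machine_years(KNOWN_PAGES, SECONDS_PER_PAGE)
    fleet = machines_needed(KNOWN_PAGES, SECONDS_PER_PAGE, deadline_days=30)
    print(f"One machine: about {years:,.0f} years for a single pass")
    print(f"30-day pass: about {fleet:,} machines")
```

Even with that generous 1 ms assumption, a single machine would need roughly 3,171 years for one pass over the known pages, and a 30-day refresh would take tens of thousands of machines working in perfect parallel. Anything closer to human-level understanding of each page would cost far more time per page than that.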
See the Search Complexity Infographic at full size