How Google’s Search Results Work: Crawling, Indexing, and Ranking


Do you know how search engines like Google find, crawl, and rank the trillions of web pages out there in order to serve up the results you see when you type in a query?
While the details of the process are actually quite complex, knowing the (non-technical) basics of crawling, indexing and ranking can put you well on your way to better understanding the methods behind a search engine optimization strategy.

A Massive Undertaking

At the time of writing, Google says it knows of more than 130 trillion pages on the web; the real figure is probably even higher, and there are many pages that Google keeps out of the crawling, indexing and ranking process for various reasons.
To keep results as relevant as possible for their users, search engines like Google follow a well-defined process for identifying the best web pages for any given search query, and that process evolves over time as they work to make search results even better.
Basically, we’re trying to answer the question: “How do Google search results work?” In a nutshell, the process involves the following steps:

  1. Crawling – Following links to discover the most important pages on the web
  2. Indexing – Storing information about all the retrieved pages for later retrieval
  3. Ranking – Determining what each page is about, and how it should rank for relevant queries

Let’s take a closer look at a simplified explanation of each.

Crawling the Web

Search engines have crawlers (aka spiders) that “crawl” the World Wide Web to discover the pages that exist, so those pages can later be evaluated for a query. The crawlers travel from page to page by following links.
These links bind together the pages within a website and connect websites to one another, and in doing so they create a pathway for the crawlers to reach the trillions of interconnected pages on the web.
How about a visual example? In the figure below, you can see a screenshot of the home page of USA.gov:

Whenever crawlers look at a web page, they look through the “Document Object Model” (or “DOM”) of the page to see what’s on it. The DOM is the rendered page, the structure produced from the HTML after any JavaScript has executed, and it’s what the crawlers search through to find links to other pages (samples shown above in red outlines). This allows the search engine to discover new pages on the web, and each new link found is added to a queue that the crawler will visit at a later time.
Crawling the entire web each day would be too big of an undertaking, so Google typically spreads its crawl over a number of weeks. In addition, as mentioned earlier, search engines like Google don’t crawl each and every web page that exists.
Instead, they start with a trusted set of websites that serve as the basis for determining how other websites measure up, and by following the links they see on the pages they visit, they expand their crawl across the web.
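To make the crawl loop concrete, here is a minimal sketch in Python (standard library only). It starts from a small set of seed URLs, fetches each page, pulls the links out of the HTML, and queues any newly discovered URLs for a later visit. This is an illustration of the general idea, not how Googlebot actually works; a real crawler also renders JavaScript, respects robots.txt, prioritizes pages, and schedules revisits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag found in a page's HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, max_pages=25):
    """Breadth-first crawl: follow links outward from a set of seed pages."""
    queue = deque(seed_urls)   # URLs waiting to be visited
    seen = set(seed_urls)      # URLs already discovered (avoid re-crawling)
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except OSError:
            continue           # skip pages that fail to load
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)      # discovered page, visit later
    return seen


if __name__ == "__main__":
    print(crawl(["https://www.usa.gov/"]))
```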

Indexing the Data

Indexing is the act of adding information about a web page to a search engine’s index. The index is a collection of web pages—a database—that includes information on the pages crawled by search engine spiders.
The index catalogs and organizes:

  • Detailed data on the nature of the content and topical relevance of each web page
  • A map of all the pages that each page links to
  • The clickable (anchor) text of any links
  • Other information about links, such as whether they are ads, where they are located on the page, what the context of the link implies about the page receiving it, and more

The index is the database from which search engines like Google store and retrieve data when a user types a query into the search engine. Before deciding which web pages to show from the index, and in what order, search engines apply algorithms to rank those web pages.
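As a rough sketch of what such a record might contain (the field names and URLs below are hypothetical, not Google's actual schema), each crawled page could be stored along with its terms, outgoing links, and anchor text, plus an inverted index that maps each term back to the pages that contain it:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class IndexedPage:
    """Hypothetical record for one crawled page; field names are illustrative."""
    url: str
    title: str
    terms: list[str]                         # words and phrases found on the page
    outgoing_links: list[str] = field(default_factory=list)
    anchor_texts: dict[str, str] = field(default_factory=dict)  # target URL -> anchor text


# Inverted index: term -> set of URLs known to contain that term.
inverted_index: dict[str, set[str]] = defaultdict(set)


def add_to_index(page: IndexedPage) -> None:
    """Catalog a page so it can be retrieved later for matching queries."""
    for term in page.terms:
        inverted_index[term.lower()].add(page.url)


# Usage: index one page, then look up which pages mention "passport".
add_to_index(IndexedPage(
    url="https://www.usa.gov/passport",
    title="Apply for a Passport",
    terms=["apply", "passport", "travel"],
    outgoing_links=["https://travel.state.gov/"],
    anchor_texts={"https://travel.state.gov/": "U.S. Department of State"},
))
print(inverted_index["passport"])
```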

Ranking the Results

In order to serve up results to the search engine’s end-user, search engines must perform some critical steps:

  1. Interpreting the intent of the user query
  2. Identifying web pages in the index related to the query
  3. Ranking and returning those web pages in order of relevance and importance

This is one of the major areas where search engine optimization comes in. Effective SEO helps influence the relevance and importance of those web pages for related queries.
So, what do relevance and importance mean, anyway?

  • Relevance: The degree to which the content on a web page matches the intent of the searcher (intent is what searchers are trying to accomplish with that search, which is no small undertaking for search engines—or SEOs—to figure out).
  • Importance: Web pages are considered more important the more they are cited elsewhere (think of these citations as a vote of confidence for that web page). Traditionally, this comes in the form of links from other websites to that web page, but there could be other factors that come into play as well.

To assign relevance and importance, search engines use complex algorithms designed to take hundreds of signals into account for any given web page.
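As a toy illustration of that two-part idea (this is not Google's algorithm, which weighs hundreds of signals), imagine a score that multiplies a relevance estimate, such as how many of the query's terms appear on the page, by an importance estimate, such as a dampened count of links pointing at the page:

```python
def score(page_terms, inbound_link_count, query_terms):
    """Toy ranking score: relevance (query-term overlap) times importance (citations)."""
    # Relevance: fraction of query terms that appear on the page.
    matches = sum(1 for term in query_terms if term in page_terms)
    relevance = matches / len(query_terms)
    # Importance: a dampened count of pages linking to this one.
    importance = 1 + inbound_link_count ** 0.5
    return relevance * importance


# Rank two hypothetical pages for the query "apply for passport".
query = ["apply", "passport"]
pages = {
    "https://example.gov/passport": (["apply", "passport", "fees"], 120),
    "https://example.com/travel-notes": (["passport", "beaches"], 3),
}
ranked = sorted(pages, key=lambda url: score(*pages[url], query), reverse=True)
print(ranked)   # the more relevant, more-linked page comes first
```

Real systems replace both estimates with far richer models (link analysis such as PageRank on the importance side, deep content analysis on the relevance side), but the relevance-times-importance framing is the same.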
These algorithms often change as search engines work to improve their methods of serving up the best results to their users. And even though they are constantly being tweaked, some of the fundamentals of what the search engines are looking for are pretty well understood.
Though we’ll probably never know the complete list of signals that search engines like Google use in their algorithms (that’s a closely guarded secret and for good reason, lest spammers use that knowledge to game the system), the search engines have revealed some of the basics through knowledge sharing with the web publishing community, and we can use that knowledge to create lasting SEO strategies.

How Search Engines Evaluate Content

As part of the ranking process, a search engine needs to be able to understand the nature of the content of each web page it crawls. In fact, Google puts a lot of weight on the content of a web page as a ranking signal.
In 2016, Google confirmed what many of us already believed: content is among the Top 3 ranking factors for web pages.
To understand what a page is about, search engines analyze the words and phrases that appear on it and build a map of that data, known as a “semantic map,” which helps define the relationships between the concepts on the page.
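Google hasn’t published what its semantic maps look like, but a toy version of the underlying idea is a co-occurrence map: words that repeatedly appear near one another on a page are likely related concepts. A minimal sketch, run on a made-up snippet of page text:

```python
from collections import defaultdict


def cooccurrence_map(text, window=5):
    """Toy 'semantic map': count how often pairs of words appear near each other."""
    words = [w.lower().strip(".,") for w in text.split()]
    pairs = defaultdict(int)
    for i in range(len(words)):
        for j in range(i + 1, min(i + window, len(words))):
            pair = tuple(sorted((words[i], words[j])))
            pairs[pair] += 1
    return pairs


sample = "Apply for a passport online. Passport fees and renewal forms are listed below."
related = cooccurrence_map(sample)
print(related[("fees", "passport")])   # co-occurrence count within a 5-word window
```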

What Search Engines Can “See” on a Web Page

In order to evaluate content, search engines parse the data found on a web page to make sense of it. Since search engines are software programs, they “see” web pages very differently than we do.
Search engine crawlers see web pages in the form of the DOM (as we defined it above). As a human, if you want to approximate what the search engines see, one thing you can do is look at the source code of the page. To do this, you can start by right-clicking on the web page in your browser and choosing the option to view the page source.

This will show you the source code of the web page, which might look like this:

The difference between this and the DOM is that the source code doesn’t show the effect of executing the JavaScript, but as humans we can still use it to learn a lot about the content of the page. The body content of a web page can often be found in the source code. Here’s an example of some of the unique content from the web page above as it appears in the HTML code:
Page Content in HTML Source Code
In addition to the unique content on the page, there are other elements on a web page that search engine crawlers find that help the search engines understand what the page is about.
This includes things like:

  • The web page’s metadata, including the title tag and meta description tag, found in the HTML code. Though not readily visible on the page itself, these tags serve as the title and description of the web page in the search results, and should be maintained by website owners.
  • The alt attributes for images on a web page. These are descriptions that website owners should maintain to describe what each image is about. Since search engines can’t “see” images, alt text helps them better understand the content of the page, and it also serves an important role for people with disabilities who use screen-reading programs to describe the content of a web page. Learn more about web accessibility. (A small sketch of extracting these elements follows this list.)
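Here is a small sketch, using Python’s built-in HTML parser on a made-up snippet of HTML, that pulls out the same elements a crawler would notice: the title tag, the meta description, and the alt attributes of images:

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collects the title, meta description, and image alt text from HTML."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = ""
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content", "")
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


# A made-up page fragment, for illustration only.
sample_html = """<html><head>
<title>Apply for a Passport</title>
<meta name="description" content="How to apply for or renew a U.S. passport.">
</head><body>
<img src="form.png" alt="Sample passport application form">
</body></html>"""

extractor = MetadataExtractor()
extractor.feed(sample_html)
print(extractor.title)        # Apply for a Passport
print(extractor.description)  # How to apply for or renew a U.S. passport.
print(extractor.alt_texts)    # ['Sample passport application form']
```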

What Search Engines Cannot “See” on a Web Page

It’s also important to understand which elements of a web page search engines can’t see, so that you can tailor your website’s content to help crawlers better understand it.
We already mentioned images, and how alt attributes help crawlers understand what those images are about. Other elements that cannot be seen by search engines include:
Flash files: Google has said that it can extract some information from Adobe Flash files, but it’s difficult because Flash is a pictorial medium. When designers use Flash to design websites, they typically don’t insert text that would help explain what the files are about. Many designers have moved to HTML5 as an alternative to Adobe Flash that’s search engine friendly.
Audio and video: Just like images, it’s hard for search engines to understand what audio or video is about without context. There are some exceptions; search engines can extract limited data from the ID3 tags within MP3 files, for example. This is one of the reasons many publishers accompany audio and video with transcripts on the web page, to give search engines more context.
Content contained within a program: This includes AJAX and other JavaScript methods that dynamically load the content on a web page. Google has said that it does a pretty good job of reading JavaScript today, but it still has limitations. Some of these are described in this article by Barry Schwartz, quoting statements by Google’s John Mueller. You can also learn more about what JavaScript Google can see in this article by Adam Audette. It’s probably fair to say that Google does execute most JavaScript, but there are cases where it can still run into problems based on how it’s implemented.
iframes: An iframe tag is typically used to embed content from elsewhere on your own website, or from another site, into the current web page. Google may not treat this content as part of your page, especially if it’s sourced from a third-party website. Historically, Google has ignored content within an iframe, though there may be cases that are exceptions to that general rule.

Learn more! Understanding Google’s 2018 Search Updates

Summary

At face value, search engines seem so simple: type a query into the search box, and poof! Your results await. But this instant gratification is powered by a complex set of processes behind the scenes that help identify the most relevant data to the end user, so she can do things like find a recipe, research a product or get an answer to her question.
Why should you care?
Knowing the fundamental principles of crawling, indexing and ranking helps website owners tune their sites so that search engines can easily read and understand them, and so that their pages surface for the right search queries.
Need help with fine-tuning your site for better search engine results? Here’s how we do SEO at Perficient Digital.

Art of SEO Series

This post is part of our Art of SEO series.

Eric Enge

Eric Enge is part of the Digital Marketing practice at Perficient. He designs studies and produces industry-related research to help prove, debunk, or evolve assumptions about digital marketing practices and their value. Eric is a writer, blogger, researcher, teacher, and keynote speaker and panelist at major industry conferences. Partnering with several other experts, Eric served as the lead author of The Art of SEO.
