So here is the question: Does Google scan Gmails to see URLs shared within them, and does it then use these to discover new content? There are many who adamantly maintain that it does. So the IMEC Labs Test Group decided to put that to the test. In this post, we will report our results on whether or not Google is reading your gmails to see stuff that you are linking to.
So here is your answer in a nutshell: Our tests showed no evidence of Google crawling URLs that were shared within Gmails. None. Want more detail on why we say that? Read on to get either the “Short Story”, or the “Full Story”, below.
The Short Story
We posted 4 pages in total (all on the PerficientDigital.com domain) for the test, and then asked different groups of users to email links to those pages to either Mark Traphagen, Cyrus Shepard, David Minchala, or myself (these are 4 of the 5 members of the IMEC board; the 5th is Rand Fishkin, and you can read more about the board below). Sending the email required only two button clicks, so the task was easy. We also pre-configured the actual email content so participants did not have to write anything.
For each article, we asked 20 to 22 people to send gmails sharing the link to that article's page. One group was asked to share article 1, a different group was asked to share article 2, and so forth. The goal was to see if Google would spot these links in the gmails, and then crawl and index those URLs. You can see that we had somewhat uneven levels of participation. Article 1 got the most shares and article 4 the least. This simply reflected the people we asked to send out gmails following through at different levels. Here were the basic test results:
So as you can see, well, there was very little to see. The results were wholly unremarkable, and that’s the most remarkable thing about them!
The Long Story
This section goes into much greater detail on our methodology, as well as the data collected during the test. The basic concept of this test was as follows:
- We wrote 4 brand new articles for the test.
- These articles were mapped into hand-coded web pages that included NO Google code. No Google Analytics, no Google+ button, etc.
- We uploaded the 4 resulting HTML pages by FTP to PerficientDigital.com.
- We did not upload them through WordPress, which is the CMS for PerficientDigital.com.
- No links were implemented to the new web pages.
- After uploading them, I checked them out using Safari. I chose this browser as there was no Google toolbar installed in my Safari browser.
The reason for all these mechanics was to make sure that the pages were completely unknown to Google at the start of the test. We also forbade participants in the test from visiting the web pages. This was critical to the test, as different browsers, or browser plugins, can trigger discovery of content by Google. For example, it’s known that the Google+ +1 button will call the Google+ API on a page load, and this can trigger Google crawling a page.
Various SEO toolbars that people may install in their browser can be a problem too. For example, we had to abort one attempt at this test because a YouTube-related plugin in Firefox caused Googlebot Mobile visits, which invalidated that attempt (note that we are going to write this up in a separate article sometime soon!). However, in the final edition of the test, we were able to verify that there were no corrupting elements.
When we launched the test I am reporting on today, each participant was sent to a page with a set of “Send Test Mail Option x” buttons. All the participants needed to do was click on one of those buttons, which would pre-populate their Gmail client with an email, and then click send. That was it.
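How the buttons were built is not covered in this post. Purely as an illustration, one common way to pre-populate an email from a web page is a standard mailto: link. The Python sketch below builds such a link. The function name, recipient address, subject, body, and URL are all hypothetical placeholders, and this is an assumption about one possible approach, not a description of the actual test page.

```python
from urllib.parse import quote


def build_mailto(recipient, subject, body):
    """Return a mailto: URL that opens a pre-filled draft in the user's mail client.

    Hypothetical helper for illustration; not the code used in the test.
    """
    # mailto: URLs need percent-encoding (spaces become %20, not "+"),
    # so quote() is used with no extra "safe" characters for subject and body.
    return "mailto:{}?subject={}&body={}".format(
        quote(recipient, safe="@"),
        quote(subject, safe=""),
        quote(body, safe=""),
    )


# Placeholder values only; example.com is a reserved documentation domain.
link = build_mailto(
    "reviewer@example.com",
    "Test article",
    "Please read http://example.com/a.html",
)
```

Whether a mailto: link opens Gmail depends on how the participant's browser is configured, so a real test page might need a different mechanism.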
The test was launched on March 9th. We then monitored the pages for 12 days to see what transpired. The basic components of this monitoring were:
- We tracked the gmails sent (each gmail was sent to one of 4 of the 5 members of the IMEC board, specifically Mark Traphagen, Cyrus Shepard, David Minchala, or myself)
- We checked the log files every day to see if Googlebot (or other Google programs) visited the page. This also allowed us to monitor that all of our participants followed our instructions and did not visit the pages.
- We used search queries such as [songwriter site:PerficientDigital.com] to see if the pages appeared in the Google index. We chose queries of this form so that the act of executing the query would not inform Google about the existence of the page.
- We monitored Open Site Explorer and Majestic SEO to see if the pages received any external links.
All of this monitoring was done daily to make sure we could verify that the test was as airtight as possible.
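The daily log check described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: it assumes the server writes the common "combined" access log format, the page paths in TEST_PATHS are hypothetical placeholders, and the user-agent markers are a simple heuristic rather than the exact filter we used.

```python
import re

# Assumed "combined" access log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical paths of the test pages (the real file names are not given here).
TEST_PATHS = {"/test-article-1.html", "/test-article-2.html"}

# Heuristic substrings for Google crawlers; user agents can be spoofed.
GOOGLE_MARKERS = ("googlebot", "mediapartners-google", "adsbot-google")


def classify(agent):
    """Label a user-agent string as 'google', 'other-bot', or 'other'."""
    a = agent.lower()
    if any(marker in a for marker in GOOGLE_MARKERS):
        return "google"
    if "bot" in a or "crawler" in a or "spider" in a:
        return "other-bot"
    return "other"


def scan(lines):
    """Return (time, path, label, agent) for every request to a test page."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        path = m["path"].split("?", 1)[0]  # ignore any query string
        if path in TEST_PATHS:
            hits.append((m["time"], path, classify(m["agent"]), m["agent"]))
    return hits
```

Running scan() over each day's log lines lists every request to a test page, so any "google" entry would flag a crawl and any "other" entry would flag a possible human visit. Because user-agent strings can be faked, a hit labeled "google" would still need to be confirmed, for example with a reverse DNS lookup on the requesting IP address.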
Ultimately, the bottom line is that Googlebot never came to any of the test pages. Not even once. In addition, all of our test participants adhered to the instructions and never visited the pages, so we know that there were no corrupting influences. In any event, any corruption would have shown itself as a Googlebot visit, and since we had none, we can be confident in the results.
There were two curiosities in the test worthy of note:
- mail.google.com did visit one of the pages. Why this happened, we do not know. However, it did not lead to a Googlebot visit, or indexation of the impacted page.
- BUbiNG bot visited two of the pages on March 15th. This is a bot implemented by the University of Milan. It is not clear how they discovered the pages visited, but it seems likely that the emails were routed via servers they are monitoring.
However, neither of these curiosities changes the essential result, which is that none of the pages were visited by Googlebot, and none of them were indexed by Google.
Thanks to the IMEC board: Rand Fishkin, Mark Traphagen, Cyrus Shepard, and David Minchala, and the entire group of IMEC participants! And, for completeness, here is my Twitter Handle.