As part of the IMEC Labs Test Group, we have been running tests to see whether tweeting a link to a page that is otherwise invisible to Google can cause it to be crawled and indexed. The short answer is: Yes, it might.
In our two test samples, one of the pages did get crawled and indexed, and the other one did not. This means that you can potentially use Twitter to get your page discovered and indexed by Google. However, there is no guarantee that it will. To understand why this is significant, and how we tested it, please read on!
The Short Story
We had 3 total pages (all on the PerficientDigital.com domain) in the test, as follows:
Take note of the last column, as it shows how we tested the page. Basically, we had people publish tweets that included a link to the tested page, with different people being used for each of the two test pages. The goal was to see if these tweets would be enough to get the page crawled and indexed by Google. Here were the basic test results:
Note that the #singersongwriter page got crawled by Googlebot and indexed quite rapidly, but the #searchengines page was never crawled by Googlebot or indexed at all. The Control Page was never accessed by anyone at all, except by my Safari browser when I checked it after the initial upload.
So why did one page get indexed, and not the other? Here is a summary of the differences between the two:
- Higher authority people tweeted the #singersongwriter page (we used Followerwonk Social Authority to determine this)
- Different hashtags were used for the 2 pages, though we thought the #searchengines tag was more likely to get picked up by Google
- Google indexed two of the tweets from the #singersongwriter group
- Different pages related to the tweets were indexed by Google from Twitter profiles, the actual hashtag page, and tweet replicating sites
One of the pages that Google indexed related to the #singersongwriter page was hub.uberflip.com, a site that has a Moz Domain Authority of 64. Perhaps that triggered the indexation, but that site was not in the equation until well after the page was already indexed.
Which one of these factors triggered the crawl and indexation? Given the extremely short time frame between the first tweet, and the first crawl by Googlebot, we believe it’s extremely likely that Google saw the tweet on Twitter first, before it saw it on any 3rd party site (hub.uberflip.com or any other tweet replicating site).
Even if this was not the driving factor, you can still conclude that exposing the world to a new web page for the first time by a tweet can lead to its getting crawled and indexed.
The Long Story
This section goes into much greater detail on our methodology, as well as the data collected during the test. The basic concept of this test was as follows:
- We wrote 3 brand new articles for the test.
- We uploaded them by FTP to this web site, PerficientDigital.com.
- In other words, we did not upload this through WordPress, which is the CMS for PerficientDigital.com.
- No links were implemented to the new web pages.
- After uploading them, I checked them out using Safari. I chose this browser as there was no Google toolbar installed in my Safari browser.
- One of the 3 test files was ignored after that, to act as a control to verify that the procedures outlined in the steps above were executed correctly.
These mechanics were used in order to minimize the chances of Google discovering the pages by any means other than our test. Once this was done, we asked a small number of IMEC panelists to participate in the test by tweeting a link to the web page. They were emailed and sent to this page for instructions. Then some of these panelists executed tweets, such as this one by Rand:
After the emails requesting the tweets, we monitored the process for a period of 8 days to see what would happen. This also involved a number of steps:
- We saw which of our participants tweeted as requested, and logged when their tweets occurred (we also kept links to each of their actual tweets)
- We checked every day to see if the pages got indexed. We did this using a query such as [songwriter site:PerficientDigital.com] (without the brackets), so that the act of executing the query would not inform Google about the existence of the page.
- We checked the log files for PerficientDigital.com to see what various user agents had come to the site. We looked at every single user agent to see if there were any “corrupting influences” prior to Googlebot first arriving at the page.
- We monitored Open Site Explorer and Majestic SEO to see if the pages received any external links.
- We monitored Google itself to see what it was indexing related to the test, using queries such as this one: [“This is a test by IMEC” singersongwriter] (without the brackets).
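The log file check in the list above can be sketched in a few lines of Python. This is only an illustration, not the script used in the test: it assumes the server writes the common Apache/Nginx "combined" log format (the article does not say which format was in use), and the function name `hits_until_googlebot` and the `page_path` parameter are hypothetical.

```python
import re
from datetime import datetime

# Assumption: "combined" access log format. The real server's format is not stated.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hits_until_googlebot(lines, page_path):
    """Return (timestamp, user_agent) pairs for page_path in log order,
    stopping after the first hit whose user agent mentions Googlebot."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        # Skip lines that do not parse or that request a different page.
        if match is None or match.group("path") != page_path:
            continue
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        agent = match.group("agent")
        hits.append((when, agent))
        if "googlebot" in agent.lower():
            break
    return hits
```

Every entry returned before the final Googlebot hit is a candidate "corrupting influence" and can be reviewed by hand. Matching on the user agent string alone does not prove a request really came from Google, so a stricter check would also verify the requesting IP address.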
All of this monitoring was done daily to make sure we could verify that the test was as airtight as possible. Next, I will take a look at some of the details for the first test page (the #singersongwriter page), the one that got indexed. To start, here are all the log file accesses up until Googlebot’s first visit:
The Visitor Type column shows my notes on who the visitor to the page was, based on their user agent. If you look at this in detail you will see 5 different types of user agents:
- Browser User Agents – note that the first Safari access is by me after I uploaded the files to the Perficient Digital server
- Twitterbot, pretty much immediately after the first tweet
- Twitter scraper/replicators, such as Tweetmemebot, Tweetminster, InAGist, etc.
- Flipboard and GetPrismatic are in there, probably as a result of plugins in the IE browser that accessed the page seconds before they arrived
- Googlebot, 7 minutes and 39 seconds after Twitterbot first arrived at the site.
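The grouping of user agents above, and the gap between Twitterbot's and Googlebot's first visits, could be computed along the following lines. This is a rough sketch under stated assumptions: the substring rules, the category labels, and the function names are all illustrative, since the article names the agents but not how they were matched.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Assumption: simple lowercase substring rules, checked in order.
VISITOR_RULES = [
    ("googlebot", "Googlebot"),
    ("twitterbot", "Twitterbot"),
    ("tweetmemebot", "Twitter scraper/replicator"),
    ("tweetminster", "Twitter scraper/replicator"),
    ("inagist", "Twitter scraper/replicator"),
    ("flipboard", "Flipboard/GetPrismatic"),
    ("prismatic", "Flipboard/GetPrismatic"),
]

Hit = Tuple[datetime, str]  # (timestamp, raw user agent string)


def classify_visitor(user_agent: str) -> str:
    """Map a raw user agent string to one of the visitor types."""
    agent = user_agent.lower()
    for needle, label in VISITOR_RULES:
        if needle in agent:
            return label
    return "Browser or unknown"


def first_seen(hits: List[Hit], label: str) -> Optional[datetime]:
    """Earliest timestamp for a given visitor type, or None if absent."""
    times = [when for when, agent in hits if classify_visitor(agent) == label]
    return min(times) if times else None


def crawl_delay(hits: List[Hit]) -> Optional[timedelta]:
    """Time from Twitterbot's first visit to Googlebot's first visit."""
    twitterbot = first_seen(hits, "Twitterbot")
    googlebot = first_seen(hits, "Googlebot")
    if twitterbot is None or googlebot is None:
        return None
    return googlebot - twitterbot
```

Given the page's hits as `(timestamp, user_agent)` pairs, `crawl_delay` returns the elapsed time as a `timedelta`, or `None` when either bot never showed up, as would be the case for the #searchengines page.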
Of the 6 people that tweeted the #singersongwriter page, 2 of their tweets were indexed. However, at the time that Googlebot first arrived at the site, only one of the tweets had been sent out, and that person’s tweet was not one of the ones that got indexed. In addition, that person’s Social Authority was actually the lowest one in that test group (they had a Followerwonk Social Authority of 50). Now isn’t that fun to think about?
What about the other sites/pages that Google indexed that referenced the tweets? Here is a table that shows what we found there:
There are a number of differences between the pages that Google indexed that showed the text related to the test tweets in some fashion. Perhaps the most significant one was hub.uberflip.com because of its Moz Domain Authority of 64. However, Googlebot had been to the #singersongwriter page long before hub.uberflip.com had any pages indexed.
We know this because Googlebot had already been at the #singersongwriter page within 7 minutes and 39 seconds of the first tweet, and that person’s tweet was never picked up by hub.uberflip.com. In fact, the tweets that were picked up by hub.uberflip.com were still more than an hour away from being tweeted at the time Googlebot made its first visit.
In summary, we believe it’s almost certainly the case that Google saw the initial tweet on Twitter and it caused that first visit by Googlebot to the #singersongwriter page. Given the Followerwonk Social Authority level of 54, this was not triggered by the highest authority people that tweeted that page.
Even if Google did happen to first see the page on a site that replicates tweets, it still does show that it’s possible for you to help get a page initially crawled, and then indexed through Twitter promotion only, even when that page has no initial links to it.
Thanks to the IMEC board (Rand Fishkin, Mark Traphagen, Cyrus Shepard, and David Minchala) and the entire group of IMEC participants! And, for completeness, here is my Twitter Handle.