Post Google-Twitter Launch: How is Google Indexing Twitter Today?

As seen in TechCrunch

Get caught up! We have updated this study since this was published. See the latest version: Google Indexing to Tweets Appears to Decline.
On February 4th, 2015, news broke of a new deal between Google and Twitter, and on May 19th the deal went live. On February 10, 2015, we took a snapshot of how Google was indexing tweets before the deal went into effect. Today, we are releasing data on how Google is indexing tweets now that the deal has been live for a number of weeks.
TL;DR: we see significant increases in Google’s indexation of tweets, but Google is a long, long way from indexing all tweets. Google remains selective in what it chooses to index, and that selection definitely skews towards people with higher follower counts or “authority” (we used Followerwonk’s Social Authority as a measure of authority).
Google now indexes a lot more of Twitter, but not all!
Indexation of tweets in the first 7 days increased from about 0.6% in February to 3.4% in June. That’s a whopping 466% increase, but it still leaves more than 96% of tweets out of Google’s index. By no means do I think that this will be the end of the story. I would bet that Google is testing many things with Twitter integration, and that we will see changes over time. Not to worry, we will repeat our tests on an ongoing basis!
In this video from our Here’s Why with Mark and Eric series, Mark Traphagen asks me to explain why I think Google still doesn’t index all tweets, even though Google now has full access to those tweets:

The Sub-Story

It’s easy for all of us to believe that Google captures all the data to be found everywhere on the web. After all, they have the best infrastructure for data capture on planet Earth. However, that does not mean that they have no limits. They do, and they need to be selective. Even with this new Twitter deal, where they get all of Twitter’s tweets by firehose, it’s just too much for them to swallow and index in full.
That does not mean that their indexation rate won’t expand over time. It may well do so, but it will only do so after they find an effective use for that additional data.

Show Me the Details!

One of the most interesting areas to explore is how quickly Google indexes tweets. People have long believed that Google places more weight on recency of tweets. For that reason, we evaluated the indexation of tweets by day for the first 7 days. That leaves the question as to how this changed between February and June, and here is your detailed answer:
Increases in Twitter Indexation Within the First 7 Days
There is clear evidence here that Google has significantly picked up their level of indexation, with an increase of 466%. This is a big deal, and probably brings a lot of incremental value to Twitter. However, Google is still NOT indexing 96.6% of the data. Note also that Google’s indexation of Twitter does go up over longer periods of time, to about 12% of all tweets tested, still leaving 88% not indexed.
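For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only the rounded 7-day rates quoted in this post (0.6% and 3.4%); the function and variable names are illustrative, not part of the study's tooling. Because the inputs are rounded, the result comes out at roughly 467%, in line with the 466% figure quoted above.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed as a percentage."""
    return (after - before) / before * 100.0


# Rounded 7-day indexation rates quoted in the post (percent of tweets indexed).
feb_rate = 0.6
jun_rate = 3.4

increase = percent_increase(feb_rate, jun_rate)  # size of the jump, in percent
multiple = jun_rate / feb_rate                   # June rate as a multiple of February
not_indexed = 100.0 - jun_rate                   # share of tweets still not indexed

print(f"increase: {increase:.0f}%")
print(f"multiple: {multiple:.2f}x")
print(f"not indexed: {not_indexed:.1f}%")
```

Note the difference between the two ways of stating the change: an increase of about 466% means the June rate is about 5.7 times the February rate, not 4.66 times.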
We also looked at indexation based on follower count. Both February and June show a strong bias towards indexing content tweeted by people with larger follower counts:
Twitter Indexation in Google by Follower Count - June 2015
 
Note that the time horizon used for this June data slice was 7 weeks. The older tweets in that sample predate the point at which Google turned the switch on for the new deal with Twitter, so the increases shown here are somewhat dampened.
We also took a look at the data based on Followerwonk Social Authority, to see how that might vary:
How Twitter Indexation Varies by Social Authority
We believe that using Social Authority is a better metric for us to use going forward, as it takes into account the engagement level with a person’s tweets (which a simple count of followers doesn’t). In this view, you can see a strong skew towards indexing the content from higher authority people.
This suggests that Google is looking at more than simple follower count to pick out what tweets they want to index.
Study shows higher authority Twitter accounts much more likely to have tweets indexed by Google.

Methodology

In this study we used a fixed user set. The sample consisted of 900+ users, and the same users were tested in both February and June of this year. (Note that we also tested that exact same sample of users in a Twitter indexation study we ran in July of 2014.)
Using the exact same user set is important because we do not know what criteria Google may use to decide whether or not to index tweets. By holding the user set constant, we are trying to eliminate differences between users as a variable when comparing one test date to another.
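As a rough illustration of how indexation rates can be tabulated by authority band for a fixed user set, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `TweetRecord` structure, the 10-point band width, the 1 to 100 authority scale, and the sample rows are all hypothetical. It is not the study's actual tooling or data.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class TweetRecord(NamedTuple):
    """One tested tweet (hypothetical structure, not the study's schema)."""
    user: str
    social_authority: int  # assumed to be on a 1-100 scale
    indexed: bool          # whether the tweet was found in Google's index


def indexation_by_band(records: Iterable[TweetRecord],
                       band_width: int = 10) -> Dict[int, float]:
    """Return the percent of tweets indexed per authority band.

    Bands are keyed by their lower bound, e.g. 20 covers scores 20-29
    when band_width is 10.
    """
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for record in records:
        band = (record.social_authority // band_width) * band_width
        totals[band] += 1
        hits[band] += int(record.indexed)
    return {band: 100.0 * hits[band] / totals[band] for band in sorted(totals)}


# Made-up sample rows, purely to show the shape of the calculation.
sample = [
    TweetRecord("user_a", 25, False),
    TweetRecord("user_a", 25, False),
    TweetRecord("user_b", 27, True),
    TweetRecord("user_c", 83, True),
    TweetRecord("user_c", 83, False),
]

rates = indexation_by_band(sample)
for band, rate in rates.items():
    print(f"authority {band}-{band + 9}: {rate:.1f}% indexed")
```

Running the same function over the same list of users at two different dates keeps the comparison about Google's behavior rather than about who happened to be in the sample.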

Summary

As noted in the TL;DR at the beginning, Google’s indexation of Twitter has taken a significant jump upwards, with the 7-day indexation rate increasing by as much as 466%. That’s significant, but they are still clearly not indexing the great majority of tweets.
I expect to see significant changes in the way Google uses Twitter data over time, and we will continue to monitor that here at Perficient Digital.
See all our SEO and Social Media studies!

Thoughts on “Post Google-Twitter Launch: How is Google Indexing Twitter Today?”

  1. This is an interesting study, and very useful.
    My first assumption was that Google has no limits, and can index any and all tweets very quickly, given that they now have “firehose” access (well..not entirely the case as the study shows).
    This study points to the importance of authority and trust, which is a common denominator in social media overall. It’s unlikely that Google will index a lot of the spam content on Twitter, so there’s perhaps a guaranteed 12% to 13% of tweets that will never get indexed, simply because of their content.
    This brings me to 2 interesting questions (at least for me):
    1. Is it possible that as Google indexes more tweets, its algorithm begins to “learn” what’s useful to index and what isn’t? (they probably already have this). If so, would it be possible to get more exposure in Google’s “tweet” index by deliberately optimizing tweets (almost going back to gaming the system)?
    I guess this could be used both ways, for ethical business purposes, as well as for less desirable (spammy) purposes.
    2. How does this firehose affect security of particular tweets, people who post these tweets and people who RT or even have these tweets in their feed? Will most of Google’s information be monitored, etc? One could argue that there’s already a degree of this happening, however a firehose to twitter content does make Google more valuable in terms of having access to realtime social media activity, which can be played back, analysed and selectively indexed (or de-indexed)
    Do you think Google will eventually have firehose access to Facebook content? That would be epic.
    All in all, interesting times.
    Great post 🙂

  2. My bet is that no deal for Facebook content will happen. FB and G are deadly enemies, and they both want it all, so I don’t see any coopetition in the works there.
    Back to your assumption, my belief is that Google DOES have limits, and this is a major driver in why the indexation is so low at this point. They make choices all the time. You can see that going back to their decision to stop looking at rel=author tags last year. You can also prove this with large sites where they routinely don’t index all of the pages.
    Of course, if they start to see real value in something, they will go index it.

  3. Thanks for this analysis. It’s welcome news.
    Perhaps missing is the amount of relevant data in that 100-140 character post – “…find an effective use for that additional data” is an interesting phrase.
    In an online world where Google values long blog posts of 2000 words (which may be as much or more than 16,000 characters) – it may be that a high percentage of this firehose content has actually no value, or can’t be evaluated.
    When we are seeing over 3% of the data indexed, we may be seeing the most valuable part of it. That ratio is common in other industries (paid advertising has a much worse ratio, overall – percent of data people find valuable enough to click at.)
    Your point of using social authority as a metric seems closer to what we are told is how Google values content overall, especially news sites and blogs. With the high percentage of fake accounts (eg. some celebrity-politicians having 50-70% fake followers or more) – The whole twitter index scene could probably max out at 20%, but more than likely much lower. (Pareto principle applied to itself says that about 4% is the really key material of any data stream.)
    Thanks again for this great report.

  4. Google has limits? You just shattered my dreams.
    I like the analysis.
    I guess the real question is how many tweets actually have value to search results? I skimmed my personal twitter account, and well, unless you are in my circle of friends, the tweets would have zero value to you. Heck you might even lose a few brain cells reading them.
    With roughly over 500 million tweets per day, I am guessing the vast majority are worthless.
    Personally, I would rather not have my personal search results polluted with tweets.

Eric Enge

Eric Enge is part of the Digital Marketing practice at Perficient. He designs studies and produces industry-related research to help prove, debunk, or evolve assumptions about digital marketing practices and their value. Eric is a writer, blogger, researcher, teacher, and keynote speaker and panelist at major industry conferences. Partnering with several other experts, Eric served as the lead author of The Art of SEO.
