A month ago I blogged about Google Flu Trends’ measurements of flu-related search terms and how they related to official data from the Centers for Disease Control and Prevention. As it turns out, flu cases came in quite a bit below what Google was predicting.
“Flu Trends is meant to be a complementary tool to the surveillance systems used by the CDC. Since its initial launch in 2008 and through this flu season, Flu Trends has accurately predicted the start and peak time of flu season. However, this season our models estimated a higher influenza like illness rate than the Centers for Disease Control in some regions. As we do each year, we will be performing a model analysis and potential model update to improve the accuracy of the tool.”
–from the article linked below
Derrick Harris at GigaOM took a look at the disparity in an interesting piece called “Google’s flu snafu and the reliability of web data.” In the article, he also looks at data reported by individuals and at Twitter data. His conclusion? Every data collection method has its flaws, and to use one effectively, users have to take its possible drawbacks into account. It’s not rocket science, but it makes a lot of sense.