Google Flu Trends is an elegant application of search data to medicine. Working on Hardin MD, I’ve long noticed seasonal variations in certain diseases: colds, flu, and respiratory illnesses peak in winter, while insect bites and sun-exposure conditions peak in summer. I pay a lot of attention to the search terms that people use to get to Hardin MD pages, so Google’s mining of this data to serve the health of the community is especially interesting.
The idea of Google Flu Trends is shown nicely in the snapshot above from the animation at the Google Flu Trends site. Google finds that there is an excellent correlation between the flu-related terms that people search for and the occurrence of flu, as measured by CDC data. And, as shown in the animation, the Google search data is available in near real time, so it runs ahead of the CDC data, which takes 1 to 2 weeks to be reported and compiled.
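To make the idea of a correlation with a lead time concrete, here is a minimal sketch in Python. It is not Google's method, which has not been published in this detail. The weekly numbers are made up for illustration, and the function names `pearson` and `best_lead` are hypothetical. The sketch shifts the search series forward a week at a time and reports the shift at which it lines up best with the reported-case series.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def best_lead(search, reported, max_lead=4):
    """Return (lead_weeks, r) for the lead that maximizes the correlation
    between search[t] and reported[t + lead]."""
    best = None
    for lead in range(max_lead + 1):
        # Pair each search week with the reported week `lead` weeks later.
        xs = search[: len(search) - lead]
        ys = reported[lead:]
        r = pearson(xs, ys)
        if best is None or r > best[1]:
            best = (lead, r)
    return best


# Made-up weekly values (NOT real Google or CDC data): the reported series
# has the same shape as the search series, delayed by two weeks.
search = [1, 2, 4, 8, 12, 9, 5, 3, 2, 1, 1, 1]
reported = [1, 1, 1, 2, 4, 8, 12, 9, 5, 3, 2, 1]

lead, r = best_lead(search, reported)
print(f"search activity leads reported cases by {lead} week(s), r = {r:.3f}")
```

Because the two made-up series are identical apart from a two-week shift, the best lead found here is 2 weeks. Real data would be noisier, and a high correlation alone would not show why people were searching.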
The idea of using search data to track the progression of disease outbreaks certainly is elegant, and Google deserves congratulations for it. In choosing flu as the first example, however, Google has chosen a disease with complicating factors.
The nature of these potentially complicating factors is suggested in the graphic above from the Google Flu Trends site. A big question here is: what caused the spike in flu occurrence and flu search activity in Dec 2003 – Jan 2004?
Because Google has chosen not to reveal the exact search terms that they are using to measure the volume of flu-related searching (see supplementary material accompanying Google’s paper in Nature), it’s difficult to know the cause of the 03-04 spike with certainty. But looking back at the chronology of that time period sheds light. There was a major shortage of the flu vaccine in late 2003, which is very likely related to the spike shown in the graphic. The CDC spike (yellow) shows that many people had flu, presumably because they were unable to get the vaccine. The Google spike (blue) is even higher, which may indicate that a significant number of people were searching for flu information not because they were infected, but because they were looking for information on how to get the vaccine. The accompanying article (Flu Symptoms vs Flu Shot) shows that there is in fact a clear indication of heightened search activity for flu vaccine-related terms during the autumn pre-flu season.
The other complicating factor in looking at flu-related search activity is bird flu, and this seems to have been addressed well by Google. The large bird flu outbreak in Asia, and the corresponding bird flu scare throughout the world, occurred in late 2004 and early 2005. Since there is no major spike shown in the graphs for this time, Google apparently has excluded bird flu/avian flu search terms from the aggregate group of terms it’s using.
This is one of a group of three articles on Google Flu Trends:
- Google Flu Trends: Kudos & Complications (this article)
- Google Flu Trends: Flu Symptoms vs Flu Shot
- Google Flu Trends: The Iowa Connection
Together, these articles suggest that, although it’s difficult to know with assurance because Google has not revealed the search terms that they use for GFT, it seems likely that they’ve done a good job in working around the complications of flu-related search patterns.