Google CEO Eric Schmidt’s comments on health/medicine in a recent wide-ranging interview by Charlie Rose have not gotten much attention, so I’m excerpting them here. First, Schmidt discusses Google Flu Trends:

[For clarity I've mixed a few words from Rose's questions with Schmidt's comments]
There are many [positive] things that we can do with the corpus of information that’s being gathered … The most interesting thing we’ve recently done is called flu trends. We looked at … trends in terms of worldwide flu … There’s a lot of evidence, concern about a pandemic … that might occur, similar to the 1918 bird flu epidemic that killed … 50 million … a proportionately huge number if it were today. And because people, when they have a problem, search for something, we can detect uncommon searches as a blip. We can note that. In our case, we built a system which took anonymized searches so you couldn’t figure out exactly who it was, and that’s important. And we get six months ahead of the other reporting mechanisms so we could identify the outbreak. Many people believe that this device can save 10, 20, 30,000 lives every year just because the healthcare providers could get earlier and contain the outbreak. It’s an example of collective intelligence of which will are [sic] many, many more.

Later in the interview, Schmidt talks about what he calls a “public corpus of medical information”:

The Wikipedia model has been so successful. Why don’t we have all the smartest doctors organize a corpus, a public corpus of medical information … that combines everything everybody knows about medical practice in one place, a place where you can — again, this would have to be a public database where you keep pouring more experiential data, and then you can build computer systems … [Rose: So you have all your cases, everything you ever knew] Schmidt: Again, anonymized so it’s appropriately legal and all of that, and get it in one place so that people can begin to mine the data. They can actually begin to figure out what the disease trends are. What are the real health trends? And this is not a knock on the existing providers to do it. They just don’t have the scale. We are strong when we have thousands of people working in parallel to solve a really important problem. I would tell you, by the way, that if you look at the problems that society has hit over the last thousand years, start with the plague, right all of the things that really hit us that nearly destroyed society, we overcame them through technology and innovation. People figured out new ways whether it was in medicine or governance to overcome them. So let’s be positive about it. We can work those issues. There’s always a way to handle the objections if it’s important.

A New York Times article on Google Flu Trends reports that Google’s methodology “has been validated by an unrelated study” based on Yahoo! search data whose lead author is Philip Polgreen, an infectious disease doctor at the University of Iowa.

I was glad to learn about the Polgreen study, first, of course, because Polgreen and colleagues are right here at the University of Iowa! Beyond that, it was good to find that the Polgreen team's full article gives more detail about the flu-related search terms they used than the Google Flu Trends team does, which makes it easier to untangle the complicating factors in flu searching. Specifically, they report that they excluded the following terms:

bird, avian, pandemic
vaccine, vaccination, shot

As discussed in the accompanying articles (see below), flu is a particularly complicated disease for correlating disease occurrence with web search behavior, both because of the existence of bird flu and because there is a vaccine for flu. These are exactly the factors that Polgreen et al. excluded. It seems likely that the Google Flu Trends team is using a similar method.
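To make the exclusion idea concrete, here is a minimal sketch of a query filter built on the excluded terms listed above. The requirement that a query mention "flu" or "influenza", the function name, and the sample queries are all my own assumptions for illustration; this is not the Polgreen team's or Google's actual code.

```python
import re

# Terms the Polgreen team reports excluding (from the list above).
EXCLUDED_TERMS = {"bird", "avian", "pandemic", "vaccine", "vaccination", "shot"}

# Assumption: a query counts as flu-related if it contains one of these words.
FLU_WORDS = {"flu", "influenza"}


def is_flu_illness_query(query: str) -> bool:
    """Return True if the query mentions flu and none of the excluded terms."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    mentions_flu = bool(words & FLU_WORDS)
    has_excluded_term = bool(words & EXCLUDED_TERMS)
    return mentions_flu and not has_excluded_term


# Hypothetical example queries, not real search-log data.
sample_queries = [
    "flu symptoms",
    "bird flu outbreak",
    "where to get a flu shot",
    "influenza treatment",
    "bronchitis cough",
]

kept = [q for q in sample_queries if is_flu_illness_query(q)]
print(kept)
```

With these sample queries, only the two that mention flu without a bird-flu or vaccine term are kept, which is the effect the exclusion list is meant to have.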

Incidentally, more on the Iowa connection: Philip Polgreen has been involved for several years with the Iowa Health Prediction Market, a spin-off of the Iowa Electronic Markets, the real-money prediction market/futures market that’s used to make predictions in political elections.

** This is one of a group of three articles on Google Flu Trends:

Together, these articles suggest that, although it’s difficult to know with assurance because Google has not revealed the search terms that they use for GFT, it seems likely that they’ve done a good job in working around the complications of flu-related search patterns.

The CDC data above shows that the occurrence of flu generally peaks in February; the data below from Google Insights for flu symptoms, not surprisingly, has a similar peak in February.

Google Insights, which uses the same data as Google Flu Trends, shows quite a different pattern for flu shot (below), which peaks in October or November (flu vaccine peaks similarly).

How about searching for just the word flu (below)? Interestingly, this seems to combine the peaks in the two graphics above, for flu symptoms and flu shot. The exaggerated peaks in 2004 and 2005 are likely caused by people’s concerns about vaccine shortage (more on this in the accompanying posting, Google Flu Trends: Kudos & Complications).

Looking at the evidence of these graphics from Google Insights, it seems likely that the Google Flu Trends team is excluding search terms relating to flu vaccine and concentrating on terms that relate to symptoms. See confirmation of this in the accompanying posting, Google Flu Trends: The Iowa Connection.

The data shown here seems to indicate that for a seasonal disease for which there is a vaccine, the search patterns for “disease: symptoms,” “disease: vaccine/shot,” and the disease term itself differ: each tracks the time of year when the disease occurs or when the vaccine is given. This idea is supported by Google Insights data for pneumonia, another respiratory disease that has a vaccine. The patterns are similar to flu, with high peaks for pneumonia shot in October, and somewhat lower peaks for pneumonia and pneumonia symptoms in February.
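One simple way to compare these seasonal patterns is to find the calendar month in which each term's search interest is highest on average. The sketch below assumes monthly interest values in the form of (year, month, value) tuples; the function name and the numbers are invented purely for illustration and are not actual Google Insights data.

```python
from collections import defaultdict


def peak_month(monthly_interest):
    """Return the calendar month (1-12) with the highest average interest.

    monthly_interest: iterable of (year, month, value) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _year, month, value in monthly_interest:
        totals[month] += value
        counts[month] += 1
    # Compare months by their average value across all years supplied.
    return max(totals, key=lambda m: totals[m] / counts[m])


# Made-up illustrative values for one year (January through December).
symptoms_values = [60, 100, 70, 40, 20, 10, 10, 10, 20, 30, 40, 50]
shot_values = [10, 5, 5, 5, 5, 5, 10, 20, 60, 100, 80, 30]

symptoms_series = [(2006, m, v) for m, v in enumerate(symptoms_values, start=1)]
shot_series = [(2006, m, v) for m, v in enumerate(shot_values, start=1)]

print("symptoms peak month:", peak_month(symptoms_series))
print("shot peak month:", peak_month(shot_series))
```

Run against real exported data for a disease term, its symptoms term, and its vaccine term, the same function would show whether the peaks fall in different months.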

Bronchitis — A disease with no vaccine

Bronchitis is a respiratory condition that does not have a vaccine. As the graphics from Google Insights below show, the pattern is different from flu and pneumonia. The peaks for the disease itself (bronchitis, below) and for the disease with symptoms are much the same, which makes it less complicated to track search patterns. Apparently the people who search for the disease are in fact people who have the disease.

Bronchitis symptoms, from Google Insights.

Google Flu Trends is an elegant application of search data to medicine. Working on Hardin MD, I’ve long noticed seasonal variations in certain diseases: colds, flu, and respiratory illnesses peak in winter, and insect bites and sun exposure conditions peak in summer. I pay a lot of attention to the search terms that people use to get to Hardin MD pages, so Google’s mining of this data to serve the health of the community is especially interesting.

The idea of Google Flu Trends is shown nicely in the snapshot above from the animation at the Google Flu Trends site: Google finds that there is an excellent correlation between the flu-related terms that people search for and the occurrence of flu, as measured by CDC data. And, as shown in the animation, the near-real-time Google search data precedes the CDC data, which takes 1-2 weeks to be reported and compiled.
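One common way to put a number on "excellent correlation" is the Pearson correlation coefficient between two weekly series. The sketch below is only an illustration of that calculation. The function and variable names are my own, the weekly values are made up, and Google has not said that this is how its comparison is computed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up weekly values for illustration only (not real Google or CDC data).
search_activity = [1.0, 1.5, 2.5, 4.0, 3.0, 2.0, 1.2]
cdc_flu_visits = [0.9, 1.4, 2.6, 3.8, 3.1, 1.9, 1.1]

r = pearson(search_activity, cdc_flu_visits)
print(f"correlation: {r:.3f}")
```

A value close to 1 means the two series rise and fall together. It says nothing on its own about why people were searching, which is where the complications discussed next come in.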

Complications

The idea of using search data to track the progression of disease outbreaks certainly is elegant, and Google deserves congratulations for it. In choosing flu as the first example, however, Google has chosen a disease with complicating factors.

The nature of these potentially complicating factors is suggested in the graphic above from the Google Flu Trends site. A big question here: what caused the spike in flu occurrence and flu search activity in Dec 2003 – Jan 2004?

Because Google has chosen not to reveal the exact search terms that they use to measure the volume of flu-related searching (see the supplementary material accompanying Google’s paper in Nature), it’s difficult to know the cause of the 03-04 spike with certainty. But looking back at the chronology of that time period sheds light. There was a major shortage of the flu vaccine in late 2003, which is certainly related to the spike shown in the graphic. The CDC spike (yellow) shows that many people had flu, presumably because they were unable to get the vaccine. The Google spike (blue) is even higher, which may indicate that a significant number of people were searching for flu information not because they were infected, but because they were looking for information on how to get the vaccine. The accompanying article (Flu Symptoms vs Flu Shot) shows that there is in fact a clear indication of heightened search activity for flu vaccine-related terms during the autumn pre-flu season.

The other complicating factor in looking at flu-related search activity is bird flu, and Google seems to have addressed this well. The large bird flu outbreak in Asia, and the corresponding bird flu scare throughout the world, occurred in late 2004 and early 2005. Since the graphs show no major spike for this period, Google apparently has excluded bird flu/avian flu search terms from the aggregate group of terms it’s using.
