Venn diagrams have long been used in teaching online searching, to help users visualize how Boolean searching works. A new application of Venn diagrams, Twitter Venn, gives on-the-fly Venn diagrams of Twitter postings. Because Twitter does such a good job of taking the pulse of the Web, Twitter Venn is an excellent way to visualize connections among breaking news topics. The first Twitter Venn example below shows that there are 5047 postings per day with the word heart, 1314 postings with the word risk, and 24 postings that contain both words, represented by the gray “intersection” in the middle containing two purple dots. The fun part is at the lower left, which lists other words that occur in the postings on heart and risk. The top word, in large print, is decaf, indicating that there’s current buzz relating decaf to heart disease risk. Sure enough, a Google search for heart decaf shows that there have indeed been recent reports that decaf coffee may increase the risk of heart disease, at least slightly.
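
For readers who like to see the Boolean logic spelled out, here is a minimal Python sketch of what a tool like Twitter Venn presumably computes: the two circles are sets of postings, the intersection is a Boolean AND, and the "other words" list is a count of words co-occurring in that intersection. The sample postings and the function names (matching, venn_counts) are made up for illustration; this is not Twitter Venn's actual code or data.

```python
from collections import Counter

# Hypothetical sample postings, invented for this sketch.
posts = [
    "decaf coffee may raise heart risk",
    "my heart is full today",
    "risk of rain tomorrow",
    "heart disease risk and decaf",
    "heart healthy recipes",
]

def matching(posts, word):
    """Return the set of indices of postings that contain the word."""
    return {i for i, p in enumerate(posts) if word in p.lower().split()}

def venn_counts(posts, a, b):
    """Count postings with a, with b, with both, and tally co-occurring words."""
    set_a, set_b = matching(posts, a), matching(posts, b)
    both = set_a & set_b  # Boolean AND is a set intersection
    # Count each other word once per posting in the intersection.
    common = Counter(
        w for i in both for w in set(posts[i].lower().split()) if w not in (a, b)
    )
    return len(set_a), len(set_b), len(both), common

n_heart, n_risk, n_both, common = venn_counts(posts, "heart", "risk")
top_word = common.most_common(1)[0][0]
```

With these sample postings, the top co-occurring word comes out as decaf, mirroring the large-print word in the lower left of the real diagram.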

[Click images below for live results in Twitter Venn. The numbers will vary slightly, since they're generated live. On the Twitter Venn results, click the middle intersection area to show common terms in lower left.]

The second example, below, shows clearly that the main source of alarm about salmonella poisoning is peanut butter, since this is the predominant word that occurs with the search words, as shown in the listing at the lower left.

It’s occurred to me for a long time that Venn diagrams are a good way to visualize the relationships among subjects in online searching. But I suspect the sort of on-the-fly, realtime generation of Venn diagrams done by Twitter Venn would be too slow for databases with more text per record. So it falls to Twitter, with its tiny 140-character records, to show how useful Venn diagrams can be for visualization.

Mike Cane hits the target on color eBooks

Truly, the first device that can do color eBooks will change things forever … There are three recent signs — as well as a total wild card — that point to possible dramatic changes in the eBook-reading hardware landscape. …

… The first is Samsung hitting the pedal hard on OLED screen manufacturing.

… The second development has been Hewlett-Packard demonstrating color eInk screens.

… The third piece of this puzzle: Amtek Rumored to Show Slate Netbook at CES 2009.

… The Wild Card in all this? … Pixel Qi, which brags it has revolutionary screens that will basically run on electrons by osmosis instead of the greedy sip-sip-sip of current technology.

Sad to say, this is one of Mike’s last blog postings — His incisive comments on eBooks will be missed.

Andrew Smith, at the Dallas News, writes on the same subject in the article Why e-books will rule:

… Nearly all non-fiction books cry out for far more illustration than they contain, but the costs of adding pictures and charts (especially color pictures and charts) are prohibitive. That’s why you see so many non-fiction books with all the photos bunched up into a couple of glossy-page sections in the center. It’s the only cheap(ish) way to get the job done. Color E-Ink will change that forever.

Scientific and medical books, which make heavy use of color illustrations, especially stand to benefit from the advent of color eBooks, which may lower prices that, for print textbooks, can break a student’s budget.


A month ago, Google announced that it has begun putting magazines in Google Books. In one way, this is a new direction for Google. But looked at broadly, it’s really not so new: Google has been putting old journals in Google Books for a long time. The basic difference between the newly announced “Google magazines” and Google’s “old journals,” of course, is the date of publication. The titles that are being treated as “magazines” were generally published in the last 50 years or so. But some of these also include much older issues, in some cases, such as Popular Science, going back to the 1800s. A bit of digging (searching for words in an article) finds a nice case of a title that’s in Google Books both ways, as a magazine and as an old journal. Snippets from the “About this book” and “About this magazine” pages below show the differences.

Old journals – The journal / book format

Old journals are given the same treatment as books, with each volume of the journal being considered a book. The record here is for volume 26 of Popular Science Monthly (the old name of Popular Science).

Old journals are scanned into Google Books by libraries; in the case shown here, by Harvard University. As with other books scanned by libraries, the About page has a selection of thumbnail images, giving an idea of what sort of graphics are in the book. Also note the button to Download the entire volume in PDF format.

The Magazine format

In contrast to journal/book format, in which the volume (made up of several issues) is treated as the basic record unit, in magazines, the basic record unit is the issue. This record is for the Feb 1885 issue of Popular Science.

Compared with the journal/book format, the magazine format lacks thumbnail preview images, and it also does not support downloading a PDF of the issue. It does, however, have one great advantage over the journal/book format: all issues are connected in the Browse all issues menu.

Google Books is full of surprises! In surveying medical journals in Google Books, I discovered that volumes of British Medical Journal circa 1880 scanned at Harvard have extensive sections devoted to advertisements. Most libraries, when they bind issues of journals and magazines into bound volumes, very reasonably remove pages that have only advertisements, to save space on the shelf. So it’s good to have a Harvard that can afford to save the rare gems of 19th century ads, so that they can be put online for the world to enjoy!

As fanciful as the ad shown here is (“Ask for Cadbury’s Pure Cocoa, makers to the Queen”), there is a wealth of more prosaic ads in the same volume, awaiting future medical historians, on subjects such as malted infant food, lactopeptine for indigestion, bronchitis & croup kettles, and state-of-the-art wheelchairs.

I found several other journals in Google Books from the same late-19th-century era, that also have extensive ads. But British Medical Journal is the only one I found that has entire, separate volumes of advertising. Apparently there must have been separate supplements that were only ads (this was in the dawn of the age of mass advertising, and people, even including physicians, were actually GLAD to read ads!)

So, how searchable are the ads in Google Books? I tried a few examples and had mixed results — Searching for this phrase that’s in the Cadbury’s ad — “why does my doctor recommend Cadbury’s Cocoa” — was successful. But searching for a phrase in the ad that follows the Cadbury’s ad, for Anodyne Amyl Colloid — “in cases of neuralgia, sciatica, lumbago” — found the phrase in other ads for the same product, but not the one occurring in this instance.

Here are volumes of British Medical Journal that I found that are exclusively advertising (All of these were scanned at Harvard):

The list presented here has FULL VIEW (public domain, pre-1923) journals in Google Books. This is certainly NOT intended to be a complete list! There’s no easy way that I have found to limit a search in Google Books to journals, so I have found these titles by searching for appropriate words such as medical, dermatology, journal, archive, transactions. I have not included titles that have fewer than 5 volumes in Google Books. Unfortunately, there’s no way that I have found to sort the title searches chronologically, so to find a particular volume, it’s necessary to go through the results list. Each entry in the list below has links to the first and last volumes that I have found for each title; these dates are not necessarily inclusive. For “contributing libraries,” examples are given if there is more than one contributing library.

This list grew a lot longer than I thought it would — I was surprised to find so many journals in Google Books! It was a tedious job compiling this, and I probably won’t try to keep it current, with new volumes being added all the time. If I get feedback :-) I’m more likely to put in more work on it, so please add a comment, or mail me at: eric[hyphen]rumsey AT uiowa[dot]edu

Until now, books with pictures, especially color pictures, have been a relatively small part of Google Books. But the addition of highly visual, popular magazines changes this — The titles added so far are filled with pictures!

On one level, more pictures in Google Books is gratifying, since pictures are a theme of this blog! But the navigation/search capabilities for finding these pictures are limited. The best way seems to be to use Advanced Search and limit the search to Magazines. But the results listing for this is text-only. It would be much easier to search for pictures with the sort of thumbnail search results interface that’s used in Google Image Search.

In light of the launching of picture-laden magazines as part of Google Books, it’s interesting to note that only last month, Google launched Life magazine pictures, as part of Google Image Search. Google is facing the same choice that librarians have been considering for the last while — Should books (or magazines) that have many pictures be considered mainly as books that happen to have pictures, or as pictures that happen to be in books?

The pictures & links below are from magazines that are in Google Books. I’ve chosen them because I know from work on Hardin MD that they are on highly-searched subjects, which would likely appear in Google Image Search if they were crawlable.


Google Books - Magazines

When I started this list in Dec, 2008, Google did not provide a list of their own — Thankfully, they provided one in Nov, 2009 (their announcement is Here, their list is Here). Assuming they keep up their list, I will probably not add to the list provided here. Comparing their list with mine now (11/12/09), they have everything on my list except one title (Log home living). Good start, Google, Hope you keep it up :-)

Please note: the dates given for titles are not necessarily inclusive! Some are quite spotty.

Eric Rumsey is at: eric-rumsey AttSign uiowa dott edu and on Twitter @ericrumsey

A New York Times article on Google Flu Trends reports that Google’s methodology “has been validated by an unrelated study” based on Yahoo! search data whose lead author is Philip Polgreen, an infectious disease doctor at the University of Iowa.

I was glad to learn about the Polgreen study, first, of course, because Polgreen and colleagues are right here at the University of Iowa! — But beyond that, it was good to find in the full article by the Polgreen team that they give more details about the flu-related search terms they used than the Google Flu Trends team does, making it easier to break down the complicating factors in flu searching. Specifically, they report that they excluded the following terms:

bird, avian, pandemic
vaccine, vaccination, shot

As discussed in accompanying articles (see below), flu is a particularly complicated disease for correlating disease occurrence and web search behavior, because of the existence of bird flu, and because there is a vaccine for flu — exactly the factors that have been excluded by Polgreen et al. It seems likely that the Google Flu Trends team is using a similar method.
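
To make the exclusion idea concrete, here is a small Python sketch of such a filter. The excluded terms are the ones reported by the Polgreen team above; the rule that a query must contain flu or influenza to count at all, and the function name is_flu_query, are my own assumptions for illustration, not the published method of either team.

```python
# Terms the Polgreen team reports excluding (listed above).
EXCLUDED = {"bird", "avian", "pandemic", "vaccine", "vaccination", "shot"}

# Assumption for this sketch: a query counts as flu-related
# only if it contains one of these words.
FLU_WORDS = {"flu", "influenza"}

def is_flu_query(query):
    """True if the query mentions flu and contains none of the excluded terms."""
    words = set(query.lower().split())
    return bool(words & FLU_WORDS) and not (words & EXCLUDED)

# Hypothetical sample queries, invented for this sketch.
queries = ["flu symptoms", "flu shot", "bird flu", "influenza treatment"]
kept = [q for q in queries if is_flu_query(q)]
```

Here only flu symptoms and influenza treatment survive the filter; the vaccine and bird flu queries are dropped before any counting is done.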

Incidentally, more on the Iowa connection — Philip Polgreen has been involved for several years with the Iowa Health Prediction Market, a spin-off of the Iowa Electronic Markets, a real-money prediction market/futures market that’s used to make predictions in political elections.

** This is one of a group of three articles on Google Flu Trends:

Together, these articles suggest that, although it’s difficult to know with assurance because Google has not revealed the search terms that they use for GFT, it seems likely that they’ve done a good job in working around the complications of flu-related search patterns.

The CDC data above shows that the occurrence of flu generally peaks in February; the data below from Google Insights for flu symptoms, not surprisingly, has a similar peak in February.

Google Insights, which uses the same data as Google Flu Trends, shows quite a different pattern for flu shot (below), which peaks in October or November (flu vaccine peaks similarly).

How about searching for just the word flu (below)? Interestingly, this seems to combine the peaks in the two graphics above, for flu symptoms and flu shot. The exaggerated peaks in 2004 and 2005 are likely caused by people’s concerns about vaccine shortage (more on this in accompanying posting, Google Flu Trends: Kudos & Complications).

Looking at the evidence of these graphics from Google Insights, it seems likely that the Google Flu Trends team is excluding search terms relating to flu vaccine, and concentrating on terms that relate to symptoms. See confirmation of this in accompanying posting, Google Flu Trends: The Iowa Connection.

The data shown here seems to indicate that for a seasonal disease for which there is a vaccine, the search patterns for “disease: symptoms,” “disease: vaccine/shot,” and the disease term itself differ, correlated with the time in the year when the disease occurs and when the vaccine is given. This idea is confirmed by Google Insights data for pneumonia, another respiratory disease that has a vaccine. The patterns are similar to flu, with high peaks for pneumonia shot in October, and somewhat lower peaks for pneumonia and pneumonia symptoms in February.

Bronchitis — A disease with no vaccine

Bronchitis is a respiratory condition that does not have a vaccine. As the graphics from Google Insights below show, the pattern is different from flu and pneumonia. The peaks for the disease itself (bronchitis, below) and for the disease with symptoms are much the same, making it less complicated to track search patterns. Apparently the people who search for the disease are in fact people who have the disease.

Bronchitis symptoms, from Google Insights.


Google Flu Trends is an elegant application of search data to medicine. Working on Hardin MD, I’ve long noticed seasonal variations in certain diseases: colds, flu, & respiratory illnesses peak in winter, and insect bites & sun exposure conditions peak in summer. I pay a lot of attention to the search terms that people use to get to Hardin MD pages, so Google’s mining of this data to serve the health of the community is especially interesting.

The idea of Google Flu Trends is shown nicely in the snapshot above from the animation at the Google Flu Trends site — Google finds that there is an excellent correlation between flu-related terms that people search and the occurrence of flu, as measured by CDC data. And, as shown in the animation, the Google search data in near real-time precedes CDC data, which takes 1-2 weeks to be reported and compiled.
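
As a rough illustration of what "correlation" and "precedes" mean here, the Python sketch below computes a Pearson correlation between a weekly search series and a weekly case series, shifting one against the other to find the lag at which they line up best. The numbers are invented (the case series is simply the search series delayed by one week), and the function names are hypothetical; Google's actual model is described in their Nature paper and is not reproduced here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(search, cases, lag):
    """Correlate search activity in week t with case counts in week t + lag."""
    n = len(search) - lag
    return pearson(search[:n], cases[lag:])

# Made-up weekly values: cases repeat the search pattern one week later.
search_volume = [1, 2, 4, 8, 5, 3, 2, 1]
reported_cases = [1, 1, 2, 4, 8, 5, 3, 2]

# Try lags of 0, 1 and 2 weeks and keep the one with the highest correlation.
best_lag = max(
    range(3),
    key=lambda k: lagged_correlation(search_volume, reported_cases, k),
)
```

In this toy data the best lag is one week, which is the sense in which search activity can run ahead of officially reported cases.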

Complications

The idea of using search data to track the progression of disease outbreaks certainly is elegant, and Google deserves congratulations for it. In choosing flu as the first example, however, Google has chosen a disease with complicating factors.

The nature of these potentially complicating factors is suggested in the graphic above from the Google Flu Trends site. A big question here is: What caused the spike in flu occurrence and flu search activity in Dec 2003 – Jan 2004?

Because Google has chosen not to reveal the exact search terms that they are using to determine the volume of flu-related searching (see supplementary material accompanying Google’s paper in Nature), it’s difficult to know the cause of the 03-04 spike with certainty. But looking back at the chronology of that time period sheds light. There was a major shortage of the flu vaccine in late 2004, which is certainly related to the spike shown in the graphic. The CDC spike (yellow) shows that many people had flu, presumably because they were unable to get the vaccine. The Google spike (blue) is even higher, which may indicate that a significant number of people were searching for flu information not because they were infected, but because they were looking for information on how to get the vaccine. The accompanying article (Flu Symptoms vs Flu Shot) shows that there is in fact a clear indication of heightened search activity for flu vaccine-related terms during the autumn pre-flu season.

The other complicating factor in looking at flu-related search activity is bird flu, and this seems to have been addressed well by Google — The large bird flu outbreak in Asia, and corresponding bird flu scare throughout the world, occurred in late 2004 and early 2005. Since there is no major spike shown in the graphs for this time, Google apparently has excluded bird flu/avian flu search terms from the aggregate group of terms it’s using.
