Poking around in Google Similar Images, I’ve found examples that give indications of how the system works. I’ve put several of these together in a Flickr set, from which the example below is taken.

The top image in each pair below (“Full size image”) is a choice from the initial search in GSI (“blackbird” in this example). Clicking “Similar images” for this choice goes to a set of refined images, represented by the bottom row in the pair. The blackbird example shows some of the strengths and weaknesses of GSI. It often seems to do best with color photographs, and not so well with monocolor pictures. In the first instance, the red spot on the wing and the greenish background are likely clues used by GSI, to good effect; the lack of color clues in the second case is likely a problem. The example also shows pretty clearly that GSI is getting clues from words associated with images, in this case causing it to confuse the blackbird with the US Air Force plane of the same name.
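Google hasn’t published how GSI measures similarity, but the color effect seen here can be illustrated with a toy sketch: quantize each pixel into a coarse color bin and compare histograms. This is purely a speculative illustration (all function names are mine), not GSI’s actual method.

```python
from collections import Counter

def coarse_histogram(pixels, bins=4):
    """Map each (r, g, b) pixel to a coarse color bin and count the bins."""
    step = 256 // bins
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)

def color_similarity(pixels_a, pixels_b, bins=4):
    """Histogram intersection: 1.0 = identical color makeup, 0.0 = nothing shared."""
    ha = coarse_histogram(pixels_a, bins)
    hb = coarse_histogram(pixels_b, bins)
    overlap = sum(min(ha[k], hb[k]) for k in ha)
    return overlap / max(len(pixels_a), len(pixels_b), 1)
```

On a measure like this, two photos dominated by red and green bins score high against each other, while a B/W line drawing collapses into a few gray bins and gives the measure almost nothing to work with, consistent with the behavior described above.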

The importance of color clues for GSI, shown in the example above, recurs in several additional examples in the Flickr set — B/W line drawings especially cause problems for GSI. Here are some other observations from the Flickr examples:

  • One notable example shows how GSI has a tendency to give too much weight to a word associated with a picture, as in the blackbird example — In a search for “george”, the “similar images” for a non-famous person named George are dominated by pictures of the recently prominent George Bush!
  • GSI does best when the image is clearly focused on one subject; it doesn’t do well with multiple subjects, especially unusual combinations of subjects that don’t typically occur together.
  • It does poorly with abstract or stylized, non-realistic images.
  • Strongly featured text sometimes “pulls the attention” of GSI away from the “picture content” of the image.

Despite the problems described here, I think GSI is a true advance in the technology of image search. In general, it does a surprisingly good job of detecting similarity. So, kudos to Google engineers!

In Dec, 2008, Google announced that they had begun adding recent popular magazines to Google Book Search. Because Google, inexplicably, chose not to provide a list of the titles included, I made a list of about 40 titles. Until recently I hadn’t added to it, assuming that Google hadn’t added any more titles, since none had appeared on the Google Book Search home page. Recently, though, I saw people in Twitter mentioning new titles, so I did some searching to see if I could find more. Indeed, I found about 10 new titles that have apparently been added recently, and I’ve added these to the list at Google Magazines – Titles.

A suggestion: If you find an interesting new magazine title in Google Book Search, put it in Twitter, and include the hashtag that I just created, #gbsmag (Clicking this will retrieve tweets in Twitter Search, with examples from the new titles I recently found). If you don’t use Twitter, of course, feel free to put new magazine titles in a comment to this article.

Google CEO Eric Schmidt’s comments on health/medicine in a recent wide-ranging interview by Charlie Rose have not gotten much attention, so I’m excerpting them here. First, Schmidt discusses Google Flu Trends:

[For clarity I've mixed a few words from Rose's questions with Schmidt's comments]
There are many [positive] things that we can do with the corpus of information that’s being gathered … The most interesting thing we’ve recently done is called flu trends. We looked at … trends in terms of worldwide flu … There’s a lot of evidence, concern about a pandemic … that might occur, similar to the 1918 bird flu epidemic that killed … 50 million … a proportionately huge number if it were today. And because people, when they have a problem, search for something, we can detect uncommon searches as a blip. We can note that. In our case, we built a system which took anonymized searches so you couldn’t figure out exactly who it was, and that’s important. And we get six months ahead of the other reporting mechanisms so we could identify the outbreak. Many people believe that this device can save 10, 20, 30,000 lives every year just because the healthcare providers could get earlier and contain the outbreak. It’s an example of collective intelligence of which will are [sic] many, many more.
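Schmidt’s idea of detecting “uncommon searches as a blip” can be sketched, very loosely, as flagging weeks whose query counts sit far above the series mean. This toy z-score detector is my illustration only; Flu Trends’ actual model was considerably more sophisticated (regression of search-query fractions against CDC flu surveillance data).

```python
import statistics

def spike_weeks(counts, threshold=2.0):
    """Return indices of weeks whose count exceeds the mean by more than
    `threshold` population standard deviations: a crude 'blip' detector."""
    mean = statistics.fmean(counts)
    sd = statistics.pstdev(counts)
    if sd == 0:
        return []  # a flat series has no blips
    return [i for i, c in enumerate(counts) if (c - mean) / sd > threshold]
```

Fed weekly counts of a flu-related query, a detector in this spirit would flag the week an outbreak begins driving unusual search volume.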

Later in the interview, Schmidt talks about what he calls a “public corpus of medical information”:

The Wikipedia model has been so successful. Why don’t we have all the smartest doctors organize a corpus, a public corpus of medical information … that combines everything everybody knows about medical practice in one place, a place where you can — again, this would have to be a public database where you keep pouring more experiential data, and then you can build computer systems … [Rose: So you have all your cases, everything you ever knew] Schmidt: Again, anonymized so it’s appropriately legal and all of that, and get it in one place so that people can begin to mine the data. They can actually begin to figure out what the disease trends are. What are the real health trends? And this is not a knock on the existing providers to do it. They just don’t have the scale. We are strong when we have thousands of people working in parallel to solve a really important problem. I would tell you, by the way, that if you look at the problems that society has hit over the last thousand years, start with the plague, right all of the things that really hit us that nearly destroyed society, we overcame them through technology and innovation. People figured out new ways whether it was in medicine or governance to overcome them. So let’s be positive about it. We can work those issues. There’s always a way to handle the objections if it’s important.

Yale Image Finder is a search engine for searching medical articles in PubMed Central for images. YIF is notable because it searches for text that is contained in images, many of which are charts and graphs with embedded “text” describing the data being presented. The “text” in these images, as in the example from YIF below, is converted to searchable OCR text.

What especially strikes me about this project is how similar it is to several initiatives from Google — For several years, Google has been working on image-to-text conversion in various forms, starting with Google Catalogs (now defunct) and Google Book Search. More recently, in 2008, several patents were published which extend the potential use of this sort of technology to a variety of possibilities, including Google Maps Street View, labels in museums and stores, and YouTube videos. Also showing Google’s continuing interest in this area is the announcement in Oct, 2008 that scanned PDF documents in Google Web Search are being converted to OCR text format.

Yale Image Finder was first announced in August, 2008, so it’s surprising that nowhere (including the developers’ scholarly description) have I found it connected to the seemingly similar Google initiatives. The same expressions of awe and amazement that greeted the Google initiatives apply equally well to the Yale project, so I’m excerpting several of these commentaries below, all written in January, 2008, when the latest patents from Google inventors Luc Vincent and Adrian Ulges were published …

Bill Slawski, who has written several articles on Google image-to-text patents – Google on Reading Text in Images from Street Views, Store Shelves, and Museum Interiors :

One of the standard rules of search engine optimization that’s been around for a long time is that “search engines cannot read text that is placed within images.” What if that changed?

Here’s more from Slawski – Googlebot In Aisle Three: How Google Plans To Index The World?:

It’s been an old sawhorse for years that Google couldn’t recognize text that was displayed in images while indexing pages on the Web. These patent filings hint that Google may be able to do much more with images than we can imagine.

Duncan Riley – Google Lodges Patent For Reading Text In Images And Video:

I may be stating the blatantly obvious when I say that if Google has found a way to index text in static images and video this is a great leap forward in the progression of search technology. This will make every book in the Google Books database really searchable, with the next step being YouTube, Flickr (or Picasa Web) and more. The search capabilities of the future just became seriously advanced.

Of course — sorry to keep harping on it! — as much as recognizing text in pictures would be a great advance, the REAL advance, of recognizing the actual objects in pictures, the philosopher’s stone of image search, still seems far from happening.

Please comment here or Twitter @ericrumsey

When a link is clicked to a specific page in GBS Mobile, the page that always opens is the entry page for the book. There doesn’t seem to be a way to link successfully to specific pages. I’ve tried this in several examples, and have had the same experience in all of them. An example below illustrates.

In this example, I’m trying to link to a group of pages starting with page 31. But when the link below is clicked, it goes to the entry page, which is page 21, with the same URL as below except that the page number is 21 instead of 31.

[This link and the link in the image below are the same]

After this link is clicked, and it goes to page 21, it does work to change the number from 21 to 31 in the URL, and it then goes to page 31. The right arrow (>) next to “Pages 21-30” also works.
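That manual workaround, editing the page number in the URL, can at least be scripted. A minimal sketch with Python’s standard library; note that the query-parameter name `pg` and the URL shape here are placeholders, not necessarily what GBS Mobile actually uses:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def set_page(url, page, param="pg"):
    """Rewrite the page-number query parameter in a book URL.
    'pg' is a hypothetical parameter name used for illustration."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query[param] = str(page)  # replace (or add) the page number
    return urlunsplit(parts._replace(query=urlencode(query)))
```

This only rewrites the URL, of course; it does nothing about whatever server-side redirect is sending the browser back to the entry page.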

When the link is clicked to go to page 31, and ends up on page 21, clicking the Back button goes to page 31. And the address bar initially reads 31, but then changes to 21 – So when the link is first clicked, it does “pass through” page 31, but apparently something on page 31 redirects it to page 21.

Does anyone see what’s happening here? Any help would be much appreciated! Please post suggestions in comments, or in Twitter.

Adam Hodgkin, in Google Pictures and Google Books, wonders why Google has chosen to put Prado paintings in Google Earth rather than in Google Images. In December I asked a similar question about Google’s putting Life Magazine pictures in Google Images, but putting other picture-laden magazines in Google Books. And, in another recent launch they’ve put newspapers, which also have many pictures, in Google News.

Once again I come back to the theme of this blog — Pictures are just different — They don’t fit neatly into our categories. Pictures are an important part of several different media — books, magazines, newspapers, and (of course) art — So what slot do we put them in?

Even before the recent questions arose with Life Magazine pictures, Google Magazines, Google Newspapers, and Prado paintings, there’s the ongoing, but little-noted question of pictures in the growing collection of public domain books in Google Books. In my experience, these are completely absent from Google Image Search — When will Google make this connection?

Figuring out what category to put them into, of course, is a relatively minor problem compared to the BIG PROBLEM with pictures, which is making them searchable! If there were one searchable category to put them into, then of course that would be the place for Google to put them!

Google recently announced that scanned PDF documents are now available in Google Web Search. PDF documents have been in Google before, but most PDF documents that have been scanned from paper documents have not, so this will greatly improve access to PDF’s. As described below, it’s important to be able to distinguish scanned PDF’s from others, of the sort that have been in Google before.

Scanned PDF documents are originally created by making an image scan of a paper document, and since the text is an image, it’s not selectable or searchable as text. The other kind of PDF document, usually called native PDF, that’s been in Google before, is originally created from an existing electronic formatted document, like a Word document, and its text is selectable and searchable as text.

From Google search results it’s not possible to determine whether a PDF document is a scanned document or a native document — Both simply say “File Format: PDF/Adobe Acrobat.” To see if it’s scanned or native PDF, go to the document and click on a word to see if it can be selected. If it can, it’s native PDF; if not it’s scanned PDF. It’s important to know this because in a scanned PDF, the text is not searchable within the PDF-browser reader. This is not readily apparent, because the search command seems to work, but comes up with zero results. To search the text of a scanned document, go to search results, and click “View as HTML,” which has the text of the document.
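The click-a-word test can also be approximated programmatically: a PDF whose content streams contain no text-showing operators (Tj/TJ) is probably a pure image scan. Here is a rough heuristic sketch; it only handles uncompressed and Flate-compressed streams, and real-world PDFs have many more cases, so treat it as illustrative, not definitive.

```python
import re
import zlib

def looks_scanned(pdf_bytes):
    """Heuristic: return True if no stream in the PDF contains a
    text-showing operator (Tj or TJ), suggesting an image-only scan."""
    for m in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.S):
        data = m.group(1)
        try:
            data = zlib.decompress(data)  # FlateDecode-compressed stream
        except zlib.error:
            pass  # stream is uncompressed, or uses another filter
        if re.search(rb"\bTj\b|\bTJ\b", data):
            return False  # found real text operators: native PDF
    return True  # no text operators anywhere: likely a scanned image
```

Native PDFs draw their text with these operators, while a scan typically contains only image-drawing commands, which is exactly why the in-reader search comes up empty on scanned documents.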

Examples from Google:
Google search : Scanned PDF – Text cannot be selected (Notice that the text in this document is scratchy, poor quality, another indication of scanned text).
Google search : Native PDF – Text can be selected

See also: Google Books and Scanned PDF’s

For more:

Maps and newspapers, because they’re rich in graphic information, benefit greatly from a zooming and panning interface. Text-only books, because they’re more linear and because text is easily searchable, don’t benefit from this sort of interface as much, but books with pictures certainly do.

zKimmer.com has recently implemented Google Maps technology for viewing non-map text and picture resources, such as magazines and newspapers, which are converted from PDF format. This is an exciting development especially because it holds promise that the same sort of technology could also be used for books.

With Google’s great success using a zooming-panning interface in Google Maps, and having recently launched Google Newspapers which also uses it, the question naturally occurs — Will Google developers sooner or later also use it for Google Books?

The zKimmer screen-shots above are from a magazine (though they could easily be from a book) and those below are from a newspaper. They both show how this interface facilitates navigating a resource that includes extensive pictures as well as text.

zKimmer lacks a good search capability (it has a search box, but it doesn’t seem to work) — So it’s not ready for heavy-duty enterprise use — It’s exciting, though, because it shows the potential value of a zooming-panning interface for books. Google Books already uses panning and zooming in a limited way, for navigating between pages, but a multi-page pan and zoom, as in zKimmer, would greatly simplify picture and text navigation.

Other implementations of the Google Maps API for non-map graphic resources are a desktop collection of elegant books by the reclusive German techno-artist Markus Dressen, and a card set from World of Warcraft.

Google recently announced the launch of Google Newspapers. The first issue (and apparently the only one up currently) is the 1969 “We’re on the moon” edition of the Pittsburgh Post-Gazette.

What caught my attention here is the ability to pan — to move around the large newspaper page by dragging the hand pointer with the mouse. Mouse panning was introduced with Google Maps, and likely played a large part in its becoming so popular. A newspaper page poses the same kind of challenge as a map — How to design user navigation for information covering a large surface. As with Google Maps, here also it looks like Google Newspapers has set the standard for navigation of a large-paged information source, especially one with pictures.

Panning (and zooming, which is often discussed together with it) is an interesting and challenging concept to search for, because the words are in prominent use in other contexts, especially photography and video. Surprisingly, there’s no article in Wikipedia for the concept of panning as used for computer information navigation.

An elegant demonstration of panning is at the Hubble pan and zoom gallery.

Google: “pan around” google newspapers

Why an article about a children’s book site? When I first came across the International Children’s Digital Library (ICDL), it immediately struck me as being visually elegant, but could I justify putting it on an academic blog site? The more I thought about it, though, the more it seems very much on target — The theme of this blog is the digitization of pictures, including especially pictures in books. Another theme is that in mass digitization projects, the main concern seems to be text, and that pictures are often overlooked. So, yes, ICDL, with its elegant presentation of pictures and text, is right on target. … And then, of course, finding ICDL in Google as a prime example of a “digital library” seals the deal!

ICDL has many excellent features as a children’s book site, e.g. its novel ways to find books (by color, theme, etc.) and its inclusion of books in a wealth of languages. The aspect of ICDL that I’ll highlight briefly here, though, one that can serve as a model for any site with illustrated books, is its polished delivery of text and pictures, featured especially in the Book Overview screen, shown below.

Book Overview: Calling the doves = El canto de las palomas

The Mouse-over Preview, which shows an enlarged version of the thumbnail as the user holds the mouse pointer over it, makes this screen especially effective. To see the nice touches at work here, try changing the window size — As the window is made smaller, the thumbnails also become smaller, so that all of them remain visible. And, even better, the mouse-over preview window does NOT shrink, keeping the same size no matter how small the thumbnails become.

Though ICDL lacks some features of a full-fledged enterprise book-viewing system (text is not available as text), its innovative presentation of book pages serves to show how far existing systems have to go in presenting books with pictures — There’s just no substitute for displaying small versions of the book’s pages that show the pictures and how they relate to the text, and ICDL is a model of how to do this.

ICDL has its roots at the University of Maryland; it’s now run by the ICDL Foundation. It’s written in Java. For more technical details, see the paper by the ICDL authors.