Poking around in Google Similar Images, I’ve found examples that give indications of how the system works. I’ve put several of these together in a Flickr set, from which the example below is taken.

The top image in each of the pairs below (“Full size image”) is a choice from the initial search in GSI (“blackbird” in the example below). Clicking “Similar images” for this choice goes to a set of refined images, represented by the bottom row of images in the pair. The blackbird example here shows some of the strengths and weaknesses of GSI. It often seems to do best with color photographs, but not so well with monocolor pictures. In the first instance, the red spot on the wing and the greenish background likely are clues used by GSI, to good effect. The lack of color clues in the second case is likely a problem for GSI. It also shows pretty clearly that GSI is getting clues from words associated with images, in this case causing it to confuse the blackbird with the US Air Force plane that has the same name.
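To make the point about color clues concrete, here is a minimal sketch of one classic way to compare images by color: a coarse color histogram and its intersection. This is purely illustrative. How GSI actually works is not public, and the function names and the toy pixel data below are hypothetical.

```python
from collections import Counter

def color_histogram(pixels, bins=4):
    """Quantize each (r, g, b) pixel into a coarse bin; return normalized frequencies."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}

def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms: 1.0 means identical, 0.0 means disjoint."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())

# Toy "images" as flat pixel lists (made-up data, not real photographs).
# Two color photos: dark bird, red wing spot, greenish background.
bird_photo = [(20, 20, 20)] * 50 + [(200, 30, 30)] * 10 + [(60, 160, 60)] * 40
other_bird_photo = [(25, 25, 25)] * 45 + [(210, 40, 40)] * 15 + [(60, 170, 60)] * 40
# Two black-and-white line drawings of completely different subjects.
bird_drawing = [(0, 0, 0)] * 20 + [(255, 255, 255)] * 80
plane_drawing = [(0, 0, 0)] * 25 + [(255, 255, 255)] * 75

photo_vs_photo = histogram_intersection(
    color_histogram(bird_photo), color_histogram(other_bird_photo))
drawing_vs_drawing = histogram_intersection(
    color_histogram(bird_drawing), color_histogram(plane_drawing))
photo_vs_drawing = histogram_intersection(
    color_histogram(bird_photo), color_histogram(bird_drawing))

print(photo_vs_photo, drawing_vs_drawing, photo_vs_drawing)
```

In this toy setup the two bird photos score high, but so do the two unrelated line drawings, because every black-and-white drawing has nearly the same color profile. Color alone cannot tell a drawn bird from a drawn plane, which fits the pattern seen in the examples.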

The importance of color clues for GSI that’s shown in the example above occurs in several additional examples in the Flickr set — B/W line drawings especially cause problems for GSI. Here are some other observations from the Flickr examples:

  • One notable example shows GSI’s tendency to give too much weight to a word associated with a picture, as in the blackbird example. In a search for “george”, the “similar images” for a non-famous person named George are dominated by pictures of the recently prominent George Bush!
  • GSI does best when the image is focused clearly on one subject; it doesn’t do well when there are multiple subjects, especially unusual combinations that don’t typically occur together.
  • It does poorly with abstract or stylized, non-realistic images.
  • Strongly featured text sometimes “pulls the attention” of GSI away from the “picture content” of the image.

Despite the problems described here, I think GSI is a true advance in the technology of image search. In general, it does a surprisingly good job of detecting similarity. So, kudos to Google engineers!

Yale Image Finder is a search engine for searching medical articles in PubMed Central for images. YIF is notable because it searches for text that is contained in images, many of which are charts and graphs with embedded “text” describing the data being presented. The “text” in these images, as in the example from YIF below, is converted to searchable OCR text.
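The general idea of making text inside images searchable can be sketched as a small inverted index over OCR output. This is not YIF's implementation, only a minimal illustration. The OCR step itself is assumed to have already happened, and the file names and OCR strings below are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(ocr_text_by_image):
    """Map each token to the set of images whose OCR text contains it."""
    index = defaultdict(set)
    for image_id, text in ocr_text_by_image.items():
        for token in tokenize(text):
            index[token].add(image_id)
    return index

def search(index, query):
    """Return the images whose OCR text contains every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results

# Hypothetical OCR output for three figures (assumed, not taken from YIF).
ocr_text_by_image = {
    "fig1.png": "Survival rate (%) vs. months after treatment",
    "fig2.png": "Gene expression level by tissue type",
    "fig3.png": "Survival rate by tissue type",
}
index = build_index(ocr_text_by_image)

print(sorted(search(index, "survival rate")))  # figures 1 and 3
print(sorted(search(index, "tissue type")))    # figures 2 and 3
```

Once the text in a chart or graph is extracted, the image becomes findable by ordinary keyword search, even if the surrounding article never mentions those words.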

What especially strikes me about this project is how similar it is to several initiatives from Google. For several years, Google has been working on image-to-text conversion in a number of its services, starting with Google Catalogs (now defunct) and Google Book Search. More recently, in 2008, several patents were published that extend the potential use of this sort of technology to a variety of settings, including Google Maps street view, labels in museums and stores, and YouTube videos. Also showing Google’s continuing interest in this area is the announcement in October 2008 that scanned PDF documents in Google Web Search are being converted to OCR text.

Yale Image Finder was first announced in August 2008, so it’s surprising that I have not been able to find any source (including a scholarly description by the developers) that connects it to the Google initiatives, which seem so similar. The awe and amazement expressed about the Google initiatives apply equally well to the Yale project, so I’m excerpting several of these commentaries below, all written in January 2008, when the latest patents from Google inventors Luc Vincent and Adrian Ulges were published …

Bill Slawski, who has written several articles on Google image-to-text patents – Google on Reading Text in Images from Street Views, Store Shelves, and Museum Interiors :

One of the standard rules of search engine optimization that’s been around for a long time is that “search engines cannot read text that is placed within images.” What if that changed?

Here’s more from Slawski – Googlebot In Aisle Three: How Google Plans To Index The World? :

It’s been an old sawhorse for years that Google couldn’t recognize text that was displayed in images while indexing pages on the Web. These patent filings hint that Google may be able to do much more with images than we can imagine.

Duncan Riley – Google Lodges Patent For Reading Text In Images And Video :

I may be stating the blatantly obvious when I say that if Google has found a way to index text in static images and video this is a great leap forward in the progression of search technology. This will make every book in the Google Books database really searchable, with the next step being YouTube, Flickr (or Picasa Web) and more. The search capabilities of the future just became seriously advanced.

Of course — sorry to keep harping on it! — as much as recognizing text in pictures would be a great advance, the REAL advance, of recognizing the actual objects in pictures, the philosopher’s stone of image search, still seems far from happening.

Please comment here or Twitter @ericrumsey

Adam Hodgkin, in Google Pictures and Google Books, wonders why Google has chosen to put Prado paintings in Google Earth rather than in Google Images. In December I asked a similar question about Google’s putting Life Magazine pictures in Google Images, but putting other picture-laden magazines in Google Books. And, in another recent launch they’ve put newspapers, which also have many pictures, in Google News.

Once again I come back to the theme of this blog — Pictures are just different — They don’t fit neatly into our categories. Pictures are an important part of several different media — books, magazines, newspapers, and (of course) art — So what slot do we put them in?

Even before the recent questions arose with Life Magazine pictures, Google Magazines, Google Newspapers, and Prado paintings, there’s the ongoing, but little-noted question of pictures in the growing collection of public domain books in Google Books. In my experience, these are completely absent from Google Image Search — When will Google make this connection?

Figuring out what category to put them into, of course, is a relatively minor problem compared to the BIG PROBLEM with pictures, which is making them searchable! If there were one category to put them into that was searchable, then of course that would be the place for Google to put them!

Until now, books with pictures, especially color pictures, have been a relatively small part of Google Books. But the addition of highly visual, popular magazines changes this — The titles added so far are filled with pictures!

On one level, more pictures in Google Books is gratifying (a theme of this blog!). But the navigation and search capabilities for finding these pictures are limited. The best way seems to be to use Advanced Search and limit the search to Magazines. But the results listing for this is text-only. It would be much easier to search for pictures with the sort of thumbnail search results interface that’s used in Google Image Search.

In light of the launching of picture-laden magazines as part of Google Books, it’s interesting to note that only last month, Google launched Life magazine pictures, as part of Google Image Search. Google is facing the same choice that librarians have been considering for the last while — Should books (or magazines) that have many pictures be considered mainly as books that happen to have pictures, or as pictures that happen to be in books?

The pictures & links below are from magazines that are in Google Books. I’ve chosen them because I know from work on Hardin MD that they are on highly-searched subjects, which would likely appear in Google Image Search if they were crawlable.


As computers have become more powerful, many of the aspects of handling text that were formerly done by humans have been taken over by computers. Pictures, however, are much more difficult to automate — Recognizing patterns remains a task that humans do much better than computers. A human infant can easily tell the difference between a cat and a dog, but it’s difficult to train a computer to do this.

In pre-Google days, the task of finding good lists of web links needed the input of smart humans (and Hardin MD was on the cutting edge in doing this). Now, though, Google Web Search gives us all the lists we need.

Pictures are another story — on many levels, pictures require much more human input than text.

The basic, intractable problem with finding pictures is that they have no innate “handle” allowing them to be found. Text serves as its own handle, so it’s easy for Google Web Search to find it. But Google Image Search has a much more difficult task. It still has to rely on some sort of text handle that’s associated with a picture to find it, and is at a loss to find pictures not associated with text.
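A minimal sketch of the “handle” problem, assuming a search that can only match the text attached to a picture (file name, alt text, caption). The `Image` class, the `find` function, and the example URLs are all hypothetical, and this is not how Google Image Search is actually built.

```python
from dataclasses import dataclass

@dataclass
class Image:
    url: str
    filename: str = ""
    alt_text: str = ""
    caption: str = ""

    def handle(self):
        """All text associated with the picture: the only thing the search can see."""
        return " ".join([self.filename, self.alt_text, self.caption]).lower()

def find(images, query):
    """Return URLs of images whose associated text contains the query."""
    q = query.lower()
    return [img.url for img in images if q in img.handle()]

images = [
    # A bird photo with a descriptive file name and alt text.
    Image("a.example/1", filename="blackbird.jpg", alt_text="Red-winged blackbird"),
    # A photo of the plane that shares the bird's name.
    Image("a.example/2", filename="sr71.jpg", caption="SR-71 Blackbird in flight"),
    # Another bird photo, but with no descriptive text at all.
    Image("a.example/3", filename="IMG_0042.jpg"),
]

print(find(images, "blackbird"))
```

The search returns the first two pictures: the bird and the plane are indistinguishable by text alone, and the third picture, however good it is, can never be found because it has no handle.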

The explosive growth of Hardin MD since 2001 (page views in 2008 are over 50 times larger) has been strongly correlated with the addition of pictures. The same period has seen the growing presence of Google, whose PageRank technology has made old-style list-keeping, as had been featured in Hardin MD, less important.

Though Google has accomplished much in the retrieval of text-based pages, it’s made little progress in making pictures more accessible. Google Image Search is the second most-used Google service, but its basic approach has changed little over the years.

The basic problem for image search is that pictures don’t have a natural handle to search for. Because of this it takes much more computer power for the Google spider to find new pictures, and consequently it takes much longer for them to be spidered, compared to text pages (measured in months instead of days).

Beyond the problem of identifying pictures there are other difficult-to-automate problems for image search:
• How to display search results most efficiently to help the user find what they want: do you rank results according to picture size, number of related pictures at a site, or some other, more subjective measure of quality?
• What’s the best way to display thumbnail images in search results?
• How much weight should be given to pictures that have associated text that helps interpret the picture?
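
To show why these choices are hard to automate, here is a minimal sketch of a ranking function that combines the factors just listed: picture size, number of related pictures at a site, and how well the associated text matches. Every name, weight, and normalization constant here is a made-up assumption for illustration, not a description of any real search engine.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    url: str
    width: int
    height: int
    related_on_site: int  # number of related pictures at the same site
    text_match: float     # 0..1, how well the associated text fits the query

def score(c, w_size=0.3, w_related=0.2, w_text=0.5):
    """Weighted sum of three normalized signals; the weights are arbitrary."""
    size = min(c.width * c.height / 1_000_000, 1.0)  # cap at one megapixel
    related = min(c.related_on_site / 20, 1.0)       # cap at 20 related pictures
    return w_size * size + w_related * related + w_text * c.text_match

def rank(candidates, **weights):
    """Sort candidates from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(c, **weights), reverse=True)

big_but_vague = Candidate("a.example/big", 2000, 1500, related_on_site=2, text_match=0.2)
small_but_apt = Candidate("a.example/small", 400, 300, related_on_site=20, text_match=0.9)
candidates = [big_but_vague, small_but_apt]

by_text = [c.url for c in rank(candidates)]
by_size = [c.url for c in rank(candidates, w_size=0.8, w_related=0.1, w_text=0.1)]
print(by_text)
print(by_size)
```

With the default weights the small, well-described picture comes first; shift the weight toward size and the order flips. Nothing in the code says which ordering is right, and that judgment is exactly the part that still needs a human.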

So — Good news for picture people! — I would suggest that pictures are a growth sector of the information industry, and a human-intensive one. I would predict that text-based librarians will continue to be replaced, as computers become more prominent. But there will continue to be a need for human intelligence working in all areas relating to pictures, from indexing/tagging to designing systems to make them more accessible.