Poking around in Google Similar Images, I’ve found examples that give indications of how the system works. I’ve put several of these together in a Flickr set, from which the example below is taken.
The top image in each of the pairs below (“Full size image”) is a choice from the initial search in GSI (“blackbird” in the example below). Clicking “Similar images” for this choice leads to a set of refined images, represented by the bottom row of images in the pair. The blackbird example here shows some of the strengths and weaknesses of GSI. It often seems to do best with color photographs, and not so well with monochrome pictures. In the first instance, the red spot on the wing and the greenish background are likely clues used by GSI, to good effect. The lack of color clues in the second case is likely a problem for GSI. It also shows pretty clearly that GSI is getting clues from words associated with images, in this case causing it to confuse the blackbird with the US Air Force plane of the same name.
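To make the color-clue idea concrete, here is a minimal sketch of one simple way color can serve as a similarity signal: comparing coarse RGB histograms. This is purely my own illustration of the general technique; Google hasn’t published how GSI actually works, and the function names here are mine.

```python
# Illustrative sketch only: compare coarse joint RGB histograms as a
# crude color-similarity signal. Not Google's actual algorithm.
import numpy as np
from PIL import Image

def color_histogram(path, bins=8):
    """Coarse joint RGB histogram, normalized to sum to 1."""
    pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def color_similarity(path_a, path_b):
    """Histogram intersection: 1.0 means identical color distributions."""
    return np.minimum(color_histogram(path_a), color_histogram(path_b)).sum()
```

Notice that a grayscale line drawing collapses onto a handful of bins along the gray diagonal, which suggests one reason monochrome images give a system like this so little to work with.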
The importance of color clues for GSI shown in the example above recurs in several additional examples in the Flickr set; B/W line drawings especially cause problems for GSI. Here are some other observations from the Flickr examples:
- One notable example shows GSI’s tendency to give too much weight to a word associated with a picture, as in the blackbird example: in a search for “george”, the “similar images” for a non-famous person named George are dominated by pictures of the recently prominent George Bush!
- GSI does best when the image is focused clearly on one subject; it doesn’t do well when there are multiple subjects, especially unusual combinations of subjects that don’t typically occur together.
- It does poorly with abstract or stylized, non-realistic images.
- Strongly featured text sometimes “pulls the attention” of GSI away from the “picture content” of the image.
Despite the problems described here, I think GSI is a true advance in the technology of image search. In general, it does a surprisingly good job of detecting similarity. So, kudos to Google engineers!
Great reverse engineering / explanation, thanks!
Thank you for this excellent post. You’ve done some valuable investigating here. Did you try submitting an exemplar image without “blackbird” in its filename? You should rename the original image to something content-free (like an MD5 hash) and repeat the experiment.
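For anyone who wants to run that control experiment, something like this sketch strips the word clues from the filename (the function name is hypothetical, just for illustration):

```python
# Sketch: copy an image under an MD5-hash filename so the name itself
# carries no word clues for the search engine.
import hashlib
import shutil
from pathlib import Path

def copy_with_md5_name(path):
    src = Path(path)
    digest = hashlib.md5(src.read_bytes()).hexdigest()
    dst = src.with_name(digest + src.suffix)  # e.g. "<32 hex chars>.jpg"
    shutil.copy2(src, dst)
    return dst
```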
The Google technology strikes me as unsophisticated. It almost has to be since all of the really good ideas are patented already. ;^)
The seminal work in this area, IMHO, was done by Barnsley and Sloan, of Iterated Systems fame:
http://bit.ly/5jgUD
The Barnsley technique decomposes an image into a set of fractal primitives. The primitives embody invariants that can be compared across other sets of such primitives. By matching set to set, images can be matched. It’s a subtle and remarkable technique, the theoretical basis of which is well described in Ning Lu’s “Fractal Imaging” (out of print but well worth hunting down).
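To give a flavor of what those primitives look like, here is a toy sketch of the block-matching step in a partitioned-IFS coder. It is heavily simplified (a real coder also searches rotations, reflections, and multiple block sizes), and all the names here are mine, not Barnsley’s:

```python
# Toy sketch of fractal (PIFS) coding: each 8x8 "range" block is
# approximated as s*D + o, where D is a shrunken 16x16 "domain" block.
# The resulting codes (domain index, scale, offset) are the primitives
# whose invariants one could compare across images.
import numpy as np

def shrink(block):
    """Average 2x2 pixel groups: 16x16 domain -> 8x8 range size."""
    return block.reshape(8, 2, 8, 2).mean(axis=(1, 3))

def fractal_codes(img):
    """img: 2-D float array whose sides are divisible by 16."""
    h, w = img.shape
    domains = [shrink(img[i:i + 16, j:j + 16])
               for i in range(0, h, 16) for j in range(0, w, 16)]
    codes = []
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            r = img[i:i + 8, j:j + 8].ravel()
            best = None
            for k, d in enumerate(domains):
                dv = d.ravel()
                # Least-squares fit of r ~ s*dv + o
                A = np.column_stack([dv, np.ones_like(dv)])
                s, o = np.linalg.lstsq(A, r, rcond=None)[0]
                err = np.sum((s * dv + o - r) ** 2)
                if best is None or err < best[0]:
                    best = (err, k, s, o)
            codes.append(best[1:])  # (domain index, scale, offset)
    return codes
```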
The Iterated Systems technology eventually became part of the Mediabin DAM system, which was acquired by Interwoven.
Prior to the advent of fractal techniques, this type of thing was done by means of neural-net technology (AI), first subjecting images to various kinds of canonicalization and invariant-extraction. The motivations and approaches are well captured in Timothy Masters’ books on AI (again well worth hunting down). I also recommend Parker’s “Algorithms for Image Processing and Computer Vision” (Wiley, 1997).
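As one concrete example of that invariant-extraction step (my choice for illustration, not necessarily what any particular system used): Hu’s seven moment invariants are unchanged under translation, scaling, and rotation, and make a compact feature vector to feed a neural net or nearest-neighbor matcher.

```python
# Sketch: extract Hu's seven rotation/scale/translation-invariant
# moments from a grayscale image as a compact matching feature.
import cv2
import numpy as np

def hu_invariants(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    hu = cv2.HuMoments(cv2.moments(gray)).ravel()
    # Log-scale: the raw invariants span many orders of magnitude.
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```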
Whatever Google is doing, it’s not magic. The magical stuff was invented 10 and 20 years ago. :^)
“It also shows pretty clearly that GSI is getting clues from words associated with images”
Does that mean they name each and every picture? Isn’t that... too much work?