[When Hardin MD was launched in 1996, its main purpose was to provide links to health science resources on the Web. In recent years, the emphasis has been on providing access to medical pictures.]

We first started tracking how well Google was finding Hardin MD pages in about 2001, when search engine optimization was in its infancy and most people, like us, had not heard the term “SEO.” But in today’s lingo, that’s pretty much what we were doing: learning to use language that would help people searching in Google find our pages. So here’s a little example of using search engine optimization techniques before they became famous as SEO. …

Users of Hardin MD will notice that the word “pictures” is used frequently on our pages and the word “images” is rarely used. Why is this? The answer is simple: we use “pictures” because that’s the word people use when searching.

The screen-shots below, for the Hardin MD : Impetigo Pictures page, show this clearly. The Extreme Tracker shot for this page shows the large proportion of search engine traffic from the word “pictures” (36%) compared to the small amount of traffic from the word “images” (0.6%).

[Screen-shots: Hardin MD : Impetigo Pictures page; Keywords (Extreme Tracker)]

The Google screen-shots show that the Impetigo Pictures page gets an equally high ranking for the two words, so it’s apparent that “pictures” is being searched much more frequently.

[Screen-shots: Google search for “impetigo pictures”; Google search for “impetigo images”]

(Note that these screen-shots have been photo-edited to fit the space; ads and other text not relevant to the article have been removed. All screen-shots were captured in July 2008.)
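As a rough check on the reasoning above, the two keyword shares reported by Extreme Tracker can be compared directly. The sketch below uses only the two percentages quoted in this article (36% and 0.6%); the variable and function names are illustrative, and it rests on the same assumption made in the text, namely that with equal ranking for both words, traffic share is a fair stand-in for how often each word is searched.

```python
# Keyword shares for the Impetigo Pictures page, as quoted in this article
# (Extreme Tracker, July 2008). All names here are illustrative only.
keyword_share_percent = {"pictures": 36.0, "images": 0.6}


def traffic_ratio(shares, word_a, word_b):
    """Return how many times more traffic word_a brings than word_b."""
    return shares[word_a] / shares[word_b]


ratio = traffic_ratio(keyword_share_percent, "pictures", "images")
print(f"'pictures' brings about {ratio:.0f} times the traffic of 'images'")
```

With the quoted figures, the ratio works out to about 60 to 1 in favor of “pictures.”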

Here’s the background …

In about 2001, we started noticing how people were finding Hardin MD pages in search engines, and designing our pages to make them more likely to be found. An important part of this was using words that people were more likely to search for (e.g. “heart diseases” instead of “cardiology”). Tools such as WordTracker, which show how many people are searching for particular words, are especially useful for this.

About this same time, we were starting to make links to other sites that have pictures on medical/disease subjects. Using WordTracker and Extreme Tracker (to see the words people were searching to find our pages), we were struck by how effective the word “pictures” was. At the time, we had assumed that the appropriate word to use was “images,” since that’s the word used on most medical/disease pages at other sites. We could see clearly, however, that using the word “pictures” on our pages brought much more traffic than the word “images.” So we’ve gone on from there, and now have high rankings in Google for many medical/disease subjects combined with “pictures,” as with Impetigo.

Extreme Tracker | WordTracker

As computers have become more powerful, many of the aspects of handling text that were formerly done by humans have been taken over by computers. Pictures, however, are much more difficult to automate: recognizing patterns remains a task that humans do much better than computers. A human infant can easily tell the difference between a cat and a dog, but it’s difficult to train a computer to do this.

In pre-Google days, the task of finding good lists of web links needed the input of smart humans (and Hardin MD was on the cutting edge in doing this). Now, though, Google Web Search gives us all the lists we need.

Pictures are another story: on many levels, pictures require much more human input than text.

The basic, intractable problem with finding pictures is that they have no innate “handle” allowing them to be found. Text serves as its own handle, so it’s easy for Google Web Search to find it. But Google Image Search has a much more difficult task. It still has to rely on some sort of text handle that’s associated with a picture to find it, and is at a loss to find pictures not associated with text.
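To make the “handle” idea concrete, here is a minimal sketch of a picture index built purely from associated text. It is not a description of how Google Image Search actually works; the picture records, field names, and functions are all hypothetical, and for simplicity only caption-style text is indexed. The point is just that a picture with no associated text never enters the index, so no query can reach it.

```python
import re
from collections import defaultdict

# Hypothetical picture records: a file name plus whatever text a page
# associates with it (alt text, caption). The last one has no usable text.
pictures = [
    {"file": "impetigo_01.jpg", "text": "Impetigo pictures: rash on a child's face"},
    {"file": "chickenpox_02.jpg", "text": "Chickenpox rash pictures"},
    {"file": "IMG_4471.jpg", "text": ""},
]


def build_index(records):
    """Map each lowercase word in a picture's associated text to its files."""
    index = defaultdict(set)
    for record in records:
        for word in re.findall(r"[a-z]+", record["text"].lower()):
            index[word].add(record["file"])
    return index


def search(index, query):
    """Return files whose associated text contains every query word."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(pictures)
print(search(index, "rash pictures"))  # both captioned pictures match
print(search(index, "impetigo"))       # only the impetigo picture matches
```

In this toy example, IMG_4471.jpg cannot be returned by any query, however relevant the picture itself might be, because it has no text handle.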

The explosive growth of Hardin MD since 2001 (page views in 2008 are over 50 times larger) has been strongly correlated with the addition of pictures. The same period has seen the growing presence of Google, with its PageRank technology, which has made old-style list-keeping, as featured in Hardin MD, less important.

Though Google has accomplished much in the retrieval of text-based pages, it’s made little progress in making pictures more accessible. Google Image Search is the second most-used Google service, but its basic approach has changed little over the years.

The basic problem for image search is that pictures don’t have a natural handle to search for. Because of this, it takes much more computing power for the Google spider to find new pictures, and consequently they take much longer to be spidered than text pages (months instead of days).

Beyond the problem of identifying pictures there are other difficult-to-automate problems for image search:
• How to display search results most efficiently to help the user find what they want: do you rank results according to picture size, number of related pictures at a site, or some other, more subjective measure of quality?
• What’s the best way to display thumbnail images in search results?
• How much weight should be given to pictures that have associated text that helps interpret the picture?
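
The ranking question in the list above can be illustrated with a small sketch. The results, field names, and values below are made up for illustration, and the three orderings are simply examples of possible criteria, not ones any search engine is known to use. The same three pictures come out in a different order under each criterion, which is why the choice is hard to automate.

```python
# Hypothetical search results; every field and value is made up for illustration.
results = [
    {"file": "a.jpg", "width": 1600, "height": 1200, "related_on_site": 2, "has_caption": False},
    {"file": "b.jpg", "width": 640, "height": 480, "related_on_site": 40, "has_caption": True},
    {"file": "c.jpg", "width": 800, "height": 600, "related_on_site": 10, "has_caption": True},
]


def rank(items, key):
    """Return file names ordered from highest to lowest by the given key."""
    return [item["file"] for item in sorted(items, key=key, reverse=True)]


# Criterion 1: larger pictures first.
by_size = rank(results, lambda r: r["width"] * r["height"])
# Criterion 2: pictures from sites with more related pictures first.
by_related = rank(results, lambda r: r["related_on_site"])
# Criterion 3: pictures with interpretive text first, then larger pictures.
by_caption_then_size = rank(results, lambda r: (r["has_caption"], r["width"] * r["height"]))

print(by_size)               # ['a.jpg', 'c.jpg', 'b.jpg']
print(by_related)            # ['b.jpg', 'c.jpg', 'a.jpg']
print(by_caption_then_size)  # ['c.jpg', 'b.jpg', 'a.jpg']
```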

So, good news for picture people! I would suggest that pictures are a growth sector of the information industry, and a human-intensive one. I would predict that text-based librarians will continue to be replaced as computers become more prominent. But there will continue to be a need for human intelligence working in all areas relating to pictures, from indexing/tagging to designing systems to make them more accessible.