Flickr takes the sun out of the sunset

The picture to the left, from Flickr, shows a full picture with its square thumbnail in the inset. Thumbnails like these are generated automatically by Flickr and other photo management systems, which take a square portion from the center of the picture. This works well if the most important subject is in the center. But if the picture is relatively wide or tall and its main subject is off-center, as in this example with the sun to one side, the thumbnail misses it. On the Flickr page for this example (Long Beach Sunset), the first thumbnail (top left) is the one for the larger picture, which is shown on our page with the thumbnail in the yellow-outlined inset.
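To make the center-crop behavior concrete, here is a minimal sketch in Python. It is not Flickr's actual code; the function names, the picture dimensions, and the position of the sun are all assumptions made up for illustration. It only shows why an off-center subject falls outside a centered square.

```python
def center_square_crop(width, height):
    """Return the (left, top, right, bottom) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def box_contains(box, x, y):
    """Return True if pixel (x, y) lies inside the crop box."""
    left, top, right, bottom = box
    return left <= x < right and top <= y < bottom


# Hypothetical wide sunset picture, 1000 x 400 pixels, with the sun
# near the left edge (these numbers are invented for the example).
box = center_square_crop(1000, 400)
sun_x, sun_y = 120, 200

print(box)                              # (300, 0, 700, 400)
print(box_contains(box, sun_x, sun_y))  # False: the sun is cropped out
```

For a picture that is already square, the box covers the whole picture, which is why the problem only shows up with relatively wide or tall pictures.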

In large mass-production systems like Flickr, automatic thumbnails are unavoidable, and my point is not that they should never be used. Instead, my point is that, on many levels, pictures require more human input than text to make them optimally usable. Pattern recognition — the simple observation that the thumbnail of a picture of a sunset SHOULD CONTAIN THE SUN — is something that the human brain does easily, but this does not come naturally for a computer.


Another sort of problem in automatic production of thumbnails is making a thumbnail by simply reducing the size of the large picture. If the main subject of the picture is relatively small, it is not visible in a small thumbnail.

The picture to the left is from the Hardin Library ContentDM collection. The inset in the upper right shows the thumbnail that’s generated automatically by the system, which does a poor job of showing the details of the picture. The lower inset shows a thumbnail made manually, which gives a much clearer view of the central image in the picture.

Cropping a picture to produce a thumbnail, as done here, takes more subtle human judgement than the Flickr picture in the first example, where the weakness of automatic production is obvious. With cropping, there’s inevitably a trade-off between showing the whole picture in the thumbnail and showing the most important subject of the picture. In cases such as this one from ContentDM, where almost all of the detail in the picture will be lost in a small thumbnail, it seems better to focus on a central image that will show up in the thumbnail.

Finally, a few examples from Hardin MD, below, show how we have used cropping to improve the detail in our thumbnails. The thumbnail on the left in each of the three pairs is made by simply reducing the size of the full picture. On the right in each pair is the thumbnail we use, which we made by cropping the full picture before making the thumbnail.
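The gain from cropping first is simple arithmetic, sketched below in Python. The function name and all of the pixel sizes are hypothetical, chosen only to illustrate the idea; they are not measurements from the Hardin MD pictures.

```python
def subject_width_in_thumbnail(source_width, subject_width, thumb_width):
    """Width in pixels of the subject after source_width is scaled to thumb_width."""
    return subject_width * thumb_width / source_width


# Invented numbers: a 2000-pixel-wide picture whose main subject is
# 300 pixels wide, reduced to a 100-pixel-wide thumbnail.
full_width = 2000
subject_width = 300
thumb_width = 100

# Hand-chosen crop, 500 pixels wide, that still contains the whole subject.
crop_width = 500

resized_only = subject_width_in_thumbnail(full_width, subject_width, thumb_width)
cropped_first = subject_width_in_thumbnail(crop_width, subject_width, thumb_width)

print(resized_only)   # 15.0 pixels: the subject is barely visible
print(cropped_first)  # 60.0 pixels: the subject fills most of the thumbnail
```

The tighter the crop, the larger the subject appears, but the less of the surrounding picture survives, which is the trade-off that calls for human judgement.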

The biomedical and scientific pictures that we work with in Hardin MD are fairly easy to make thumbnails for, because they generally have a well-defined focus that’s usually captured well by automatically generated thumbnails. More artistic, humanities-oriented pictures, such as the ones discussed here from Flickr and ContentDM, often have more subtle subjects that benefit from an intelligent human touch in the production of thumbnails.

Why an article about a children’s book site? When I first came across the International Children’s Digital Library (ICDL), it immediately struck me as visually elegant, but could I justify putting it on an academic blog site? The more I thought about it, though, the more it seemed very much on target. The theme of this blog is the digitization of pictures, including especially pictures in books. Another theme is that in mass digitization projects, the main concern seems to be text, and that pictures are often overlooked. So, yes, ICDL, with its elegant presentation of pictures and text, is right on target. … And then, of course, finding ICDL in Google as a prime example of a “digital library” seals the deal!

ICDL has many excellent features as a children’s book site, such as its novel ways to find books (by color, theme, etc.) and its inclusion of books in a wealth of languages. The aspect of ICDL that I’ll highlight briefly here, though, and that can serve as a model for any site with illustrated books, is its polished delivery of text and pictures, featured especially in the Book Overview screen, shown below.

Book Overview: Calling the doves = El canto de las palomas

The Mouse-over Preview, which shows an enlarged version of a thumbnail as the user holds the mouse pointer over it, makes this screen especially effective. To see the nice touches at work here, try changing the window size. As the window is made smaller, the thumbnails also become smaller, so that all of them remain visible. And, even better, the mouse-over preview window does NOT shrink, keeping the same size no matter how small the thumbnails become.

Though ICDL lacks some features of a full-fledged enterprise book-viewing system (text is not available as text), its innovative presentation of book pages serves to show how far existing systems have to go in presenting books with pictures — There’s just no substitute for displaying small versions of the book’s pages that show the pictures and how they relate to the text, and ICDL is a model of how to do this.

ICDL has its roots at the University of Maryland; it’s now run by the ICDL Foundation. It’s written in Java. For more technical details, see the paper by the ICDL authors.

The University of Utah has long been a pioneer in the digitization of medical visual resources, under the leadership of the Eccles Health Sciences Library. Utah is especially notable for the wide variety of its resources, with strong collections in several basic biomedical and clinical areas.

Most of the Eccles digital image collections are listed on the Digital Collections page, although they’re mixed in with resources from other sites around the US, and sometimes difficult to identify as having been developed at Utah. Several of the Utah collections are described below.

NOVEL is the Neuro-Ophthalmology Virtual Education Library. This collaborative effort between Eccles Library and the North American Neuro-Ophthalmology Society (NANOS) brings together 11 collections of visual resources from personnel working in the discipline around the US.

NOVEL is the only one of the Utah segments that uses the ContentDM digital collection management system. ContentDM is widely used by libraries in the US for historical/archival subjects, but for some reason it’s rarely used for biomedical or scientific subjects. The NOVEL project is notable because it’s one of the few sites anywhere that uses ContentDM for biomedical material.

In addition to pictures, some of the collections in NOVEL also have videos. A good example of this is the collection of Shirley H. Wray, from Harvard Medical School — See link below for Nerve Palsy.

[Images: NOVEL (nerve palsy); WebPath (emphysema); Knowledge Weavers Dermatology Image Bank (insect bite)]

WebPath, the Internet Pathology Laboratory for Medical Education, includes over 1900 pictures along with text and tutorials. It was developed by Edward C. Klatt MD in the Pathology Dept at Utah; Klatt is now on the faculty at Florida State University. The heart of the WebPath collection for disease-specific pictures is in the Systematic Pathology section, which has images broken down by organ system.

Notable in the Knowledge Weavers section of the Eccles site is the Dermatology Image Bank, done in collaboration with dermatologist John L. Bezzant. This contains striking dermatologic pictures, which are often found by Google Image Search. Knowledge Weavers also includes well-known sites such as Slice of Life and HEAL.

Another interesting digital resource at Utah, which is not associated with the library, is the collection of pictures from the prominent medical textbook Medical Genetics (lead author Lynn Jorde, published by Mosby). This site also includes some pictures from WebPath.

The Digital Library Collections at Yale Medical Library are notable for several reasons, especially the emphasis that the library’s administration appears to give them: the digital collections section is featured prominently on all of the Collections pages of the library’s website, as shown below.

clin3_74.JPG

Yale is unusual for other reasons — They are one of the few medical/health sciences libraries that have included biomedical/scientific pictures in their digitization efforts, in addition to the historical/archival subjects more commonly done by libraries using content management systems. Also, Yale is unusual in using Greenstone software for digital content management, rather than the more commonly used ContentDM.

The main group of digital resources at Yale is described on the Digital Library Collections page. It includes 7 collections, most of them historical, along with the Pathology Teaching Collection (see sample below), which continues to be used as a teaching resource at Yale. The resources in this section, done with Greenstone, have metadata descriptions and are searchable.

[Images: Pathology Teaching Collection; Fuchs Herbal]

Other resources are available on the Electronic Texts in the History of Medicine page. This includes 13 historical works, some of which are notable for their illustrations — See example above from Fuchs’ pioneering 16th century herbal, Primi de stirpivm. Also notable are colored illustrations from the herbal of Christian Egenolff. The Electronic Text sources appear to be image scans only (apparently not done with Greenstone), with no metadata, or other associated text, so they unfortunately are not searchable.

About Greenstone — Yale is one of the few US groups using this digital library system, which originated in New Zealand, and is used widely in other countries. Here’s a list of Greenstone sites.

Over the last three years, we have added close to 800 pictures on about 100 diseases/conditions to Hardin MD.

As the volume of pictures has grown, providing access to them has become more difficult. For some time, we have grouped pictures on specific disease conditions into small galleries, each with about 3-12 pictures (ant bites, athlete’s foot, atopic dermatitis below). Recently, however, we have expanded the gallery format, broadening it into larger gallery collections, which have links to the smaller galleries.

gallery5.JPG

Use of the gallery format has been very effective in increasing access to our pictures — We are finding that users are much more likely to click thumbnail disease links deeper on the page than when a list of text links is provided.

In addition to the gallery collection pages for AIDS, cancer, and child diseases, which are shown on the gallery gateway page above, there are also gallery collections for foot problems, herpes, insect bites, mouth sores, nail problems, oral diseases, skin rashes, STDs, and tropical diseases, all of which are linked on the inclusive gallery page.

New York Public Library is a rich source of digital resources, both text and images. This is especially interesting because they have done an excellent job in making connections from the library catalog (CATNYP) to digitized resources.

Because NYPL is an active participant in Google Books, their recent text digitization efforts seem to have gone into this. They’ve done a good job of making links from CATNYP to the books from their collection that have been digitized for Google Books.

A searchable list of all NYPL’s Google Books in CATNYP (32,000 titles) is here:
catnyp.nypl.org/search/XGoogle+Books+Library+Project

To search a subset of this, add a keyword, either in the address bar directly …
catnyp.nypl.org/search/XGoogle+Books+Library+Project+botany
… or add a keyword in the search box.
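The address-bar approach can be sketched in a few lines of Python. This is only an illustration based on the two URLs shown above: the function name is made up, and the assumption that CATNYP treats every keyword the same way as “botany” (including keywords that need URL encoding) is untested.

```python
from urllib.parse import quote_plus

# Base address and search phrase copied from the example URLs in the text.
CATNYP_SEARCH_BASE = "catnyp.nypl.org/search/X"
GOOGLE_BOOKS_PHRASE = "Google Books Library Project"


def catnyp_google_books_url(keyword=""):
    """Build a CATNYP search address for NYPL's Google Books titles.

    An optional keyword narrows the search to a subset, as with "botany".
    """
    terms = GOOGLE_BOOKS_PHRASE
    if keyword.strip():
        terms += " " + keyword.strip()
    # quote_plus turns spaces into "+" and escapes other special characters.
    return CATNYP_SEARCH_BASE + quote_plus(terms)


print(catnyp_google_books_url())
# catnyp.nypl.org/search/XGoogle+Books+Library+Project
print(catnyp_google_books_url("botany"))
# catnyp.nypl.org/search/XGoogle+Books+Library+Project+botany
```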

It’s helpful to have this easy access to NYPL books that are in Google Books through CATNYP, but it’s surprising that the CATNYP record gives no information indicating the print version from which the Google Books version has been digitized. Here’s an example of a book title found in CATNYP, with separate entries for the Google Books and print versions, with neither record linking to the other.

While NYPL’s book digitization efforts seem to be concentrated in Google Books, they continue to do their own image digitization work. As with Google Books, they do a nice job of making links in CATNYP, from the catalog records of books from which they’ve digitized images to the images in the Digital Gallery. The screen-shots below show an example:

catalog6.JPG

This shows links from the CATNYP record for the book American medical botany to the images from the book in the NYPL Digital Gallery.

I see on pages in the Digital Gallery that they’re working on a “new look” for Gallery pages. Here’s the new look for Gallery pages for American medical botany. It’s an improvement in many ways, more streamlined, but doesn’t seem to have a link back to the record for the book in CATNYP.

From the Digital Gallery IT Architecture and Delivery : “Runs on an open, extensible architecture … managed through an Oracle database … ColdFusion software provides the application programming interface that integrates metadata and images for web delivery…”

[When Hardin MD was launched in 1996, its main purpose was to provide links to health science resources on the Web. In recent years, the emphasis has been on providing access to medical pictures.]

We first started tracking how well Google was finding Hardin MD pages in about 2001, when search engine optimization was in its infancy, and most people, like us, had not heard the term “SEO.” But in today’s lingo, that’s pretty much what we were doing: learning to use language that would help people searching in Google to find our pages. So here’s a little example of using search engine optimization techniques before they became famous as SEO. …

Users of Hardin MD will notice that the word “pictures” is used frequently on our pages and the word “images” is rarely used. Why is this? Basically, the answer is simple — We use “pictures” because that’s the word people use in searching.

The screen-shots below, for the Hardin MD : Impetigo Pictures page, show this clearly. The Extreme Tracker shot for this page shows the large proportion of search engine traffic from the word “pictures” (36%) compared to the small amount of traffic from the word “images” (0.6%).

[Screen-shots: Hardin MD : Impetigo Pictures page; Keywords (Extreme Tracker)]
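As a rough illustration of what a keyword breakdown like this measures, the Python sketch below computes the share of search queries that contain a given word. It is not how Extreme Tracker works internally, which we have no knowledge of; the function name and the sample queries are invented, so the percentages it prints are not the real Hardin MD figures.

```python
def keyword_share(queries, keyword):
    """Percentage of search queries that contain the given word."""
    if not queries:
        return 0.0
    word = keyword.lower()
    hits = sum(1 for query in queries if word in query.lower().split())
    return 100.0 * hits / len(queries)


# Made-up sample of queries that might lead visitors to a page.
sample_queries = [
    "impetigo pictures",
    "pictures of impetigo",
    "impetigo images",
    "impetigo treatment",
    "what does impetigo look like",
]

print(keyword_share(sample_queries, "pictures"))  # 40.0
print(keyword_share(sample_queries, "images"))    # 20.0
```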

The Google screen-shots show that the Impetigo Pictures page ranks equally high for the two words, so the difference in traffic must come from “pictures” being searched much more frequently.

[Screen-shots: Google search: impetigo pictures; Google search: impetigo images]

(Note that these screen-shots have been photo-edited to fit the space — Ads and other text not relevant to the article have been removed. All screen-shots captured in July 2008.)

Here’s the background …

In about 2001, we started noticing how people were finding Hardin MD pages in search engines, and designing our pages to make them more likely to be found. An important part of this was using words that people were more likely to search (e.g. “heart diseases” instead of “cardiology”). Tools such as WordTracker that show how many people are searching for particular words are especially useful for this.

At about the same time, we were starting to make links to other sites that have pictures on medical/disease subjects. Using WordTracker and Extreme Tracker (to see the words people were searching to find our pages), it was striking that the word “pictures” was very effective. At the time, we had assumed that the appropriate word to use was “images,” since that is the word used on most medical/disease pages at other sites. We could see clearly, however, that using the word “pictures” on our pages brought much more traffic than the word “images.” So we’ve gone on from there, and now have high rankings in Google for many medical/disease subjects combined with “pictures,” as with Impetigo.

Extreme Tracker | WordTracker