Internet Archive (IA) has long had an excellent “thumbnail view” of book pages, in the DjVu format, which I described two years ago as being arguably superior to Google Books for viewing books with a lot of illustrations. In April of this year, IA announced an additional thumbnail view, as part of their BookReader format, which I think is even better than the DjVu format. As with the DjVu format, however, getting to the BookReader thumbnail view is a bit tricky for the user. The steps are shown in the graphic below, starting at left on the IA book home/details page. The first step is to click “Read Online” at the top of the list of formats (some books in IA don’t currently have a BookReader version, in which case the “Read Online” link doesn’t appear). The next step, in the middle shot, is to click the rather inconspicuous grid-shaped icon in the top menu bar to view thumbnails.

It would benefit the Internet Archive project to make their excellent thumbnail views (DjVu and now BookReader thumbnails) easier to find. As I reported recently, Google IS finding IA versions of books, along with its own Google Books versions. And significantly, Google is often choosing to link to the DjVu format, out of the many different formats available in IA. I suspect this is because Google “has a nose for” anything that smells like it’s related to pictures (which I’ve experienced with Hardin MD picture searching for many years).

So, in closing, I’d suggest that the people at Internet Archive do some creative Search Engine Optimization (SEO), which IA’s Peter Brantley eloquently recommended for libraries a couple of years ago. A bit of tweaking of IA pages might help Google “find the (graphic) jewels” that they contain: the thumbnail views and formats that the world is looking for!

Finally, I can’t resist adding a BookReader thumbnail example from an elegant 19th-century series of botanical prints. Click the screenshot to feast your eyes on more:

Eric Rumsey is at: eric-rumsey AttSign uiowa dott edu and on Twitter @ericrumsey

I searched a small sample of ten pre-1923, public-domain books in Google Web Search in the last week, to find full-text versions, with the results below. These are all non-fiction titles, chosen more or less randomly, in subject fields that interest me: medicine, botany, and history.

Results

I did the searches in Google Web Search as detailed below. I looked at the first ten results and recorded every freely available full-view version of each title, with its rank number. I’ve identified the GBS records by the library that scanned the book. For Internet Archive (IA), I’ve identified records by sponsor/contributor, and also noted whether the link goes to the book home page or the DjVu-formatted version of the book.

Both Google Books & Internet Archive records found:

• 1. American medical botany, Cummings and Hilliard, 1817
Google Web Search: American medical botany cummings
. . # 3 GBS-Library: Oxford Univ
. . # 7 GBS-Library: Harvard
. . # 9 IA:  Book Home Page – Sponsor & Contrib: NCSU

• 2. Portfolio of dermochromes, Jerome Kingsbury, 1913 (3 volumes)
Google Web Search: portfolio dermochromes kingsbury
. . # 1 GBS-Library: Harvard – Volume 1
. . # 5 IA:  Book Home Page – Volume 1 – Sponsor: IA; Contrib: U California
. . # 6 IA:  DjVu format – Volume 2 – Sponsor: IA; Contrib: U California

• 3. The Complete herbalist, or, The people their own physicians, Oliver Phelps Brown, 1870
Google Web Search: Complete herbalist, or, The people their own physicians
. . # 1 IA:  Book Home Page – Sponsor: Lyrasis, Sloan Fndtn; Contrib: Rutgers
. . # 2 IA:  Book Home Page – Sponsor: MSN; Contrib: U California
. . # 8 GBS-Library: Harvard

• 4. English and American tool builders, Joseph W. Roe, 1916
Google Web Search: english and american tool builders roe
. . # 1 GBS-Library: Harvard
. . # 4 IA:  Book Home Page – Sponsor: Boston Lib Consortium; Contrib: Northeastern U
. . # 5 IA:  DjVu format – Full Text of #4

• 5. Health service in industry, Irving Clark, 1922
Google Web Search: health service in industry clark
. . # 1 GBS-Library: California
. . # 2 IA:  Book Home Page – Sponsor: MSN; Contrib: U Toronto
. . # 3 IA:  Book Home Page – Sponsor: Google; Contrib: ?

• 6. History of medicine in its salient features, Walter Libby, 1922
Google Web Search: history of medicine in its salient features libby
. . # 1 GBS-Library: Harvard
. . # 4 IA:  DjVu record – Sponsor: MSN; Contrib: U California

Only Google Books records found, none from Internet Archive:

• 7. The Theory and practice of veterinary medicine, Austin H. Baker, Alexander Eger, 1911
Google Web Search: theory and practice of veterinary medicine baker
. . # 1 GBS-Library: Wisconsin

• 8. Atlas of diseases of the skin, Franz Mraček, ed. by Henry W. Stelwagon, 1899
Google Web Search: atlas diseases of the skin stelwagon
. . # 1 GBS-Library: Harvard – umQPAAAAYAAJ

• 9. How are you feeling now, Edwin Sabin, 1917
Google Web Search: how are you feeling now sabin
. . # 1 GBS-Library: California

Only in Google Books – Publisher Preview only – Google Book Search has in Full-view:

• 10. Beyond the Mississippi : from the great river to the great ocean, Albert Richardson, 1867
Google Web Search: beyond the mississippi richardson
. . # 3 GBS-Publisher: Preview of 2007 reprint, no full-view available. The title IS available when searched directly in Google Book Search ->>
>> Google Book Search search, limit to Full view: beyond the mississippi richardson
. . # 1 GBS-Library: Virginia

Conclusions

This is certainly not a large enough sample to draw many conclusions, but I think it does show a few things:

  • There’s a lot of overlap between what’s in the two sources – The first 6 of the 10 books searched are in both Google Books (GBS) and Internet Archive (IA).
  • Not surprisingly, when there are titles in both sources, Google usually ranks GBS higher than IA (one exception: #3).
  • Libraries represented in GBS – Harvard predominates, appearing for 6 of the 10 titles, which fits my general Googling experience. Univ California is second, with 2 titles, a higher proportion than I’ve experienced.
  • IA sources – MSN is the sponsor for 3 of the 6 titles found in IA; of these, 2 were contributed by Univ California.
  • Links to Internet Archive are haphazard – In most cases there’s a link to the Book Home Page, as there should be, since it has a list of different formats available. In some cases, there’s also a link to the DjVu format, and in one case (#6), that’s the only link. Why does Google link to this format instead of others? Maybe it’s because DjVu is good for displaying pages with pictures. But the version of the DjVu format that Google links to is not the best one, as I’ve discussed previously.
  • In one case (#10), Google Web Search didn’t find any full-view versions, and Google Book Search did find one.
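The tallies above are easy enough to check by hand, but for anyone who wants to repeat this kind of small study, here’s a minimal Python sketch that transcribes the results list into a data structure and recomputes the counts. The structure and function names are my own illustration, not part of any tool. Only the fields the tallies need are kept, and title #10 is left empty because Web Search found only a publisher preview for it.

```python
from collections import Counter

# Results transcribed from the list above. For each title number:
# (rank, source, GBS library or IA sponsor, IA page type) for every
# full-view hit in the first ten Google Web Search results.
# IA page type is "home" (book home page) or "djvu"; None for GBS hits.
RESULTS = {
    1: [(3, "GBS", "Oxford", None), (7, "GBS", "Harvard", None),
        (9, "IA", "NCSU", "home")],
    2: [(1, "GBS", "Harvard", None), (5, "IA", "IA", "home"),
        (6, "IA", "IA", "djvu")],
    3: [(1, "IA", "Lyrasis, Sloan", "home"), (2, "IA", "MSN", "home"),
        (8, "GBS", "Harvard", None)],
    4: [(1, "GBS", "Harvard", None), (4, "IA", "Boston Lib Consortium", "home"),
        (5, "IA", "Boston Lib Consortium", "djvu")],
    5: [(1, "GBS", "California", None), (2, "IA", "MSN", "home"),
        (3, "IA", "Google", "home")],
    6: [(1, "GBS", "Harvard", None), (4, "IA", "MSN", "djvu")],
    7: [(1, "GBS", "Wisconsin", None)],
    8: [(1, "GBS", "Harvard", None)],
    9: [(1, "GBS", "California", None)],
    10: [],  # Web Search found only a publisher preview, no full-view version
}


def titles_in_both():
    """Titles with at least one GBS hit and at least one IA hit."""
    return [t for t, hits in RESULTS.items()
            if {"GBS", "IA"} <= {src for _, src, _, _ in hits}]


def gbs_ranked_first(title):
    """True if the best-ranked GBS hit comes before the best-ranked IA hit."""
    hits = RESULTS[title]
    best_gbs = min(rank for rank, src, _, _ in hits if src == "GBS")
    best_ia = min(rank for rank, src, _, _ in hits if src == "IA")
    return best_gbs < best_ia


def titles_per_library():
    """Number of titles each GBS library appears for (counted once per title)."""
    counts = Counter()
    for hits in RESULTS.values():
        counts.update({who for _, src, who, _ in hits if src == "GBS"})
    return counts


def titles_with_ia_sponsor(sponsor):
    """Titles with at least one IA hit from the given sponsor."""
    return [t for t, hits in RESULTS.items()
            if any(src == "IA" and who == sponsor for _, src, who, _ in hits)]


def titles_with_only_djvu_links():
    """Titles that have IA hits, all of which point to the DjVu format."""
    out = []
    for title, hits in RESULTS.items():
        pages = [page for _, src, _, page in hits if src == "IA"]
        if pages and all(page == "djvu" for page in pages):
            out.append(title)
    return out


both = titles_in_both()
print(both)                                          # [1, 2, 3, 4, 5, 6]
print([t for t in both if not gbs_ranked_first(t)])  # [3]
print(titles_per_library()["Harvard"])               # 6
print(titles_with_ia_sponsor("MSN"))                 # [3, 5, 6]
print(titles_with_only_djvu_links())                 # [6]
```

Repeating the study later, when the search results will certainly have changed, would just mean filling in a new RESULTS dictionary.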

My purpose here was not to look at the proportion of all books that are in GBS or IA — That would take a larger sample, and more systematic randomizing. But I can report that I did find most of the titles I searched, which surprised me.

As I report in a separate article, there are likely GBS or IA versions of other editions of many of these books, which could be found by searching directly in these sources.

The Google Web Searches I did turned up no full-text versions from any source other than GBS or IA. I was surprised by this, especially that Gutenberg.org did not appear in any of the search results.

Caveat: The results for the specific searches in Google Web Search will certainly change over time, so the study should be thought of as capturing a moment in time, not results set in stone!



I wrote last week about the DjVu format that’s among the formats supported by Internet Archive, and why it’s so good for displaying books with pictures. In this post, I’ll detail how to take advantage of DjVu’s picture-viewing capabilities.

For the most part, DjVu is well-documented. It’s widely acknowledged that the DjVu format excels in the online presentation of images/pictures, when compared with PDF, but this is not emphasized as much as it should be. In most discussions of eBooks, the emphasis is on text, and pictures are an afterthought.

Commentators’ under-emphasis of DjVu’s capability for presenting books with pictures/images is perhaps related to the fact that the DjVu system itself has surprising design lapses that make it hard for the user to intuit its graphic capabilities.

These powerful graphic features are especially related to use of thumbnails, which are much of what makes DjVu so useful for viewing books with pictures.

The thumbnail bar, shown to the left, is the key to navigating the pages of a book. The first hurdle in using it is that, oddly, the thumbnail bar is not shown when a book is first displayed. To turn it on, the user has to click the Show/hide thumbnail icon, which hides inconspicuously on the far right side of the toolbar. The thumbnail bar can also be turned on by right-clicking anywhere and choosing Layout – Thumbnails. (In another odd, unaccountable oversight, the Show/hide thumbnail icon does not appear at all on the toolbar in the Macintosh/Safari version of DjVu, and the user has to use the right-click [or CTRL key] option to turn it on.)

The default display of the thumbnail bar is quite small, so the next step, to get a better view of page contents, is to enlarge the thumbnail images by dragging with the mouse, as shown at left.

The thumbnail bar works smoothly: thumbnails load rapidly as the user scrolls down to see more. Surprisingly, the loading speed seems to be little affected when the thumbnails are enlarged. It’s odd that the default size of images in the thumbnail bar is so small when the larger size works so well. I think this is another indication that the DjVu developers are not thinking much about use of the system for viewing books with pictures, since it’s so much easier to see details in pictures with larger thumbnails.

Finally, one more hurdle to using DjVu seems to exist in Internet Archive, which is the largest source of DjVu records: when the DjVu format is chosen in the “View the book” box, the link to open the DjVu file is broken. The way around this is to click All Files: HTTP, at the bottom of the “View the book” box. This goes to an index screen listing several formats, and clicking the one that ends in .djvu (usually the first in the list) successfully opens the file in DjVu format. I sent a question about this to the DjVu.org forum on Sept 8, and as of Sept 10 have not gotten an answer. Go here to see the question and whether it has been answered.
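For anyone scripting this workaround rather than clicking through it, the key step is scanning the index screen for the file whose name ends in .djvu. Here’s a minimal sketch using Python’s standard html.parser, assuming the index is an ordinary HTML page of links. The sample listing and file names are hypothetical, not taken from a real IA record.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML page, in order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def first_djvu_link(index_html):
    """Return the first link whose target ends in .djvu, or None if there is none."""
    collector = LinkCollector()
    collector.feed(index_html)
    for href in collector.hrefs:
        # Match the extension exactly, so a name like "..._djvu.txt" is skipped.
        if href.lower().endswith(".djvu"):
            return href
    return None


# Hypothetical index listing; real file names vary from book to book.
SAMPLE_INDEX = """
<a href="examplebook00auth.djvu">examplebook00auth.djvu</a>
<a href="examplebook00auth.pdf">examplebook00auth.pdf</a>
<a href="examplebook00auth_djvu.txt">examplebook00auth_djvu.txt</a>
"""

print(first_djvu_link(SAMPLE_INDEX))  # examplebook00auth.djvu
```

Checking the exact ending matters because a listing can include other files with “djvu” in their names that aren’t DjVu files themselves.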

Yogi Berra quote of the day: “You better cut the pizza in four pieces because I’m not hungry enough to eat six.”

Finding a heavily illustrated book that’s in both Google Books (GBS) and Internet Archive (IA) gives a good comparison of the strengths and weaknesses in the way illustrated books are presented in these systems.

Shown below are the “intro” pages for the book in the 2 systems. The clear advantage of the GBS intro page is that the sample thumbnails in the lower right make it immediately obvious that the book has COLOR pictures of good quality.

In Internet Archive, the main job of the intro screen (below) is to direct the user to options for viewing the book, in the box in the upper left, and there’s no indication that the book contains pictures.

Even after pulling up the DjVu option to view the book — which is a tricky matter, see how to do it here — there’s no intro screen at all in DjVu, just an imposing blank page waiting for the user to change display options or begin paging through the book sequentially.

It’s when the user chooses display options and begins viewing the book that the advantages of DjVu become evident. The most important option, especially if pictures are an important part of the book, as they are in the Mracek Atlas book shown here, is to turn on the thumbnail display bar (at left) by clicking the icon in the lower right corner of the DjVu display window. It then becomes easy to scroll through the thumbnails and get a good view of the nature of the pictures in the book, and how they relate to the text. In the Mracek Atlas, it happens that the first third of the book is all text, and the last two-thirds is mostly pictures, so the user can scroll to the pictures easily.

Use of thumbnails is a good way to provide access to pictures in a book. But as simple and obvious as it is, thumbnail access is lacking in most e-book systems, so both GBS and DjVu are to be applauded for providing it, in their different ways. Here’s a comparison of the two systems …

In GBS, the About this book page gives immediate thumbnail access to a maximum of 30 pictures. Additional pictures have no thumbnail access, and can only be found by scrolling through pages or text searching.

DjVu has the disadvantage of having no Intro page that gives an overview of pictures in the book. But when the user knows how to set the display options, it provides good thumbnail access to an unlimited number of pictures. In a book like the Mracek Atlas, with over 100 pictures, this is a definite advantage.

Postscript: It wasn’t easy to find a book that’s in both GBS and IA, so I was especially pleased to find the Mracek Atlas discussed here that has pictures in Hardin MD! The full citation for the book is: Atlas of diseases of the skin, by Franz Mracek, 1899 [GBS | IA]