At his demo of the IA BookReader at the recent Books in Browsers conference, Mike Ang said of the new BookReader thumbnail view: “We think this is one example where the digital book has some advantages over the printed one.” Mike was talking particularly about the ability of the thumbnail view to give a unique overview of a book’s contents. I came across an example that shows how useful this can be, described below.

The top frame of the graphic at left shows a page from Isaac Newton’s personal copy of a book, with his own annotations in the margins, as described in IA staffer George Oates’s blog article. This sounded interesting when I read it, but the article didn’t give a link or page number for the annotation in the example. So I searched for the book in IA and, using the thumbnail view, was able to scan through it visually and find the annotation quickly, as shown in the bottom frame at left.

This simple little example fits in nicely with an idea I’ve discussed in several articles on this blog: thumbnails are especially valuable in books that contain non-textual material. In the examples I’ve blogged about previously, that material has been illustrations, but marginalia fit nicely into this category too.

A few more details on the Newton example: the close-up of the text (top frame) is from a set of Oates’s slides (#24) about the project; it’s also in her article linked above. As mentioned, although these sources have nice detail about this unusual Newton treasure, neither links to the annotation shown or gives its page number. The IA record for the book has a note saying “Includes Issac Newton’s handwritten notations,” but doesn’t say exactly where they occur. It turns out that the annotation is on page 73.

Eric Rumsey is at: eric-rumsey AttSign uiowa dott edu and on Twitter @ericrumsey

The Internet Archive’s BookReader got a lot of attention at the Books in Browsers conference at IA headquarters in San Francisco last week. IA engineer Mike Ang gave a technical talk to conference attendees on using BookReader with a touch interface (iPad, Android). He also did a demo as part of Brewster Kahle’s “Books in Browsers” keynote, which was open to the general public, and that’s mostly what I’ll discuss in this article.

The IA blog article on Kahle’s Keynote has a video that includes Ang’s BookReader demo, with some screenshots from it. But the transcribed text in the article doesn’t include the demo, so I’ll give a little summary here — Ang’s 11-minute demo (16:26-27:40 on the video) includes enhanced search capabilities, audio generation from text, use on an iPad, and the thumbnail view (shown in the picture at left), which I discussed in an earlier article.

Ang said in the demo, and also in the conference session, that his team has the new version of BookReader working well in all browsers except Internet Explorer, and that that’s the main hold-up in releasing the new version. He’s hoping it will be out in the next few weeks.

In the conference session, Ang said that it’s especially difficult to get BookReader to work on iOS and Android smartphones and tablets because “multitouch events” are programmed differently on each device. I took particular note of this because I’ve used the current version of BookReader on an iPad, and although it works quite nicely in general, pinch zooming is fairly slow. This is also noticeable in Ang’s demo on the video. I hope this problem can be solved. If BookReader can be made to work smoothly, I think it has great potential on iPad-like tablets, a combination that no doubt seems natural to the people at Internet Archive since, as Ang observed in his demo, the iPad happens to be “the size of a small book.”

Internet Archive (IA) has long had an excellent “thumbnail view” of book pages, in the DjVu format, which I described two years ago as being arguably superior to Google Books for viewing books with a lot of illustrations. In April of this year, IA announced an additional thumbnail view, as part of their BookReader format, which I think is even better than the DjVu format. As with the DjVu format, however, getting to the BookReader thumbnail view is a bit tricky for the user. The steps are shown in the graphic below, starting at left on the IA book home/details page. The first step is to click “Read Online” at the top of the list of formats (some books in IA don’t currently have a BookReader version, in which case the “Read Online” link doesn’t appear). The next step, in the middle shot, is to click the rather inconspicuous grid-shaped icon in the top menu bar to view thumbnails.

It would be to the benefit of the Internet Archive project to make their excellent thumbnail views — DjVu and now BookReader thumbnails — easier to find. As I reported recently, Google IS finding IA versions of books, along with its own Google Books versions. And significantly, Google is often choosing to link to the DjVu format, out of the many different formats available in IA. I suspect this is because Google “has a nose for” anything that smells like it’s related to pictures (which I’ve experienced with Hardin MD picture searching for many years).

So, in closing, I’d suggest that the people at Internet Archive do some creative Search Engine Optimization (SEO), which the IA’s Peter Brantley suggested eloquently for libraries a couple of years ago — A bit of tweaking of IA pages might help Google to “find the (graphic) jewels” that they contain — The thumbnail views and formats that the world is looking for!

Finally, I can’t resist adding a BookReader thumbnail example from an elegant 19th century series of botanical prints — Click the screenshot to feast your eyes on more:

** Post-conference: Links to Presentations & Follow-up **

The Books in Browsers conference is in San Francisco, Oct 21-22, at the offices of the Internet Archive. Here’s a list of attendees on Twitter, which is about two-thirds of all attendees. In some cases it was difficult to tell whether a corporate Twitter name was the best one to use for an individual. I have not included people who are on Twitter but have their tweets protected; if you’re one of those and you want to be on the list, let me know.

Thanks to BLAINE Cook for a Twitter List that gives a live tweet feed of all the attendees listed below.

Twitter name / Last name / First name / Affiliation
bookgluttonNEWS Alber Travis BookGlutton
indiamos Amos India ITP, NYU
mangbot Ang Michael Internet Archive
glassdog Arthur Lance Internet Archive
kindleworld Basten Andrys Kindle Blog
mattBernius Bernius Matthew RIT
edwardbetts Betts Edward Internet Archive
kirkbiglione Biglione Kirk Oxford Media Works
nathanbransford Bransford Nathan Curtis Brown Agency
naypinya Brantley Peter Internet Archive
patrickrbrown Brown Patrick Goodreads
jambina Buckland Amy McGill University
otown Chandler Otis Goodreads
BLAINE Cook Blaine Romeda
VidLit Dubelman Liz VidLit
abdelazer Fahlgren Keith Ibis Reader
FRANCOMEDIA Franco Kevin Francomedia
snowmaker Friedman Jared Scribd
JRandomF Fruchterman Jim Benetech
Hadrien Gardeur Hadrien Feedbooks
rochellegrayson Grayson Rochelle BookRiff
epistemographer Greenberg Josh Sloan Foundation
m_gylling Gylling Markus IDPF / DAISY
usfsrlib Hewlett Jean USF Gleeson Library
jhorodyski Horodyski John Wrinkled Pants
JenHoward Howard Jennifer Chronicle Higher Ed
shadowsun7 James Eli Novelr
jaquith Jaquith Waldo Univ. of Virginia
UDCMRK Kalfatovic Martin BioDiversity Heritage Library
bookmasters Kasher Bob Bookmasters
billkendrick Kendrick Bill Smashwords
selfpubbootcamp King Carla PBS MediaShift
metasj Klein SJ OLPC
booksquare Krozser Kassia Booksquare
curiouslee Lee Mike OLPC / AARP
jacoblewism Lewis Jacob Figment Fiction
jessielorenz Lorenz Jessie Independent Living SF
ivorymadison Madison Ivory Red Room
armco Malkin Andrew Overbrook Consulting
kevinmarks Marks Kevin Independent
ronmartinez Martinez Ron Invention Arts
hughmcguire McGuire Hugh BookOven
emckean McKean Erin Wordnik
abrahammertens Mertens Abraham Red Room
KatMeyer Meyer Kat O’Reilly Media
bookglutton Miller Aaron BookGlutton
calliemiller Miller Callie LitLife
craigmod Mod Craig Pre/Post Books
tiny_librarian Morin Becky California Acad Sci
m_murrell Murrell Mary UC Berkeley
R_Nash Nash Richard Cursor Books
allennoren Noren Allen O’Reilly Media
brianoleary O’Leary Brian Magellan Partners
josephpearson Pearson Joseph Monocle
pilsks Pilsk Suzanne Smithsonian Inst Libraries
poezn Porath Michael UC Berkeley I-School
draccah Raccah Dominique Sourcebooks
Ebooq Reed Cartwright ebooq
sol613 Rosenberg Sol Copia
ericrumsey Rumsey Eric Univ of Iowa
jason_schultz_ Schultz Jason UC Berkeley
letiziasechi Sechi Letizia Bookrepublic
ifbook Stein Bob If Book
mtamblyn Tamblyn Michael Kobo Books
pthompson Thompson Patrick Inkstone Software
ftoolan Toolan Fran Firebrand Tech
minh_truong Truong Minh Aldiko
aweber9 Weber Andrew Random House
dwilk Wilk David Creative Mgmt Partners
adamwitwer Witwer Adam O’Reilly Media
tiffanycmw Wong Tiffany Aldiko

In a companion case study of searching for a book title in Google Book Search (GBS), I reported that there were multiple editions from Google Books but no editions from Internet Archive (IA). In this article, I report on searching the same title, Diagnostic and therapeutic technic, by Albert S. Morrow, directly in Internet Archive. GBS found four editions of the book, and IA finds three (1911, 1915, and 1921), from a variety of sources.

Results

The list below shows the records retrieved in searching for Diagnostic and therapeutic technic in IA, in the order they appeared:

• 1. Diagnostic and therapeutic technic, 1911
Digitizing sponsor: Google; From the collections of: Harvard University; Downloads: 43
Source: Google: GBS-Library: Harvard

• 2. Diagnostic and therapeutic technic, 1911
Digitizing sponsor: Google; From the collections of: unknown library; Downloads: 39
Source: Google: GBS-Library: Stanford

• 3. Diagnostic and Therapeutic Technic, 1921
Digitizing sponsor: Google; From the collections of: unknown library; Downloads: 21
Source: Google: GBS-Library: Stanford

• 4. Diagnostic and therapeutic technic, 1915
Digitizing sponsor: Google; From the collections of: Harvard University; Downloads: 27
Source: Google: GBS-Library: Harvard

• 5. Diagnostic and therapeutic technic, 1915
From scan: “Digitized by the Internet Archive in 2010 with funding from Open Knowledge Commons”
Digitizing sponsor: Open Knowledge Commons; Contributor: Columbia University Libraries; Downloads: 2

• 6. Diagnostic and therapeutic technic, 1915
Digitizing sponsor: MSN; Contributor: University of California Libraries; Collection: americana, CDL; Downloads: 130

• 7. Diagnostic and therapeutic technic, 1921
Digitizing sponsor: MSN; Contributor: Gerstein – University of Toronto; Downloads: 67

Observations and conclusions

Sources of records:

  • The first four records are from Google Book Search (although these are not all of the GBS records for the title). The IA record for each of these includes the URL of the corresponding GBS record but doesn’t link to it, so I’ve added links to help show the connection.
  • MSN books (#6 – #7) came to IA from the Microsoft effort to scan books in competition with Google, which ended in 2008.
  • Open Knowledge Commons (#5) is the only record that’s not from GBS or MSN; it’s apparently related to a new effort by OKC to scan medical books.

As in GBS, the reasoning behind the ordering of the different records (different editions and different contributing libraries) is unclear. The only pattern seems to be that records from the same source are grouped together.

The number of downloads appears to be a good indication of how long a record has been in IA. The MSN records (#6, #7) have been in the longest and have the most downloads. The GBS records (#1 – #4) were apparently entered later, at about the same time, since they have similar download numbers. The Open Knowledge Commons record (#5) was just entered this year, and only has two downloads.
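For anyone who wants to check these two observations against the numbers, here is a small Python sketch. The records are copied from the list above; the function names are my own, made up for illustration, and download counts are only a rough stand-in for how long a record has been in IA.

```python
from collections import defaultdict
from statistics import mean

# Records from the IA search above: (rank, year, sponsor, downloads).
# "Google" marks the GBS-sourced records (#1-#4).
RECORDS = [
    (1, 1911, "Google", 43),
    (2, 1911, "Google", 39),
    (3, 1921, "Google", 21),
    (4, 1915, "Google", 27),
    (5, 1915, "Open Knowledge Commons", 2),
    (6, 1915, "MSN", 130),
    (7, 1921, "MSN", 67),
]

def mean_downloads_by_sponsor(records):
    """Average download count per digitizing sponsor."""
    groups = defaultdict(list)
    for _rank, _year, sponsor, downloads in records:
        groups[sponsor].append(downloads)
    return {sponsor: mean(counts) for sponsor, counts in groups.items()}

def sponsors_are_contiguous(records):
    """True if each sponsor's records form one unbroken run in rank order."""
    seen = set()
    previous = None
    for _rank, _year, sponsor, _downloads in sorted(records):
        if sponsor != previous:
            if sponsor in seen:  # sponsor reappears after a gap
                return False
            seen.add(sponsor)
            previous = sponsor
    return True

averages = mean_downloads_by_sponsor(RECORDS)
for sponsor, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{sponsor}: {avg:.1f}")
print("Same-source records grouped together:", sponsors_are_contiguous(RECORDS))
```

Run as-is, it lists MSN first (98.5 downloads per record), then the GBS-sourced Google records (32.5), then Open Knowledge Commons (2.0), which matches the ordering described above.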

The most interesting finding in this little case study (combined with the one on Google Book Search) is the duplication of GBS records in IA. This raises the question of how the two scanning efforts relate to each other — Which books from GBS are duplicated in IA? Is IA able to scan any full-view books in GBS? Do they particularly scan books from some contributing libraries?

In the last week, I searched Google Web Search for a small sample of ten pre-1923, public-domain books, to find full-text versions, with the results below. These are all non-fiction titles, chosen more or less randomly, in subject fields of my interest: medicine, botany, and history.

Results

I did the searches in Google Web Search as detailed below — I looked at the first ten results, and recorded all occurrences of freely-available full-view versions for each title, with rank number. I’ve identified the GBS records by the library that scanned the book. For Internet Archive (IA), I’ve identified records by sponsor/contributor, and also noted whether the link goes to the book home page or the DjVu-formatted version of the book.

Both Google Books & Internet Archive records found:

• 1. American medical botany, Cummings and Hilliard, 1817
Google Web Search: American medical botany cummings
. . # 3 GBS-Library: Oxford Univ
. . # 7 GBS-Library: Harvard
. . # 9 IA: Book Home Page – Sponsor & Contrib: NCSU

• 2. Portfolio of dermochromes, Jerome Kingsbury, 1913 (3 volumes)
Google Web Search: portfolio dermochromes kingsbury
. . # 1 GBS-Library: Harvard – Volume 1
. . # 5 IA: Book Home Page – Volume 1 – Sponsor: IA; Contrib: U California
. . # 6 IA: DjVu format – Volume 2 – Sponsor: IA; Contrib: U California

• 3. The Complete herbalist, or, The people their own physicians, Oliver Phelps Brown, 1870
Google Web Search: Complete herbalist, or, The people their own physicians
. . # 1 IA: Book Home Page – Sponsor: Lyrasis, Sloan Fndtn; Contrib: Rutgers
. . # 2 IA: Book Home Page – Sponsor: MSN; Contrib: U California
. . # 8 GBS-Library: Harvard

• 4. English and American tool builders, Joseph W. Roe, 1916
Google Web Search: english and american tool builders roe
. . # 1 GBS-Library: Harvard
. . # 4 IA: Book Home Page – Sponsor: Boston Lib Consortium; Contrib: Northeastern U
. . # 5 IA: DjVu format – Full Text of #4

• 5. Health service in industry, Irving Clark, 1922
Google Web Search: health service in industry clark
. . # 1 GBS-Library: California
. . # 2 IA: Book Home Page – Sponsor: MSN; Contrib: U Toronto
. . # 3 IA: Book Home Page – Sponsor: Google; Contrib: ?

• 6. History of medicine in its salient features, Walter Libby, 1922
Google Web Search: history of medicine in its salient features libby
. . # 1 GBS-Library: Harvard
. . # 4 IA: DjVu record – Sponsor: MSN; Contrib: U California

Only Google Books records found, none from Internet Archive:

• 7. The Theory and practice of veterinary medicine, Austin H. Baker, Alexander Eger, 1911
Google Web Search: theory and practice of veterinary medicine baker
. . # 1 GBS-Library: Wisconsin

• 8. Atlas of diseases of the skin, Franz Mraček, ed. by Henry W. Stelwagon, 1899
Google Web Search: atlas diseases of the skin stelwagon
. . # 1 GBS-Library: Harvard – umQPAAAAYAAJ

• 9. How are you feeling now, Edwin Sabin, 1917
Google Web Search: how are you feeling now sabin
. . # 1 GBS-Library: California

Only a Google Books publisher preview found in Web Search; Google Book Search has a full-view copy:

• 10. Beyond the Mississippi : from the great river to the great ocean, Albert Richardson, 1867
Google Web Search: beyond the mississippi richardson
. . # 3 GBS-Publisher: Preview of 2007 reprint, no full-view available. The title IS available when searched directly in Google Book Search:
Google Book Search, limited to Full view: beyond the mississippi richardson
. . # 1 GBS-Library: Virginia

Conclusions

This is certainly not a large enough sample to draw many conclusions, but I think it does show a few things:

  • There’s a lot of overlap between what’s in the two sources – The first 6 of the 10 books searched are in both Google Books (GBS) and Internet Archive (IA).
  • Not surprisingly, when there are titles in both sources, Google usually ranks GBS higher than IA (one exception: #3).
  • Libraries represented in GBS – Harvard predominates, supplying the GBS copy for 6 of the 10 titles, which fits my general Googling experience. Univ California is second with 2 titles, a higher proportion than I’ve experienced.
  • IA sources – 3 of the 6 titles found in IA have MSN as sponsor; of these, 2 are contributed by Univ California.
  • Links to Internet Archive are haphazard – In most cases there’s a link to the Book Home Page, as there should be, since it has a list of the different formats available. In some cases, there’s also a link to the DjVu format, and in one case (#6), that’s the only link. Why does Google link to this format instead of others? Maybe it’s because DjVu is good for displaying pages with pictures. But the version of the DjVu format that Google links to is not the best one, as I’ve discussed previously.
  • In one case (#10), Google Web Search didn’t find any full-view versions, and Google Book Search did find one.
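Since the sample is small, the tallies above can be rechecked by hand, but here is a Python sketch that does it from a transcription of the results. The data structure and function names are hypothetical, and counts are per title, so a title with two copies from the same library would count once.

```python
from collections import Counter

# Hand-transcribed from the ten searches above (rank = position in the first
# ten Google Web Search results). GBS hits: (rank, scanning library).
# IA hits: (rank, sponsor, contributor). Title 10's Virginia copy came from a
# direct Google Book Search, not from Web Search.
RESULTS = {
    1: {"gbs": [(3, "Oxford"), (7, "Harvard")], "ia": [(9, "NCSU", "NCSU")]},
    2: {"gbs": [(1, "Harvard")],
        "ia": [(5, "IA", "U California"), (6, "IA", "U California")]},
    3: {"gbs": [(8, "Harvard")],
        "ia": [(1, "Lyrasis, Sloan Fndtn", "Rutgers"), (2, "MSN", "U California")]},
    4: {"gbs": [(1, "Harvard")],
        "ia": [(4, "Boston Lib Consortium", "Northeastern U"),
               (5, "Boston Lib Consortium", "Northeastern U")]},
    5: {"gbs": [(1, "California")],
        "ia": [(2, "MSN", "U Toronto"), (3, "Google", "?")]},
    6: {"gbs": [(1, "Harvard")], "ia": [(4, "MSN", "U California")]},
    7: {"gbs": [(1, "Wisconsin")], "ia": []},
    8: {"gbs": [(1, "Harvard")], "ia": []},
    9: {"gbs": [(1, "California")], "ia": []},
    10: {"gbs": [(1, "Virginia")], "ia": []},
}

def titles_per_library(results):
    """Count titles per GBS library, counting each library once per title."""
    counts = Counter()
    for hits in results.values():
        counts.update({library for _rank, library in hits["gbs"]})
    return counts

def ia_titles(results, sponsor, contributor=None):
    """Titles with at least one IA hit from the given sponsor (and contributor)."""
    return sorted(
        title for title, hits in results.items()
        if any(s == sponsor and (contributor is None or c == contributor)
               for _rank, s, c in hits["ia"])
    )

def titles_where_ia_ranks_higher(results):
    """Titles in both sources whose best IA rank beats the best GBS rank."""
    return sorted(
        title for title, hits in results.items()
        if hits["gbs"] and hits["ia"]
        and min(r for r, *_ in hits["ia"]) < min(r for r, _ in hits["gbs"])
    )

print(titles_per_library(RESULTS).most_common(2))
print(ia_titles(RESULTS, "MSN"), ia_titles(RESULTS, "MSN", "U California"))
print(titles_where_ia_ranks_higher(RESULTS))
```

With the data as transcribed, Harvard comes out at 6 titles and California at 2, MSN sponsors titles 3, 5, and 6 (3 and 6 contributed by Univ California), and title 3 is the only one where an IA record outranks GBS.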

My purpose here was not to look at the proportion of all books that are in GBS or IA — That would take a larger sample, and more systematic randomizing. But I can report that I did find most of the titles I searched, which surprised me.

As I report in a separate article, there are likely GBS or IA versions of other editions of many of these books that could be found by searching directly in these sources.

There were no full-text versions in the Google Web Searches I did from any other source than GBS or IA. I was surprised at this, especially that Gutenberg.org did not appear in any of the search results.

Caveat: The results for the specific searches in Google Web Search will certainly change over time, so the study should be thought of as capturing a moment in time, not results set in stone!

I wrote last week about the DjVu format that’s among the formats supported by Internet Archive, and why it’s so good for displaying books with pictures. In this post, I’ll detail how to take advantage of DjVu’s picture-viewing capabilities.

For the most part, DjVu is well-documented. It’s widely acknowledged that the DjVu format excels in the online presentation of images/pictures, when compared with PDF, but this is not emphasized as much as it should be. In most discussions of eBooks, the emphasis is on text, and pictures are an afterthought.

Commentators’ under-emphasis of DjVu’s capability for presenting books with pictures is perhaps related to the fact that the DjVu system itself has surprising design lapses that make it hard for the user to discover the system’s graphic capabilities.

These powerful graphic features are especially related to the use of thumbnails, which are much of what makes DjVu so useful for viewing books with pictures.

The thumbnail bar, shown to the left, is the key to navigating the pages of a book. The first hurdle in using it is that, oddly, the default display when a book is first opened does not show the thumbnail bar. To turn it on, the user has to click the Show/hide thumbnail icon, which hides inconspicuously on the far right side of the toolbar. The thumbnail bar can also be turned on by right-clicking anywhere and choosing Layout – Thumbnails. (In another odd, unaccountable oversight, the Show/hide thumbnail icon does not appear at all on the toolbar in the Macintosh Safari version of DjVu, and the user has to use the right-click [or CTRL key] option to turn it on.)

The default display of the thumbnail bar is quite small, so the next step in using it to get a better view of page contents is to enlarge the size of thumbnail images, by dragging the mouse, as shown at left.

The thumbnail bar works smoothly — Thumbnails are loaded rapidly as the user scrolls down to see more. Surprisingly the speed of loading seems to be little affected when the size of thumbnails is enlarged. It’s odd that the default size of images in the thumbnail bar is so small, when the larger size works so well — Another indication, I think, that the DjVu developers are not thinking much about use of the system for viewing books with pictures, since it’s so much easier to see details in pictures with larger thumbnails.

Finally, there seems to be one more hurdle to using DjVu in Internet Archive, which is the largest source of DjVu records: when the DjVu format is chosen in the “View the book” box, the link to open the DjVu file is broken. The way around this is to click All Files: HTTP, at the bottom of the “View the book” box. This goes to an index screen listing several formats, and clicking the one that ends in .djvu (usually the first in the list) successfully opens the file in DjVu format. I sent a question about this to the DjVu.org forum on Sept 8, and had not received an answer as of Sept 10. Go here to see the question and whether it has been answered.
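The workaround above (open the All Files: HTTP index and pick the file ending in .djvu) is simple enough to sketch in Python with only the standard library. This assumes the index page is ordinary HTML with plain `<a href>` links; the identifier "examplebook00auth" and the archive.org/download/ URL in the usage lines are illustrative assumptions, not taken from a real record, and IA’s page layout may change.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class DjvuLinkFinder(HTMLParser):
    """Collect href values ending in .djvu from an HTML file index."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Match on the extension only, so e.g. "_djvu.txt" files are skipped.
        if href and href.lower().endswith(".djvu"):
            self.links.append(href)

def first_djvu_url(index_url, index_html):
    """Return the absolute URL of the first .djvu file listed, or None."""
    finder = DjvuLinkFinder()
    finder.feed(index_html)
    finder.close()
    return urljoin(index_url, finder.links[0]) if finder.links else None

if __name__ == "__main__":
    # Hypothetical index listing; a real one would list many more files.
    sample_index = (
        '<a href="examplebook00auth.pdf">examplebook00auth.pdf</a>'
        '<a href="examplebook00auth_djvu.txt">examplebook00auth_djvu.txt</a>'
        '<a href="examplebook00auth.djvu">examplebook00auth.djvu</a>'
    )
    print(first_djvu_url("https://archive.org/download/examplebook00auth/",
                         sample_index))
```

Matching on the file extension rather than on position in the list means the sketch still works in the cases where the .djvu file is not the first one listed.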

Yogi Berra quote of the day: “You better cut the pizza in four pieces because I’m not hungry enough to eat six.”

Finding a heavily illustrated book that’s in both Google Books (GBS) and Internet Archive (IA) gives a good comparison of the strengths and weaknesses in the way illustrated books are presented in these systems.

Shown below are the “intro” pages for the book in the 2 systems. The clear advantage of the GBS intro page is that the sample thumbnails in the lower right make it immediately obvious that the book has COLOR pictures of good quality.

In Internet Archive, the main job of the intro screen (below) is to direct the user to options for viewing the book, in the box in the upper left, and there’s no indication that the book contains pictures.

Even after pulling up the DjVu option to view the book — which is a tricky matter, see how to do it here — there’s no intro screen at all in DjVu, just an imposing blank page waiting for the user to change display options or begin paging through the book sequentially.

It’s when the user chooses display options and begins viewing the book that the advantages of DjVu become evident. The most important option, especially if pictures are an important part of the book, as they are in the Mracek Atlas book shown here, is to turn on the thumbnail display bar (at left) by clicking the icon in the lower right corner of the DjVu display window. It then becomes easy to scroll through the thumbnails and get a good view of the nature of the pictures in the book, and how they relate to the text. In the Mracek Atlas, it happens that the first third of the book is all text, and the last two-thirds is mostly pictures, so the user can scroll to the pictures easily.

Use of thumbnails is a good way to provide access to pictures in a book. But as simple and obvious as it is, thumbnail access is lacking in most e-book systems, so both GBS and DjVu are to be applauded for providing it, in their different ways. Here’s a comparison of the two systems …

In GBS, the About this book page gives immediate thumbnail access to a maximum of 30 pictures. Additional pictures have no thumbnail access, and can only be found by scrolling through pages or text searching.

DjVu has the disadvantage of having no intro page that gives an overview of pictures in the book. But when the user knows how to set the display options, it provides good thumbnail access to an unlimited number of pictures. In a book like the Mracek Atlas, with over 100 pictures, this is a definite advantage.
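The difference in coverage is simple arithmetic. Here is a Python sketch, assuming the 30-thumbnail limit described above applies per book and that every picture would otherwise get its own thumbnail; the function name is just for illustration.

```python
GBS_THUMBNAIL_LIMIT = 30  # max thumbnails on the GBS "About this book" page, per the text

def thumbnail_coverage(picture_count, limit=None):
    """Fraction of a book's pictures reachable from thumbnails.

    limit=None models DjVu's thumbnail bar, which has no fixed limit.
    """
    if picture_count <= 0 or limit is None:
        return 1.0
    return min(limit, picture_count) / picture_count

for pictures in (20, 100, 150):
    gbs = thumbnail_coverage(pictures, GBS_THUMBNAIL_LIMIT)
    djvu = thumbnail_coverage(pictures)
    print(f"{pictures} pictures: GBS {gbs:.0%}, DjVu {djvu:.0%}")
```

For a book with more than 100 pictures, like the Mracek Atlas, GBS thumbnails reach under 30% of them, while DjVu thumbnails reach all of them.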

Postscript: It wasn’t easy to find a book that’s in both GBS and IA, so I was especially pleased to find the Mracek Atlas discussed here that has pictures in Hardin MD! The full citation for the book is: Atlas of diseases of the skin, by Franz Mracek, 1899 [GBS | IA]