At his demo of the IA BookReader at the recent Books in Browsers conference, Mike Ang said of the new BookReader thumbnail view: “We think this is one example where the digital book has some advantages over the printed one.” Mike was talking particularly about the ability of the thumbnail view to give a unique overview of a book’s contents. I came across an example that shows the usefulness of this, described below.

The top frame of the graphic at left is a shot from the personal copy of a book by Isaac Newton, with his own annotations in the margins, described in IA staffer George Oates’s blog article. This sounded interesting when I read it, but the article didn’t have a link or page number showing where the annotation in the example appears in the book. So I searched for the book in IA, and using the thumbnail view I was able to scan through it visually and quickly find the annotation, as shown in the bottom frame at left.

This simple little example fits in nicely with the idea I’ve discussed in several articles on this blog, that thumbnails are especially valuable in books that contain non-textual material. In the examples I’ve blogged about previously, this has been illustrations, but marginalia also fit nicely into this category.

A few more details on the Newton example — The close-up of the text (top frame) is from a set of Oates’ slides (#24) about the project; it’s also in her article linked above. As mentioned, although these sources have nice detail about the unusual Newton treasure, neither has a specific link to the occurrence or page number of the annotation shown. The IA record for the book has a note saying “Includes Issac Newton’s handwritten notations,” but doesn’t say exactly where they occur. It turns out that the annotation is on page 73.

Eric Rumsey is at: eric-rumsey AttSign uiowa dott edu and on Twitter @ericrumsey

The Internet Archive’s BookReader got a lot of attention at the Books in Browsers conference at IA headquarters in San Francisco last week. IA engineer Mike Ang gave a technical talk to conference attendees on using BookReader with a touch interface (iPad, Android). He also did a demo as part of Brewster Kahle’s “Books in Browsers” Keynote, which was open to the general public, and that’s mostly what I’ll discuss in this article.

The IA blog article on Kahle’s Keynote has a video that includes Ang’s BookReader demo, with some screenshots from it. But the transcribed text in the article doesn’t include the demo, so I’ll give a little summary here — Ang’s 11-minute demo (16:26-27:40 on the video) includes enhanced search capabilities, audio generation from text, use on an iPad, and the thumbnail view (shown in the picture at left), which I discussed in an earlier article.

Ang said in the demo, and also in the conference session, that his team has the new version of BookReader working well in all browsers except Internet Explorer, and that that’s the main hold-up in releasing the new version. He’s hoping it will be out in the next few weeks.

In the conference session, Ang said that it’s especially difficult to get BookReader to work on iOS and Android smartphones and tablets because “multitouch events” are programmed differently on each device. I particularly took note of this because I’ve used the current version of BookReader on an iPad, and although it works quite nicely in general, I do notice that it’s fairly slow in pinch zooming. This is also noticeable in Ang’s demo on the video. I hope this problem can be solved. I think BookReader, if it can be made to work smoothly, has great potential on iPad-like tablets, a combination that no doubt seems natural to the people at Internet Archive since, as Ang observed in his demo, the iPad happens to be “the size of a small book.”


The Books in Browsers conference that met at the Internet Archive in San Francisco October 21 & 22 was a great experience for me and I’m sure for most of the approximately 100 attendees.

Follow-up: Conference reports & Commentaries – Most recent at the top

Presentations – List is derived from the pre-conference Agenda. Twitter addresses are included for presenters who have them — For other attenders on Twitter see here.

Thursday, October 21

  • Allen Noren (@allennoren), O’Reilly Media – Books in Browsers
  • Bill McCoy, Webpaper – Browsers for Books: Formats and User Experiences for Digital Reading   [Slides]
  • Dominique Raccah (@draccah), Sourcebooks – Immersion: What we actually know about adding media to books
  • Waldo Jaquith (@jaquith), VQR – EPUB for website producers
  • Keith Fahlgren (@abdelazer), Ibis Reader – Piercing the Clouds: Privacy, Confidentiality, and Web-based Reading   [Slides]
  • Nicole Ozer, ACLU – Digital Books: A New Chapter for Reader Privacy
  • Jason Schultz (@jason_schultz_), UC Berkeley – Using open licenses to ensure reader privacy
  • SJ Klein (@metasj), OLPC – Rural uses of browser books   [Description: PPT Slides]
  • Jim Fruchterman (@JRandomF), Benetech – Accessibility for browser based books
  • Joseph Pearson (@josephpearson), Inventive Labs – How we’re using Monocle in the Labs
  • Minh Truong (@minh_truong), Aldiko – Books and Apps
  • Daihei Shiohama, Voyager Japan – From mobile comics to broad platform experiences
  • Michael Ang (@mangbot), Internet Archive – Designing books for touch   [Slides in New version of BookReader | Old version]
  • Keynote: Brian O’Leary (@brianoleary), Magellan Partners – A Unified Field Theory of Publishing   [Full-text]
  • Evening Keynote: Books in Browsers – Brewster Kahle, Internet Archive   [Video & Text]

Friday, October 22

  • Keynote: Bob Stein (@ifbook), If:Book – For publishers, working together to support an open-source platform for Social Reading is the key to taking the initiative back from Amazon, Apple and Google   [Background]
  • Keynote: Richard Nash (@R_Nash), Cursor Books – Remember, the reader writes, too… On how discoverability begins with the writer.
  • Kovid Goyal, Calibre – An Alexandria in every neighborhood
  • Aaron Miller (@bookglutton), Bookglutton – A network of Books [Slides]
  • Otis Chandler (@otown), GoodReads – Finding Shelf Space in a World Without Shelves   [Slides]
  • Hadrien Gardeur (@Hadrien), Feedbooks – A Connected Bookshelf   [Slides]
  • Michael Tamblyn (@mtamblyn), Kobo Books – Life Among the Freegans: The Co-Existence of Free Books, Paid Books, and the People Who Read Them
  • Erin McKean (@emckean), Wordnik – Things are looking up for looking things up?
  • Eli James (@shadowsun7), Novelr – Pandamian: A Publishing Support Layer   [Full-text]
  • Kevin Franco (@FRANCOMEDIA), Francomedia – Thriller-based Transmedia and the reader experience
  • Fran Toolan (@ftoolan), Firebrand – A Conversation: Rights in the Book Web
  • Keynote: Matthew Bernius (@mattBernius), RIT – Returning to the Canon for Inspiration: Vannevar Bush, Walter Benjamin, and the future of reading   [Links]
  • Pecha Kucha 7 …
    Blaine Cook (@blaine), Romeda
    Craig Mod (@craigmod), “Post Artifact Story Telling”
    Jacob Lewis (@jacoblewism), Figment
    Cart Reed (@Ebooq), Ebooq


Internet Archive (IA) has long had an excellent “thumbnail view” of book pages, in the DjVu format, which I described two years ago as being arguably superior to Google Books for viewing books with a lot of illustrations. In April of this year, IA announced an additional thumbnail view, as part of their BookReader format, which I think is even better than the DjVu format. As with the DjVu format, however, getting to the BookReader thumbnail view is a bit tricky for the user. The steps are shown in the graphic below, starting at left on the IA book home/details page. The first step is to click “Read Online” at the top of the list of formats (some books in IA don’t currently have a BookReader version, in which case the “Read Online” link doesn’t appear). The next step, in the middle shot, is to click the rather inconspicuous grid-shaped icon in the top menu bar to view thumbnails.

It would be to the benefit of the Internet Archive project to make their excellent thumbnail views — DjVu and now BookReader thumbnails — easier to find. As I reported recently, Google IS finding IA versions of books, along with its own Google Books versions. And significantly, Google is often choosing to link to the DjVu format, out of the many different formats available in IA. I suspect this is because Google “has a nose for” anything that smells like it’s related to pictures (which I’ve experienced with Hardin MD picture searching for many years).

So, in closing, I’d suggest that the people at Internet Archive do some creative Search Engine Optimization (SEO), which the IA’s Peter Brantley suggested eloquently for libraries a couple of years ago — A bit of tweaking of IA pages might help Google to “find the (graphic) jewels” that they contain — The thumbnail views and formats that the world is looking for!

Finally, I can’t resist adding a BookReader thumbnail example from an elegant 19th century series of botanical prints — Click the screenshot to feast your eyes on more:


** Post-conference: Links to Presentations & Follow-up **

The Books in Browsers conference is in San Francisco, Oct 21-22, at the offices of the Internet Archive. Here’s a list of attenders on Twitter, which is about 2/3 of all attenders. In some cases it was difficult to tell if a corporate Twitter name was the best to use for an individual. I have not included people who are on Twitter but have their tweets protected. If you’re one of those and you want to be on the list, let me know.

Thanks to Blaine Cook (@BLAINE) for a Twitter List that gives a live tweet feed of all the attenders below.

Twitter name / Name (last first) / Affiliation

bookgluttonNEWS Alber Travis BookGlutton
indiamos Amos India ITP, NYU
mangbot Ang Michael Internet Archive
glassdog Arthur Lance Internet Archive
kindleworld Basten Andrys Kindle Blog
mattBernius Bernius Matthew RIT
edwardbetts Betts Edward Internet Archive
kirkbiglione Biglione Kirk Oxford Media Works
nathanbransford Bransford Nathan Curtis Brown Agency
naypinya Brantley Peter Internet Archive
patrickrbrown Brown Patrick Goodreads
jambina Buckland Amy McGill University
otown Chandler Otis Goodreads
BLAINE Cook Blaine Romeda
VidLit Dubelman Liz VidLit
abdelazer Fahlgren Keith Ibis Reader
FRANCOMEDIA Franco Kevin Francomedia
snowmaker Friedman Jared Scribd
JRandomF Fruchterman Jim Benetech
Hadrien Gardeur Hadrien Feedbooks
rochellegrayson Grayson Rochelle BookRiff
epistemographer Greenberg Josh Sloan Foundation
m_gylling Gylling Markus IDPF / DAISY
usfsrlib Hewlett Jean USF Gleeson Library
jhorodyski Horodyski John Wrinkled Pants
JenHoward Howard Jennifer Chronicle Higher Ed
shadowsun7 James Eli Novelr
jaquith Jaquith Waldo Univ. of Virginia
UDCMRK Kalfatovic Martin BioDiversity Heritage Library
bookmasters Kasher Bob Bookmasters
billkendrick Kendrick Bill Smashwords
selfpubbootcamp King Carla PBS MediaShift
metasj Klein SJ OLPC
booksquare Krozser Kassia Booksquare
curiouslee Lee Mike OLPC / AARP
jacoblewism Lewis Jacob Figment Fiction
jessielorenz Lorenz Jessie Independent Living SF
ivorymadison Madison Ivory Red Room
armco Malkin Andrew Overbrook Consulting
kevinmarks Marks Kevin Independent
ronmartinez Martinez Ron Invention Arts
hughmcguire McGuire Hugh BookOven
emckean McKean Erin Wordnik
abrahammertens Mertens Abraham Red Room
KatMeyer Meyer Kat O’Reilly Media
bookglutton Miller Aaron BookGlutton
calliemiller Miller Callie LitLife
craigmod Mod Craig Pre/Post Books
tiny_librarian Morin Becky California Acad Sci
m_murrell Murrell Mary UC Berkeley
R_Nash Nash Richard Cursor Books
allennoren Noren Allen O’Reilly Media
brianoleary O’Leary Brian Magellan Partners
josephpearson Pearson Joseph Monocle
pilsks Pilsk Suzanne Smithsonian Inst Libraries
poezn Porath Michael UC Berkeley I-School
draccah Raccah Dominique Sourcebooks
Ebooq Reed Cartwright ebooq
sol613 Rosenberg Sol Copia
ericrumsey Rumsey Eric Univ of Iowa
jason_schultz_ Schultz Jason UC Berkeley
letiziasechi Sechi Letizia Bookrepublic
ifbook Stein Bob If Book
mtamblyn Tamblyn Michael Kobo Books
pthompson Thompson Patrick Inkstone Software
ftoolan Toolan Fran Firebrand Tech
minh_truong Truong Minh Aldiko
aweber9 Weber Andrew Random House
dwilk Wilk David Creative Mgmt Partners
adamwitwer Witwer Adam O’Reilly Media
tiffanycmw Wong Tiffany Aldiko


In a companion case study of searching for a book title in Google Book Search (GBS), I reported that there were multiple editions from Google Books but no editions from Internet Archive (IA). In this article, I report on searching the same title — Diagnostic and therapeutic technic, by Albert S. Morrow — directly in Internet Archive. GBS found four editions for the book, and IA finds three, from a variety of sources.

Results

The list below shows the records retrieved in searching for Diagnostic and therapeutic technic in IA, in the order in which they were ranked:

• 1. Diagnostic and therapeutic technic, 1911
Digitizing sponsor: Google; From the collections of: Harvard University; Downloads: 43
Source: Google: GBS-Library: Harvard

• 2. Diagnostic and therapeutic technic, 1911
Digitizing sponsor: Google; From the collections of: unknown library; Downloads: 39
Source: Google: GBS-Library: Stanford

• 3. Diagnostic and Therapeutic Technic, 1921
Digitizing sponsor: Google; From the collections of: unknown library; Downloads: 21
Source: Google: GBS-Library: Stanford

• 4. Diagnostic and therapeutic technic, 1915
Digitizing sponsor: Google; From the collections of: Harvard University; Downloads: 27
Source: Google: GBS-Library: Harvard

• 5. Diagnostic and therapeutic technic, 1915
From scan: “Digitized by the Internet Archive in 2010 with funding from Open Knowledge Commons”
Digitizing sponsor: Open Knowledge Commons; Contributor: Columbia University Libraries; Downloads: 2

• 6. Diagnostic and therapeutic technic, 1915
Digitizing sponsor: MSN; Contributor: University of California Libraries; Collection: americana, CDL; Downloads: 130

• 7. Diagnostic and therapeutic technic, 1921
Digitizing sponsor: MSN; Contributor: Gerstein – University of Toronto; Downloads: 67

Observations and conclusions

Sources of records:

  • The first four records are from Google Book Search (although they are not all of the GBS records for the title). The IA records for these include the URL of the corresponding GBS record without linking to it directly, so I’ve added links to help show the connection.
  • MSN books (#6 – #7) came to IA from the Microsoft effort to scan books in competition with Google, which ended in 2008.
  • Open Knowledge Commons (#5) is the only source that’s not GBS or MSN. The record is apparently related to a new effort by OKC to scan medical books.

As in GBS, the reasoning behind the ordering of the records (different editions and different contributing libraries) is unclear. The only apparent pattern is that records from the same source are grouped together.

The number of downloads appears to be a good indication of how long a record has been in IA. The MSN records (#6, #7) have been in the longest and have the most downloads. The GBS records (#1 – #4) were apparently entered later, at about the same time, since they have similar download numbers. The Open Knowledge Commons record (#5) was just entered this year, and only has two downloads.
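As a rough check on that reading, here is a minimal Python sketch that groups the download counts listed above by digitizing sponsor and averages them. The variable names are my own, and the link between downloads and time spent in IA is still only my guess; the code just makes the comparison explicit.

```python
from statistics import mean

# Download counts transcribed from the seven IA records listed above,
# grouped by digitizing sponsor.
downloads = {
    "Google": [43, 39, 21, 27],        # records #1 - #4
    "Open Knowledge Commons": [2],     # record #5
    "MSN": [130, 67],                  # records #6 - #7
}

# Average downloads per record for each sponsor.
average = {sponsor: mean(counts) for sponsor, counts in downloads.items()}

# Sponsors ordered from most to fewest average downloads.
ranked = sorted(average, key=average.get, reverse=True)

for sponsor in ranked:
    print(f"{sponsor}: {average[sponsor]} downloads per record")
```

With only seven records this is an illustration, not a statistic, but it puts the three sources in the same order as the paragraph above.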

The most interesting finding in this little case study (combined with the one on Google Book Search) is the duplication of GBS records in IA. This raises the question of how the two scanning efforts relate to each other — Which books from GBS are duplicated in IA? Is IA able to scan any full-view books in GBS? Do they particularly scan books from some contributing libraries?


I searched for a small sample of ten pre-1923, public-domain books in Google Web Search in the last week, to find full-text versions, with the results below. These are all non-fiction titles, chosen more or less randomly, in subject fields that interest me: medicine, botany, and history.

Results

I did the searches in Google Web Search as detailed below — I looked at the first ten results, and recorded all occurrences of freely-available full-view versions for each title, with rank number. I’ve identified the GBS records by the library that scanned the book. For Internet Archive (IA), I’ve identified records by sponsor/contributor, and also noted whether the link goes to the book home page or the DjVu-formatted version of the book.

Both Google Books & Internet Archive records found:

• 1. American medical botany, Cummings and Hilliard, 1817
Google Web Search: American medical botany cummings
. . # 3 GBS-Library: Oxford Univ
. . # 7 GBS-Library: Harvard
. . # 9 IA:  Book Home Page – Sponsor & Contrib: NCSU

• 2. Portfolio of dermochromes, Jerome Kingsbury, 1913 (3 volumes)
Google Web Search: portfolio dermochromes kingsbury
. . # 1 GBS-Library: Harvard – Volume 1
. . # 5 IA:  Book Home Page – Volume 1 – Sponsor: IA; Contrib: U California
. . # 6 IA:  DjVu format – Volume 2 – Sponsor: IA; Contrib: U California

• 3. The Complete herbalist, or, The people their own physicians, Oliver Phelps Brown, 1870
Google Web Search: Complete herbalist, or, The people their own physicians
. . # 1 IA:  Book Home Page – Sponsor: Lyrasis, Sloan Fndtn; Contrib: Rutgers
. . # 2 IA:  Book Home Page – Sponsor: MSN; Contrib: U California
. . # 8 GBS-Library: Harvard

• 4. English and American tool builders, Joseph W. Roe, 1916
Google Web Search: english and american tool builders roe
. . # 1 GBS-Library: Harvard
. . # 4 IA:  Book Home Page – Sponsor: Boston Lib Consortium; Contrib: Northeastern U
. . # 5 IA:  DjVu format – Full Text of #4

• 5. Health service in industry, Irving Clark, 1922
Google Web Search: health service in industry clark
. . # 1 GBS-Library: California
. . # 2 IA:  Book Home Page – Sponsor: MSN; Contrib: U Toronto
. . # 3 IA:  Book Home Page – Sponsor: Google; Contrib: ?

• 6. History of medicine in its salient features, Walter Libby, 1922
Google Web Search: history of medicine in its salient features libby
. . # 1 GBS-Library: Harvard
. . # 4 IA:  DjVu record – Sponsor: MSN; Contrib: U California

Only Google Books records found, none from Internet Archive:

• 7. The Theory and practice of veterinary medicine, Austin H. Baker, Alexander Eger, 1911
Google Web Search: theory and practice of veterinary medicine baker
. . # 1 GBS-Library: Wisconsin

• 8. Atlas of diseases of the skin, Franz Mraček, ed. by Henry W. Stelwagon, 1899
Google Web Search: atlas diseases of the skin stelwagon
. . # 1 GBS-Library: Harvard – umQPAAAAYAAJ

• 9. How are you feeling now, Edwin Sabin, 1917
Google Web Search: how are you feeling now sabin
. . # 1 GBS-Library: California

Only a Publisher Preview found in Google Web Search; Google Book Search has the title in Full-view:

• 10. Beyond the Mississippi : from the great river to the great ocean, Albert Richardson, 1867
Google Web Search: beyond the mississippi richardson
. . # 3 GBS-Publisher: Preview of 2007 reprint, no full-view available. The title IS available when searched directly in Google Book Search ->>
>> Google Book Search search, limit to Full view: beyond the mississippi richardson
. . # 1 GBS-Library: Virginia

Conclusions

This is certainly not a large enough sample to draw many conclusions, but I think it does show a few things:

  • There’s a lot of overlap between what’s in the two sources – The first 6 of the 10 books searched are in both Google Books (GBS) and Internet Archive (IA).
  • Not surprisingly, when there are titles in both sources, Google usually ranks GBS higher than IA (one exception: #3).
  • Libraries represented in GBS – Harvard predominates, with 6 of the 10 records — This fits my general Googling experience. Univ California is second with 2 records — This is a higher proportion than I’ve experienced.
  • IA sources – 3 of the 6 records have MSN as sponsor; of these, 2 are contributed by Univ California.
  • Links to Internet Archive are haphazard – In most cases there’s a link to the Book Home Page, as there should be, since it has a list of different formats available. In some cases, there’s also a link to the DjVu format, and in one case (#6), that’s the only link. Why does Google link to this format instead of others? Maybe it’s because DjVu is good for displaying pages with pictures. But the version of the DjVu format that Google links to is not the best one, as I’ve discussed previously.
  • In one case (#10), Google Web Search didn’t find any full-view versions, and Google Book Search did find one.
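
The tallies in the list above can be reproduced with a short Python sketch. The data is transcribed by hand from the Results section; the structure and names are my own, and for title #10 I have left the library list empty because its full-view copy turned up only in a direct Google Book Search, not in the Web Search results.

```python
from collections import Counter

# title number -> (GBS full-view libraries found, number of IA links found)
# in the first ten Google Web Search results, as listed in Results above.
results = {
    1: (["Oxford", "Harvard"], 1),
    2: (["Harvard"], 2),
    3: (["Harvard"], 2),
    4: (["Harvard"], 2),
    5: (["California"], 2),
    6: (["Harvard"], 1),
    7: (["Wisconsin"], 0),
    8: (["Harvard"], 0),
    9: (["California"], 0),
    10: ([], 0),  # only a publisher preview appeared in Web Search
}

# How often each scanning library appears among the GBS records.
library_counts = Counter(
    library for libraries, _ in results.values() for library in libraries
)

# Titles for which both a GBS full-view record and an IA link were found.
in_both = [
    number for number, (libraries, ia_links) in results.items()
    if libraries and ia_links > 0
]

print(library_counts.most_common(2))
print(in_both)
```

Keeping the raw results in one small table like this also makes it easy to rerun the counts if the searches are repeated later.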

My purpose here was not to look at the proportion of all books that are in GBS or IA — That would take a larger sample, and more systematic randomizing. But I can report that I did find most of the titles I searched, which surprised me.

As I report in a separate article, it’s likely that there are GBS or IA versions of other editions of many of these books that could be found by searching directly in these sources.

None of the Google Web Searches I did turned up full-text versions from any source other than GBS or IA. I was surprised at this, especially that Gutenberg.org did not appear in any of the search results.

Caveat: The results for the specific searches in Google Web Search will certainly change over time, so the study should be thought of as capturing a moment in time, not results set in stone!
