I wrote recently on the kinship of Google and libraries. I got the idea for that especially from a long portrait of Google co-founder and new CEO Larry Page, which brings out several qualities of Google and Page that I think show commonality with libraries and librarians. In that portrait, Farhad Manjoo contrasts the Google/Page style with the Apple/Steve-Jobs style, and says it’s unlikely that Google will “tap its inner Apple” under Page’s leadership. …

The phrase “tap its inner Apple” kept bouncing around in my mind. Larry Page may not help Google find its Inner Apple, I think, but how about adding another twist? Combine the idea of Google-librarian temperamental connections, from my previous article, with Google Books, which resonates strongly with librarianship and was actually conceived by Page: how about Larry Page as Google’s Inner Librarian? …

At first this idea of Larry Page as Google’s “inner librarian” seemed almost too playful to suggest. It was only when I was able to substantiate Page’s central role in creating Google Books, and his conception of it in library terms, that the idea seemed more credible. His involvement in the early years of the project is commonly mentioned, but Google co-founder Sergey Brin is the one who has gotten more attention talking about it. So it took some digging to find details of Page’s role in the creation of Google Books, and that digging did turn up some bits of solid evidence, discussed below.

The first is the story of Page telling Google CEO Eric Schmidt about his idea for Google Books. It comes from Ken Auletta’s book on Google and, ironically enough, I found it right in Google Books. Surprisingly, as interesting as the story is, especially from a library point of view, googling the quote turns up only a handful of fairly obscure places where it’s cited. The telling here is notable for Page’s strong emphasis on the project’s library-librarian connections:

[boldface added] Schmidt remembers the day in 2002 he walked into Page’s office and Page surprised him by showing off a book scanner he had built. It had been inspired by the great library of Alexandria … “We’re going to scan all the books in the world,” Page said. For search to be truly comprehensive, he explained, it must include every book ever published. He wanted Google to “understand everything in the world and give it back to you.” Sort of “a super librarian,” he said.

The second telling of the story is also little-cited, probably because it’s buried in the middle of a recent multi-page Wired article. Written by master tech storyteller Steven Levy, it’s notable for the clear statement that the project was Page’s idea:

[boldface added] It was Page who dreamed of digitizing the world’s books. Many assumed the task was impossible, but Page refused to accept that. It might be expensive, but of course it was possible. To figure out just how much time it would take, Page and Marissa Mayer jury-rigged a book scanner in his office, coordinating Mayer’s page-turning to a metronome. Then he filled up spreadsheets with calculations … Eventually, he became convinced that the costs and timing were reasonable. What astounded him was that even his spreadsheets didn’t dissolve the skepticism of those with whom he shared his scheme. “I’d run through the numbers with people and they wouldn’t believe them,” he later said. “So eventually I just did it.” Page was disappointed when critics … launched a series of legal challenges … “Do you really want the whole world not to have access to human knowledge as contained in books?” Page asks. “You’ve just got to think about that from a societal point of view.”

It’s ironic that Page is taking over as Google CEO just after the rejection of the Google Books Settlement. But I suspect the Google Books project will be seen by librarians of the future as a necessary first step in the evolution of a universal digital library, an idea that might still seem impossible if it hadn’t been for Google. In fact, this process of looking back on Google Books as “history” has already started: Harvard Library director (and historian) Robert Darnton, writing in a NY Times op-ed soon after the Settlement rejection, proposes the creation of “A Digital Library Better Than Google’s.” He concludes his piece by giving credit to Google for getting the idea started:

Through technological wizardry and sheer audacity, Google has shown how we can transform the intellectual riches of our libraries, books lying inert and underused on shelves. But only a digital public library will provide readers with what they require to face the challenges of the 21st century.

And it might not have happened if Larry Page hadn’t had the audacious dream of digitizing the world’s books and scanned the first one in his office with Marissa Mayer.

Siva Vaidhyanathan is a frequent commentator on the Google Books Settlement, and my impression has been that he’s generally on the “anti-Google” side. In a long interview by Andrew Albanese in Publishers Weekly, however, Vaidhyanathan presents a more nuanced view. He continues to be unfavorable to the Settlement, and to the part played by libraries in scanning their books for Google. But he also acknowledges the failure of public institutions, especially libraries, in taking the initiative to digitize the world’s books. The interview is full of interesting insights on a wide range of Google-related subjects. Here are some excerpts on Google Books (boldface added):

The Google Books plan is a perfect example of public failure. The great national, public, and university libraries of the world never garnered the funds or the political will and vision needed to create a universal, digital delivery service like Google envisions. Public institutions failed to see and thus satisfy a desire—perhaps a need—for such a service. Google stepped in and declared that it could offer something close to universal access for no cost to the public. The catch, of course, was that it would have to be done on Google’s terms.

Here Vaidhyanathan’s mixed sentiments about Google and the Settlement start to show. He says that if Google had proceeded in its legal battle as he would have preferred, the legality of Search might have been undermined, which apparently would be OK with him. Yet earlier in the interview he says that he “loves Google” and relies heavily on its Web Search, acknowledging that he’s like the rest of us, caught on the horns of the Google good-evil dilemma:

[On Google's Fair Use defense in the Books Settlement] Say, Google had decided to fight in court, rather than settle. And say it won before the Supreme Court. Congress was never going to let them just win. Congress would have listened to the major content providers, and it would have intervened in a way that would have restricted fair use. That in turn could have undermined some fundamental practices of the Web, like search. …  But with books, Google reached from the digital world into the analogue world and said to publishers, “You now need to operate by the rules of the Web.”  … As a policy argument, there is something to be said for running copyright the way Google wants to run it. If we were testifying before Congress about such a change, I would be right up there with Google. But as it stands, that’s not what Congress has said, and that’s not what the courts have said.

And here, he seems to be acknowledging that if Google had followed the conventional legal procedures that other companies have to follow, there’s a pretty good chance that the scanning project wouldn’t have gotten off the ground yet:

[On the Settlement as a corporate end run around the legislative process] Google figures that if it creates good products and they get popular, the courts and Congress will be less likely to undo them. But that is an arrogant, audacious perspective on the legal and legislative system, and it’s fundamentally antidemocratic. Google should have to do things the old-fashioned way: hire lobbyists to bribe legislators to get their agenda passed [laughs]. Seriously, though, that’s what every other company has to do. And as sick as it sounds, that’s the way the game is played. If Congress thinks it is a bad idea to permit a digital library like this, then we fight harder to convince them why it is a good idea, and we make those arguments in public.

In concluding the interview, Vaidhyanathan returns to the high road, calling for the people of the world to finally take up their responsibility and create the universal digital library:

[On the argument that libraries would never have been able to do the project that Google is doing] If we, the people of the world, the librarians of the world, the scholars of the world, the publishers of the world, decide that we should have a universal digital library, then let’s write a plan, change the laws, raise money, and do it right. If we’re going to create this with public resources, let’s do it in the public interest, not corporate interest. There’s nothing wrong with Google pursuing a books project, of course, and, yes, there are benefits. But we have to understand that what Google has created is first and foremost for Google.

See the complete interview for additional interesting insights: Sergey Brin and Google as the mind of God; the “brilliant story” of Google Search; and why publishers will like Google eBooks more than Amazon.

Google Ngrams is a fascinating visualization tool for studying word frequency over time in the 15 million books that are part of the Google Books project. The research that led to the creation of Ngrams was a cooperative effort between Google and Harvard University.

The little screenshot snippet below shows Ngrams in action, making it easy to see at a glance how cancer has come to predominate over infectious diseases in the 20th century. Other examples show similar trends in related diseases, medical specialty fields, and the practice of healthcare. Ngram viewer IS case sensitive and results vary quite a bit depending on capitalization, so play around with it …

Especially of historic interest:

The screenshots below are from Google Books, showing the link to the Google eBook version in the blue box to the left, and the formats available for downloading, in the upper right. The “Settings” box in the center is pasted from the Google eBook record, to show the connection between download formats and the versions available in Google eBooks.

In the first example, both PDF and ePub formats are available for download in Google Books. Correspondingly, in Google eBooks, both Flowing Text and Scanned Pages are available.

In the second example, only the PDF format is available for download in Google Books, and in Google eBooks, only Scanned Pages are available. Note that this is indicated in the blue box in Google Books with the note that the Google eBooks version is “Better for larger screens” (circled in red), i.e. the PDF version is not good for mobile devices.

How does Google Books relate to Google eBooks? Here’s one interesting little indication — For a full-view, free, public domain book, the URL is identical except that the Google Books version says “books” …

And the new Google eBookstore version says “ebooks” …

This makes it easy to compare the record in Google Books and Google eBookstore — Just add an “e” in the URL!
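The "add an e" trick can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `to_ebookstore_url` is mine, and the sample URL (including the `EXAMPLE_ID` placeholder) is hypothetical, standing in for a real record URL whose path contains a `books` segment. Only that path segment is changed, so the "books" in the host name is left alone.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ebookstore_url(books_url: str) -> str:
    """Swap the 'books' path segment for 'ebooks' (hypothetical helper)."""
    parts = urlsplit(books_url)
    segments = parts.path.split("/")
    if "books" not in segments:
        raise ValueError("URL path has no 'books' segment to convert")
    # Replace only the first 'books' segment in the path, not the host name.
    segments[segments.index("books")] = "ebooks"
    return urlunsplit(parts._replace(path="/".join(segments)))


if __name__ == "__main__":
    # EXAMPLE_ID is a placeholder, not a real book identifier.
    print(to_ebookstore_url("http://books.google.com/books?id=EXAMPLE_ID"))
```

Working on the parsed path rather than doing a plain string replace avoids accidentally rewriting the host name, which also contains the word "books".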

Notice here also that Google is making the connection between Google Books and the Google eBookstore, by putting the blue box with the “Get it now” button in both versions.

With Google Books in the news in the last few weeks, I’ve been paying more attention than I have since the Settlement discussion wound down several months ago. On Twitter I’m noticing a large increase in the number of tweets about GBS, especially in the number of links to specific book titles in it. Admittedly this is a purely subjective impression, but I don’t remember this happening frequently several months ago. So on Nov 9 I did a small survey of links to GBS (below), found by searching Twitter for google books. The sample includes a variety of links from the last day, concentrating on tweets whose authors seem to be modest individual tweeters, not “dotcom presences.” I’ve numbered the tweets so that I can refer to them in the discussion below.

  1. uflsms: The web reputation systems book (readings) is also at Google Books: http://bit.ly/9HLyPn Not all pages are there, but most are.
    about 2 hours ago via TweetDeck
  2. berta1974: RT @gasolinero: «La Dificultad De ser Japonés» en Google books con una vista previa de 1/5 del libro http://bit.ly/94C3pW
    about 4 hours ago via TweetDeck
  3. maramirou: Mabrouk! RT:@zizoo My book is finally on google books :-) -L’hopital Razi de la Manouba et son histoire – Google Books: http://goo.gl/kQFKH
    about 5 hours ago via TnLabs
  4. dayski: The Alchemy of Air: A Jewish Genius … – Google Books http://t.co/a9YdgtK via @ shr.lc – I like the get this book option on sidebar!
    about 8 hours ago via Tweet Button
  5. darthconnell: The Origin of Species is available in its entirety from Google Books http://t.co/bmpZFBY
    about 14 hours ago via Tweet Button
  6. stewartsm: The collected writings of Michael Snow – Google Books: Wilfrid Laurier Univ. Press, 1994 http://bit.ly/bwSYVa
    about 15 hours ago via twitterfeed
  7. atomicpoet: The ABCs of Strategic Life Planning – Google Books http://ow.ly/19ROLn
    about 23 hours ago via HootSuite

A few observations — of the 7 books linked, 6 of them are recent, copyrighted titles, with #5 being the only public domain library-scanned book. Of the copyrighted books, all have a preview except #3. Interestingly, and consistent with earlier observations, a significant number of these books are in non-English languages (#2, #3).

I’m presenting this little sample not so much to draw specific conclusions about the nature of tweets on GBS, but simply to draw attention to my clear observation that people ARE tweeting links to GBS books more than they were 6-12 months ago. This certainly indicates an increase in awareness of GBS, and goes along with my general impression, especially from statistics provided by Google, that it’s getting heavier use than most commentators, especially anti-GBS ones, are acknowledging.

In a companion case study of searching for a book title in Google Book Search (GBS), I reported that there were multiple editions from Google Books but no editions from Internet Archive (IA). In this article, I report on searching for the same title, Diagnostic and therapeutic technic by Albert S. Morrow, directly in Internet Archive. GBS found four editions of the book, and IA finds three (1911, 1915, and 1921), in seven records from a variety of sources.


The list below shows the records retrieved by searching for Diagnostic and therapeutic technic in IA, in the order in which they were ranked:

• 1. Diagnostic and therapeutic technic, 1911
Digitizing sponsor: Google; From the collections of: Harvard University; Downloads: 43
Source: Google: GBS-Library: Harvard

• 2. Diagnostic and therapeutic technic, 1911
Digitizing sponsor: Google; From the collections of: unknown library; Downloads: 39
Source: Google: GBS-Library: Stanford

• 3. Diagnostic and Therapeutic Technic, 1921
Digitizing sponsor: Google; From the collections of: unknown library; Downloads: 21
Source: Google: GBS-Library: Stanford

• 4. Diagnostic and therapeutic technic, 1915
Digitizing sponsor: Google; From the collections of: Harvard University; Downloads: 27
Source: Google: GBS-Library: Harvard

• 5. Diagnostic and therapeutic technic, 1915
From scan: “Digitized by the Internet Archive in 2010 with funding from Open Knowledge Commons”
Digitizing sponsor: Open Knowledge Commons; Contributor: Columbia University Libraries; Downloads: 2

• 6. Diagnostic and therapeutic technic, 1915
Digitizing sponsor: MSN; Contributor: University of California Libraries; Collection: americana, CDL; Downloads: 130

• 7. Diagnostic and therapeutic technic, 1921
Digitizing sponsor: MSN; Contributor: Gerstein – University of Toronto; Downloads: 67

Observations and conclusions

Sources of records:

  • The first four records are from Google Book Search (although they are not all of the GBS records for the title). The IA record for each of these includes the URL for the corresponding GBS record without linking directly, so I’ve added links to help see the connection.
  • MSN books (#6 – #7) came to IA from the Microsoft effort to scan books in competition with Google, which ended in 2008.
  • Open Knowledge Commons (#5) is the only record that’s not GBS or MSN. It’s apparently related to a new effort by OKC to scan medical books.

As in GBS, the rationale for the ordering of the different records (different editions and different contributing libraries) is unclear. The only discernible pattern is that records from the same source are grouped together.

The number of downloads appears to be a good indication of how long a record has been in IA. The MSN records (#6, #7) have been in IA the longest and have the most downloads. The GBS records (#1 – #4) were apparently entered later, at about the same time, since they have similar download numbers. The Open Knowledge Commons record (#5) was just entered this year, and has only two downloads.
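To make the comparison easier to check, the download counts from the list above can be grouped by digitizing sponsor. The short Python sketch below does that. The figures are copied from the seven IA records, while the tuple layout and the function name `summarize_downloads` are my own choices for illustration. Note that it only summarizes the counts; reading them as a proxy for how long a record has been in IA remains an inference.

```python
from collections import defaultdict

# (record number, digitizing sponsor, edition year, downloads), as listed above.
RECORDS = [
    (1, "Google", 1911, 43),
    (2, "Google", 1911, 39),
    (3, "Google", 1921, 21),
    (4, "Google", 1915, 27),
    (5, "Open Knowledge Commons", 1915, 2),
    (6, "MSN", 1915, 130),
    (7, "MSN", 1921, 67),
]


def summarize_downloads(records):
    """Group download counts by sponsor; report count, min, max and mean."""
    grouped = defaultdict(list)
    for _number, sponsor, _year, downloads in records:
        grouped[sponsor].append(downloads)
    return {
        sponsor: {
            "records": len(counts),
            "min": min(counts),
            "max": max(counts),
            "mean": sum(counts) / len(counts),
        }
        for sponsor, counts in grouped.items()
    }


if __name__ == "__main__":
    for sponsor, stats in summarize_downloads(RECORDS).items():
        print(sponsor, stats)
```

With these figures the Google-sponsored records fall in a fairly narrow band (21 to 43 downloads), while both MSN records sit above that band and the single Open Knowledge Commons record is far below it.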

The most interesting finding in this little case study (combined with the one on Google Book Search) is the duplication of GBS records in IA. This raises the question of how the two scanning efforts relate to each other — Which books from GBS are duplicated in IA? Is IA able to scan any full-view books in GBS? Do they particularly scan books from some contributing libraries?


