Kalev Leetaru (Univ Illinois) recently published a lengthy and interesting article comparing Google Books and the Open Content Alliance. It’s especially interesting because it brings together a good description of many nitty-gritty details of Google Books that are not easy to track down. I’m excerpting a few passages on the use of color and PDF format in Google Books.
Color in Google Books – I have the impression, as Leetaru says, that when Google first started scanning books they didn’t scan in color — They do now though, at least in some cases.
[I’ve added the bold-face in quotes below. The order of quotes is not necessarily the same as in Leetaru’s article.]
Since the majority of out–of–copyright books do not have color photographs or other substantial color information, Google decided early on that it would be acceptable to trade color information for spatial resolution.
Google’s use of bitonal imagery and its interactive online viewing client significantly decrease the computing resources required to view its material. … Google Book’s bitonal page images, on the other hand, render nearly instantly, permitting realtime interactive exploration of works.
Use of PDF in Google Books – It’s interesting that Leetaru says the Google Books view “mimics the PDF Acrobat viewer.” Until recently, I avoided using the “Download PDF” button link in Google Books, thinking that it was mainly for downloading to print, and that the PDF view would take a long time to load. But I’m finding that it loads quickly, and provides a fairly usable interface that is in fact reminiscent of the Google Books view, as Leetaru suggests.
Google realized it was necessary to use different compression algorithms for text and image regions and package them in some sort of container file format that would allow them to be combined and layered appropriately. It quickly settled on the PDF format for its flexibility, near ubiquitous support, and its adherence to accepted compression standards (JBIG2, JPEG2000).
While many digital library systems either do not permit online viewing of digitized works, or force the user to view the book a single page at a time (called flipbook viewing), Google has developed an innovative online viewing application. Designed to work entirely within the Web browser, the Google viewing interface mimics the experience of viewing an Adobe Acrobat PDF file.
While most services take advantage of the linearized PDF format, Google made a conscious decision to avoid it. Linearized PDFs use a special data layout to allow the first page of the file to be loaded immediately for viewing … Google found several shortcomings with this format [noting that] the majority of PDF downloads are from users wanting to view the entire work offline or print it [and that] for these users, linearized PDFs provide no benefit.
See Leetaru’s extensively-referenced article for many other useful details.
Presumably Google books now photographs book pages in color then renders them into binary during the PDF compression process. This usually works well enough for text, but it often trashes the plates, especially the hand colored plates in 19th century natural history works, rendering them into useless black and white blobs. Does Google provide any way of accessing the original photographs from which the PDFs are made?