The screenshots below are from Google Books, showing the link to the Google eBook version in the blue box to the left, and the formats available for downloading, in the upper right. The “Settings” box in the center is pasted from the Google eBook record, to show the connection between download formats and the versions available in Google eBooks.

In the first example, both PDF and ePub formats are available for download in Google Books. Correspondingly, in Google eBooks, both Flowing text and Scanned Pages are available.

In the second example, only PDF format is available for download in Google Books, and in Google eBooks, only Scanned Pages are available. This is indicated in the blue box in Google Books by the note that the Google eBooks version is “Better for larger screens” (circled in red) – i.e., the PDF version is not well suited to mobile devices.

Eric Rumsey is at: eric-rumseytemp AttSign uiowa dott edu and on Twitter @ericrumseytemp

Peter Suber, at Open Access News, has a good article on Google’s recent announcement that they are now OCR’ing scanned PDF documents so that they become searchable text documents in Google Web Search.

Scroll down especially to Suber’s comments, in which he describes the background to this Google advance, which is already in Google Book Search. As he says, Google Book Search has had an OCR’d text layer for full-view books from the start, which is how they can be searched. (Google Catalogs also has a searchable text layer.)

For more on searchable and non-searchable text see: Identifying Google scanned PDF’s

Google recently announced that scanned PDF documents are now available in Google Web Search. PDF documents have been in Google before, but most PDF documents that were scanned from paper documents have not, so this will greatly improve access to PDF’s. As described below, it’s important to be able to distinguish scanned PDF’s from the other kind, which has been in Google before.

Scanned PDF documents are originally created by making an image scan of a paper document, and since the text is an image, it’s not selectable or searchable as text. The other kind of PDF document, usually called native PDF, which has been in Google before, is originally created from an existing electronically formatted document, like a Word document, and its text is selectable and searchable as text.

From Google search results it’s not possible to determine whether a PDF document is a scanned document or a native document. Both simply say “File Format: PDF/Adobe Acrobat.” To see if it’s scanned or native PDF, go to the document and click on a word to see if it can be selected. If it can, it’s native PDF; if not, it’s scanned PDF. It’s important to know this because in a scanned PDF, the text is not searchable within the browser’s PDF reader. This is not readily apparent, because the search command seems to work, but comes up with zero results. To search the text of a scanned document, go back to the search results and click “View as HTML,” which has the text of the document.
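The click-on-a-word test can also be approximated in code. What follows is only a rough sketch, not anything Google does: it assumes that a PDF whose pages yield almost no extractable text is probably a scanned image without a text layer. The function names and the 20-character threshold are my own illustrative choices, and the optional helper assumes the third-party pypdf package is installed.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is a scanned image with no usable text layer.

    page_texts is a list with one string of extracted text per page.
    The threshold is an arbitrary illustrative value, not a standard.
    """
    if not page_texts:
        # No pages means no text was found; treat that as "not searchable".
        return True
    total_chars = sum(len((text or "").strip()) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page


def extract_page_texts(path):
    """Optional helper: pull the text of each page with pypdf.

    Assumes pypdf is installed; it is imported here so that
    looks_scanned() can be used without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

With a file on disk, `looks_scanned(extract_page_texts("some-file.pdf"))` would give a first guess (the filename is a placeholder). A scanned PDF that carries a hidden OCR layer would be reported as having text, so this is a heuristic rather than a definitive test.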

Examples from Google:
Google search: Scanned PDF – Text cannot be selected (Notice that the text in this document is scratchy and of poor quality, another indication of scanned text.)
Google search: Native PDF – Text can be selected

See also: Google Books and Scanned PDF’s

Kalev Leetaru (Univ Illinois) recently published a lengthy and interesting article comparing Google Books and the Open Content Alliance. It’s especially interesting because it brings together a good description of many nitty-gritty details of Google Books that are not easy to track down. I’m excerpting a few passages on the use of color and PDF format in Google Books.

Color in Google Books – I have the impression, as Leetaru says, that when Google first started scanning books they didn’t scan in color. They do now, though, at least in some cases.

[I’ve added the bold-face in quotes below. The order of quotes is not necessarily the same as in Leetaru’s article.]

Since the majority of out–of–copyright books do not have color photographs or other substantial color information, Google decided early on that it would be acceptable to trade color information for spatial resolution.

Google’s use of bitonal imagery and its interactive online viewing client significantly decrease the computing resources required to view its material. … Google Book’s bitonal page images, on the other hand, render nearly instantly, permitting realtime interactive exploration of works.

Use of PDF in Google Books – It’s interesting that Leetaru says the Google Books view “mimics the PDF Acrobat viewer.” Until recently, I avoided using the “Download PDF” button link in Google Books, thinking that it was mainly for downloading to print, and that the PDF view would take a long time to load. But I’m finding that it loads quickly, and provides a fairly usable interface that is in fact reminiscent of the Google Books view, as Leetaru suggests.

Google realized it was necessary to use different compression algorithms for text and image regions and package them in some sort of container file format that would allow them to be combined and layered appropriately. It quickly settled on the PDF format for its flexibility, near ubiquitous support, and its adherence to accepted compression standards (JBIG2, JPEG2000).

While many digital library systems either do not permit online viewing of digitized works, or force the user to view the book a single page at a time (called flipbook viewing), Google has developed an innovative online viewing application. Designed to work entirely within the Web browser, the Google viewing interface mimics the experience of viewing an Adobe Acrobat PDF file.

While most services take advantage of the linearized PDF format, Google made a conscious decision to avoid it. Linearized PDFs use a special data layout to allow the first page of the file to be loaded immediately for viewing … Google found several shortcomings with this format [noting that] the majority of PDF downloads are from users wanting to view the entire work offline or print it [and that] for these users, linearized PDFs provide no benefit.
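For anyone curious whether a particular downloaded PDF is linearized, here is a small sketch of one way to check. It relies on the PDF specification's requirement that a linearized file keep its linearization dictionary (marked by the `/Linearized` key) within the first 1024 bytes. The function name is my own, and a plain byte search like this is a heuristic rather than a full parser.

```python
def is_linearized(pdf_bytes):
    """Heuristically report whether PDF data appears to be linearized.

    Looks for the /Linearized key in the first 1024 bytes, where the
    PDF specification says the linearization dictionary must sit.
    """
    head = pdf_bytes[:1024]
    return head.startswith(b"%PDF-") and b"/Linearized" in head


def file_is_linearized(path):
    """Convenience wrapper: read just the start of a file and check it."""
    with open(path, "rb") as handle:
        return is_linearized(handle.read(1024))
```

Only the start of the file is read, so the check is quick even for a very large scanned book.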

See Leetaru’s extensively-referenced article for many other useful details.