Peter Suber, at Open Access News, has a good article on Google’s recent announcement that it is now OCR’ing scanned PDF documents so that their text becomes searchable in Google Web Search.

Scroll down especially to Suber’s comments, in which he describes the background to this Google advance. The same capability is already in Google Book Search: as he says, it has had an OCR’d text layer for full-view books from the start, which is how they can be searched. (Google Catalogs also has a searchable text layer.)

For more on searchable and non-searchable text, see: Identifying Google scanned PDF’s
