Working on Swine Flu this week has been especially interesting because it makes me reflect on how much things have changed in the information landscape since I worked on SARS in 2003 and Bird Flu in 2004-05. In those outbreaks, the main source of information was lists of links found in Google. How much that has changed now, with Twitter! People use Twitter in different ways — For me the most valuable part of it is the links in tweets. In former outbreaks, when Google was the “king of links,” it was especially hard to keep up with current news stories. Now links to breaking news stories appear within minutes in Twitter.

Evgeny Morozov, in his article Swine flu: Twitter’s power to misinform, complains about the chaotic nature of Swine Flu information in Twitter:

There are quite a few reasons to be concerned about Twitter’s role in facilitating an unnecessary global panic about swine flu. … [Twitter users] armed with a platform to broadcast their fears are likely to produce only more fear, misinformation and panic. … Twitter seems to have introduced too much noise into the process … The “swine flu” Twitter-scare has … proved the importance of context — The problem with Twitter is that there is very little context you can fit into 140 characters.

Anyone who’s used Twitter knows that there’s much truth here. Especially for a new user, it’s hard to separate the Twitter wheat from the Twitter chaff. But it can be done. To show the shallow, mindless nature of Twitterers, Morozov quotes text from tweets about Swine Flu. And he’s right, they’re pretty valueless. But, clicking to look at the writers’ profile pages shows that most of them are fairly inexperienced, with relatively few updates and followers, so it’s not surprising that their tweets are bad. Which goes to show, just as with online sources in general, in Twitter it’s important to check the source! Find out who’s behind the information.

So, while I agree with Morozov that Twitter has some negatives, I think we need to appreciate the positive value it has added to our ability to exchange information rapidly, which will certainly make us better able to deal with a real pandemic if it occurs. In composing this article, I came across a good conversation in Twitter that speaks to my ideas:

@PhilHarrison: Twitter is relatively new & we’re all learning about its power to inform & misinform as well. (bold added)
@charlesyeo: During SARS, some people in Asia blamed media for not exposing cases earlier so the sick can get help!

Note that the second tweet, by charlesyeo, comes back to the point I made in the first paragraph, that lack of information was a serious problem in the SARS epidemic. Twitter has clearly improved that.

Another valuable aspect of Twitter in the Swine Flu epidemic has been the vibrancy of its international participation. Before Swine Flu, I had learned to value the prolific and multilingual tweeting of Jose Afonso Furtado (@jafurtado), a librarian in Portugal who tweets mostly on library/publishing subjects. When the Swine Flu epidemic broke out seriously in Mexico, he tweeted on that, and through his tweets I was able to connect on Twitter with people in Europe and Latin America who were following the situation in Mexico.

Eric Rumsey is at @ericrumsey

Interesting thought by Mike Shatzkin on the unlikeliness of pictures in eBooks anytime soon (bold added):

The proliferation of formats, devices, screen sizes, and delivery channels means that the idea of “output one epub file and let the intermediaries take it from there” is an unworkable strategy. [Here's one reason why:] … Epub can “reflow” text, making adjustments for screen size. But there is no way to do for that for illustrations or many charts or graphs without human intervention (for a long while, at least.) Even if you could program so that art would automatically resize for the screen size, you wouldn’t know whether the art would look any good or be legible in the different size. A human would have to look and be sure.

Mike is talking here about the issue I wrote about in the foundational article for Seeing the Picture — Pictures are in many ways an intractable problem for automation — In many situations, the best use of pictures requires intelligent human input.

Last week the National Academy of Sciences announced that “more than 9,000 Academies reports” are now available through Google Book Search, upon completion of “the first phase of a partnership with Google to digitize the library’s collection of reports from 1863 to 1997.” This sounds like good news, but it’s hard to evaluate the exact nature of the NAS documents that have become available, since neither the NAS press release nor Google gives any indication of how to search the newly available documents in Google Book Search.

The NAS press release uses the word “reports” to describe the newly available documents. In its long history, the NAS has had several named series (below), and one of those is in fact “Report of the NAS.” But the example documents in the NAS press release are not part of that (or any other) series, so apparently the use of the word “report” in the press release is meant more as a generic description of the documents.

As far as I can tell, the only way to find NAS documents in Google Book Search is to search for “national academy of sciences”. This retrieves a mix of monographic sorts of titles and series titles. Some have apparently been digitized at NAS, and others have been digitized at participating libraries. Below, I’m listing the main NAS series I find that are in full-view, freely available mode.

In December 2008, Google announced that they had begun adding recent popular magazines to Google Book Search. Because Google, inexplicably, chose not to provide a list of titles that were included, I made a list of about 40 titles, and until recently I hadn’t added to it, assuming that Google hadn’t added any more titles, since none had appeared on the Google Book Search home page. Recently, though, I saw in Twitter that people were mentioning new titles, so I did some searching to see if I could find more. And indeed, I did find about 10 new titles that have apparently been added recently, and I’ve added these to the list at Google Magazines – Titles.

A suggestion: If you find an interesting new magazine title in Google Book Search, put it in Twitter, and include the hashtag that I just created, #gbsmag (Clicking this will retrieve tweets in Twitter Search, with examples from the new titles I recently found). If you don’t use Twitter, of course, feel free to put new magazine titles in a comment to this article.

Since the announcement by Apple last week of new iPhone OS software that will become available in June, publishers Adam Hodgkin and Mike Shatzkin have been having an interesting dialog about the future of the eBook market, and how iPhone 3.0 will affect the competition between Amazon, Apple, and Google. Most of my posting here will be a presentation of the views of Hodgkin and Shatzkin on the eBook market, but I think an article by Ben Parr, at mashable.com, on more general effects of iPhone 3.0, does a good job of setting the stage for the discussion of eBooks. In his discussion of the new ability to purchase items within an application, Parr seems to be talking about the same thing that Hodgkin sees as being so revolutionary about the new iPhone OS (Correct me if I’m wrong on this, Adam). So, first — an excerpt from Parr’s posting:

The new iPhone 3.0 software includes the ability to copy-and-paste, a landscape keyboard, and push notifications. However, none of these updates are as revolutionary as the new features Apple offers to iPhone application developers. The one to watch [especially] is the ability to purchase items within an application. This is a feature that matters because of the vast opportunities that it presents to both developers and users. … If the iPhone application store revolutionized the mobile as a platform, then the iPhone 3.0 OS may very well be the spark that revolutionizes the mobile as its own economy. [boldface here & below added]

With the new iPhone OS, Hodgkin thinks that Apple has put themselves into a leading position in their competition with Amazon and Google for the eBook market:

The announcement earlier this week about Apple’s iPhone OS 3.0 made it at last pretty clear how Apple is going to become a player and the strategy is so simple and solid that I am surprised that more of us did not see it coming. Apple has taken the very sensible position that it doesn’t need to be a big player in the digital books or the ebooks market to win the game hands down. Apple is going to let authors, publishers and developers get on with their business and work out how the digital books market is going to work and Apple is just going to collect the market-maker’s fee for letting it happen, on and in the iPhone arena. … The position that Apple have announced for themselves is stylish, decisive and agnostic. Apple doesn’t mind whether books are based in the cloud as web resources, or shipped around the internet as book-specific file formats. Web-based books, digital editions and ebook file formats can all run easily on the iPhone if that is what is needed: “Open house, come over here and play”. That is the message from Cupertino.

Shatzkin, however, thinks that Hodgkin has jumped too quickly for Apple, and he says that the competition is still wide open:

Hodgkin sees brilliance in Apple’s move not to enter the proprietary ebook wars, but simply to be a facilitator of sales to iPhone users … [But his article] took no note of Sony, Stanza, or the potential impact of broadly-distributed epub files. … It also took no note of Barnes & Noble’s recent purchase of Fictionwise or the fact that Waterstone’s has teamed with Sony Reader for distribution in the UK. … I think, most of all, this analysis omits full consideration of the discrete functions served by the retailer in the supply chain. … Apple is not providing the full suite of retail services. … It isn’t just too early to predict a winner; it is too early to declare the finalists.

Hodgkin posts a reply on his blog to Shatzkin:

Shatzkin has not understood what Apple are doing with the strategy announced for the iPhone 3.0 SDK. They are tackling the retail environment head on and building the retail functions. Shatzkin thinks that Apple will fail the retail test. Did Mike view the video presentation with which Apple gave a preview of 3.0 SDK? Consider that the very first item that Scott Forstall discusses (before even ‘cut and paste’!) is the way that they have enhanced the App Store. Note that its a store. A place where consumers shop. It is a retail store which enables developer creativity and it will support discovery of books, magazines, games etc, browsing and sampling, search, metadata, price choice and traditional bookstore price anarchy, and after sales support (though some fulfillment and much support will fall to developers and publishers). Most striking is the near total freedom that publishers are given on pricing (99c — $999). … It is surprising that anyone would think that Apple who have made such a considerable success of Apple stores and online retail selling will find themselves out of their depth with digital books. Nobody would say that building a retail system for digital books is going to be easy, but Apple clearly are a good candidate to do it. Especially now that they have announced this co-optive strategy.

A couple of recent commentaries, excerpted below, suggest that the best sort of books for eBooks are ones that are intended to be read linearly, navigating through pages consecutively (i.e. most notably fiction). Both observers say that books whose usability is increased by flipping back and forth from one section to another do not make good eBooks.

Writing about the Kindle, Jakob Nielsen notes the problem with non-linear content:

The usability problem with non-linear content is crucial because it indicates a deeper issue: Kindle’s user experience is dominated by the book metaphor. The idea that you’d want to start on a section’s first page makes sense for a book because most are based on linear exposition. Unfortunately, this is untrue for many other content collections, including newspapers, magazines, and even some non-fiction books such as travel guides, encyclopedias, and cookbooks … The design decisions that make Kindle good for reading novels (and linear non-fiction) make it a bad device for reading non-linear content.

Later in the review, Nielsen broadens his comments to eBooks more generally. In addition to the issue of linearity, he also mentions that books that depend on pictures are problematic:

11 years ago, I wrote that electronic books were a bad idea. Has Kindle 2 changed my mind?– Yes — I now think there’s some benefit to having an information appliance that’s specialized for reading fiction and linear non-fiction books that don’t depend on illustrations and don’t require readers to refer back and forth between sections.

Paul Biba, in comments on using a cookbook on the Kindle, says:

The concept doesn’t work. This is not the Kindle’s fault, but the fact that some things are just not meant for an ebook format. When using a cookbook one likes to flip through it browsing for recipes. You look at one, go back and compare it to another … see if you can’t combine the ingredients of [recipes] … You simply can’t do this flipping back and forth with an ebook … Going back and forth from the table of contents to the index is a time-consuming process. The ergonomics of the whole thing is just not set up for cooking and recipe browsing.

This is really the first time I have come across a complete failure of the ebook medium. I can’t see how it is possible to make any change in the hardware that would alleviate the problem. There is simply no substitute for flipping pages and marking them with bookmarks … The ebook format is, by its nature, linear and this linearity is not adaptable to serious cooking.

In a recent posting at O’Reilly Radar, Linda Stone discusses recent comments by Brewster Kahle and Robert Darnton on the Google Book Search Settlement. This is especially valuable for its talk about the orphan books problem, discussed by Kahle, as Stone reports, and in comments by Thomas Lord and Tim O’Reilly. I’m excerpting this interchange here. About Kahle’s posting, Stone says that he “focused on the plight of ‘orphan works’ – that vast number of books that are still under copyright but whose authors can no longer be found.”

Thomas Lord’s first comment — He says he’s thought much about the settlement:

My conclusion [around the time of the settlement] was that the big libraries, like Harvard, had made a bad deal — they didn’t understand the tech well enough and Google basically not only steamrollered them but implicated them in the potentially massive infringement case.

Basically, Google should have, indeed, paid for scanning and building the databases – but the ownership of those databases should have remained entirely with the libraries … The Writer’s Guild caved pretty easy and pretty early but legal pressure can still be brought to bear on Google. They can give up their private databases back to the libraries that properly should own them in the first place.

Tim O’Reilly’s comment on the article, and especially on Lord’s comment:

I agree with Tom’s analysis. (See my old post: Book search should work like web search [2006]). And I do agree with Brewster’s concern that this settlement will derail the kind of reform that would have solved this problem far more effectively. That’s still my preferred solution.

That being said, the tone of both Brewster’s comments and Darnton’s, implies that Google was up to some kind of skulduggery here. That’s unfair. Should they have stood up on principle to the Author’s Guild and the AAP? Absolutely, yes. But it’s the AG and the AAP who should be singled out for censure. … From conversations with people at Google, I believe that they do in fact continue to believe in real solutions to the orphaned works problem, and that demonizing them doesn’t do any of us any good.

The fact is, that Google made a massive investment to digitize these books in the first place. No one else was making the effort … In short, we’re comparing a flawed real world outcome with an “if wishes were horses” outcome that wasn’t in the cards. … Barring change to copyright law (and yes, we need that), Google has at least created digital copies of millions of books that were not otherwise available at all. Make those useful enough and valuable enough, and I guarantee there will be pressure to change the law so that others can profit too. …

Google Book Search was an important step forward in building an ebook ecosystem. I wish this settlement hadn’t happened, and that Google had held out for the win on the idea that search is fair use. And I wish that Google had taken the road that Tom outlined. … But they put hundreds of millions of dollars into a project that no one else wanted to touch. And frankly, I think we’re better off, even with this flawed settlement, than if Google had never done this in the first place.

Finally, I’ll point out that there is more competition in ebooks today than at any time in the past. Any claim that we’re on the verge of a huge Google monopoly, such as Darnton claims, is so far from the truth as to be laughable. Google is one of many contenders in an exploding marketplace.

Thomas Lord’s reply to O’Reilly:

… In the spirit of understanding things: you praise Google, I don’t. We’re better off those books having been scanned (I strongly agree) – I don’t like the way they bull-in-china-shop worked this. I think there’s a deep and lasting threat here that they need to fix if they want to “not be evil.”

Google CEO Eric Schmidt’s comments on health/medicine in a recent wide-ranging interview by Charlie Rose have not gotten much attention, so I’m excerpting them here. First, Schmidt discusses Google Flu Trends:

[For clarity I've mixed a few words from Rose's questions with Schmidt's comments]
There are many [positive] things that we can do with the corpus of information that’s being gathered … The most interesting thing we’ve recently done is called flu trends. We looked at … trends in terms of worldwide flu … There’s a lot of evidence, concern about a pandemic … that might occur, similar to the 1918 bird flu epidemic that killed … 50 million … a proportionately huge number if it were today. And because people, when they have a problem, search for something, we can detect uncommon searches as a blip. We can note that. In our case, we built a system which took anonymized searches so you couldn’t figure out exactly who it was, and that’s important. And we get six months ahead of the other reporting mechanisms so we could identify the outbreak. Many people believe that this device can save 10, 20, 30,000 lives every year just because the healthcare providers could get earlier and contain the outbreak. It’s an example of collective intelligence of which will are [sic] many, many more.

Later in the interview, Schmidt talks about what he calls a “public corpus of medical information”:

The Wikipedia model has been so successful. Why don’t we have all the smartest doctors organize a corpus, a public corpus of medical information … that combines everything everybody knows about medical practice in one place, a place where you can — again, this would have to be a public database where you keep pouring more experiential data, and then you can build computer systems … [Rose: So you have all your cases, everything you ever knew] Schmidt: Again, anonymized so it’s appropriately legal and all of that, and get it in one place so that people can begin to mine the data. They can actually begin to figure out what the disease trends are. What are the real health trends? And this is not a knock on the existing providers to do it. They just don’t have the scale. We are strong when we have thousands of people working in parallel to solve a really important problem. I would tell you, by the way, that if you look at the problems that society has hit over the last thousand years, start with the plague, right all of the things that really hit us that nearly destroyed society, we overcame them through technology and innovation. People figured out new ways whether it was in medicine or governance to overcome them. So let’s be positive about it. We can work those issues. There’s always a way to handle the objections if it’s important.

Jon Orwant, from Google Book Search, made a presentation at the O’Reilly Tools Of Change (TOC) for Publishing Conference in New York last week, which I did not attend. Apparently Orwant presented some numeric data about the use of Google Books, but the data has yet to be spread to the world (See my comment on Peter Brantley’s blog about this). I’ve been searching in the week since TOC, to see what discussion there is of Orwant’s talk, and have found little. So I’m excerpting the three pieces that I have found. Only the first has any numeric data at all.

First, a piece by Jackie Fry, on the BookNet Canada publishers’ blog:

Conversion rates from Google Book Search results have been great for their partner publishers, mostly in the Textbook, Reference and STM channels, particularly in the shallow backlist (2003-2005 pubdates) with the highest Buy the Book clickthrus on 2004 titles. For some publishers, conversion to buy is as high as 89% for the titles they have made available.

30% of viewers looked at 10 or more pages when viewing the content of the book to make a buy decision.

The future is analytics! Google is thinking about what data they can pull out of their logs and provide anonymous aggregate data around consumer behaviour like what books were purchased that were like this one, search terms used most often for a category, most effective discounts, most effective referral sites etc.

More research [is needed] – Saw some good presentations with quantifiable research included – Brian O’Leary from Magellan, Joe Orwent (sic) from Google, and Neelan Choksi from Lexcycle were some of the few presenters who were able to quantify in any way what is going on in the marketplace. We need more …

James Long’s report, on thedigitalist.net (Pan Macmillan Publishing):

Jon Orwant, from Google Book Search, stated at TOC that ‘the ultimate goal of Google Book Search is to convert images to “original intent” XML’. He explained the post-processing Google runs to continuously improve the quality of the scanned books, and to convert images to structured content. Retro-injecting structure accurately is no mean feat but when it’s done, Google will be able to transform the books into a variety of formats. The content becomes mutable and transportable, in a sense it isn’t yet, even though it is scanned, online and searchable. Orwant also presented three case studies – McGraw Hill, OUP, Springer – that demonstrated the benefits publishers can gain from having their books in GBS.

Highlighting the theme of discovery (to my mind), Tim O’Reilly interjected, at the end of these case studies, and made the point that O’Reilly used to own the top links to their own books in Google search results, but have now lost those links to GBS. Orwant, somewhat simplistically, responded that O’Reilly needed to improve their website to regain the top ranked link per title, as this spot was determined by Google’s search algorithms. This was not a convincing response, and dodged the issue, which I understood to be that the scale and in-house-ness of GBS could seriously inhibit the ability of the publisher to represent their own products online at the most common point of entry by the consumer, Google search results. There are many compelling reasons for publishers to own the top search result link, the most obvious being: offer unique additional content around the title, start a conversation with the reader, control the brand.

Edward Champion’s comments on his blog:

Thanks to a concept called blending, Google Book Search options remain in the top search results. An effort to direct traffic GBS’s way. …

There are 1.5 million free books, all public domain titles, available on Google. But if you want to access them, well, you’ll have to go to Google. Or you’ll have to have Google generate results at your site. Because the Google team are specialists in latency. They can do things with milliseconds that you couldn’t work out in your dreams.

As an experiment, Google recently unleashed Google Books Mobile, which means that you can nose search Google Book Search from your smartphone … Orwant was careful to point out that Google is not in the handset manufacturing or carrier business. But he anticipated, just as many of the seer-like speakers at Tools of Change did based on sketchy inside information, a “rapid evolution.”

Tim O’Reilly tried to badger Orwant too. You see, O’Reilly used to have good webpage placement for many of his titles. But those days are gone, replaced by Google Book Search results above the O’Reilly pages. And that hardly seems fair …

There’s some comfort in knowing that 99% of the books at GBS have been viewed at least once. Even the sleep-inducing textbooks. Which is really quite something. Which brings us to the future, which is based on the past …

That snippet view you see with some titles? Orwant’s official position, pressed by Cory Doctorow, is that it’s fair use. But once the October 2008 settlement in Authors Guild v. Google is approved by the court, you’re going to see that snippet view jump to 20% of the book.

Please comment here or on Twitter at @ericrumsey

There’s been a lot of buzz about the announcement last week of mobile access to Google Book Search public-domain books. I’ve been looking hard for nitty-gritty details of how it works, though, and haven’t found much. The best is in comments by bowerbird on an announcement article on toc.oreilly.com. It’s easy for comments to get lost, so I’m excerpting most of bowerbird’s words here:

this offering is very good. extremely good. the interface is quite nice…

it was great to see google is serving digital text, rather than scans, since text is a lot more nimble. however, a tap on a paragraph brings up the scan of that paragraph, which is nice. and another tap restores the text. so if you want to verify the o.c.r., it’s simple to do. as i said above, this is nicely done.

curiously, in the one book i checked (roughing it), the text was extremely accurate as well, which is a pleasant discovery. i found only one o.c.r. error — “firty” for “fifty”, due to a blotch on the page …

this quality text is _not_ typical of google’s raw o.c.r., so they’ve evidently run some clean-up routines on it. i’m curious to see if they share this cleaned-up text with their library partners, or keep it to themselves… (no, the libraries weren’t smart enough to ask for it, as far as i know, let alone write it into the contracts.)

I’ve bolded what I take to be the most interesting point here, that Google has done an extra-special job of OCR’ing text for GBS mobile. As bowerbird notes, hopefully Google will share more about this process, sooner or later.