These are excerpts from part 2 of Michael Nielsen’s seminal and long article, Is scientific publishing about to be disrupted? Part 1 of Nielsen’s article is a general consideration of how industries fail, with particular discussion of the newspaper industry and blogs. Part 2 is the heart of Nielsen’s case (and has the same title as the article), so I’m excerpting it here to bring it wider attention …

Today, scientific publishers are production companies, specializing in services like editorial, copyediting, and, in some cases, sales and marketing. My claim is that in ten to twenty years, scientific publishers will be technology companies [3]. By this, I don’t just mean that they’ll be heavy users of technology, or employ a large IT staff. I mean they’ll be technology-driven companies in a similar way to, say, Google or Apple. That is, their foundation will be technological innovation, and most key decision-makers will be people with deep technological expertise. Those publishers that don’t become technology driven will die off.

What I will do … is draw your attention to a striking difference between today’s scientific publishing landscape, and the landscape of ten years ago. What’s new today is the flourishing of an ecosystem of startups that are experimenting with new ways of communicating research, some radically different to conventional journals. Consider Chemspider, the excellent online database of more than 20 million molecules, …. Consider Mendeley, a platform for managing, filtering and searching scientific papers, …. Or consider startups like SciVee (YouTube for scientists), the Public Library of Science, the Journal of Visualized Experiments, vibrant community sites like OpenWetWare and the Alzheimer Research Forum, and dozens more. And then there are companies like WordPress, Friendfeed, and Wikimedia, that weren’t started with science in mind, but which are increasingly helping scientists communicate their research. This flourishing ecosystem is not too dissimilar from the sudden flourishing of online news services we saw over the period 2000 to 2005.

Let’s look up close at one element of this flourishing ecosystem: the gradual rise of science blogs as a serious medium for research. It’s easy to miss the impact of blogs on research, because most science blogs focus on outreach. But more and more blogs contain high quality research content. Look at Terry Tao’s wonderful series of posts explaining one of the biggest breakthroughs in recent mathematical history, the proof of the Poincare conjecture. Or Tim Gowers’ recent experiment in “massively collaborative mathematics”, using open source principles to successfully attack a significant mathematical problem. Or Richard Lipton’s excellent series of posts exploring his ideas for solving a major problem in computer science, namely, finding a fast algorithm for factoring large numbers. Scientific publishers should be terrified that some of the world’s best scientists, people at or near their research peak, people whose time is at a premium, are spending hundreds of hours each year creating original research content for their blogs, content that in many cases would be difficult or impossible to publish in a conventional journal. What we’re seeing here is a spectacular expansion in the range of the blog medium. By comparison, the journals are standing still.

This flourishing ecosystem of startups is just one sign that scientific publishing is moving from being a production industry to a technology industry. A second sign of this move is that the nature of information is changing. Until the late 20th century, information was a static entity. The natural way for publishers in all media to add value was through production and distribution, and so they employed people skilled in those tasks, and in supporting tasks like sales and marketing. But the cost of distributing information has now dropped almost to zero, and production and content costs have also dropped radically [4]. At the same time, the world’s information is now rapidly being put into a single, active network, where it can wake up and come alive. The result is that the people who add the most value to information are no longer the people who do production and distribution. Instead, it’s the technology people, the programmers.

If you doubt this, look at where the profits are migrating in other media industries. In music, they’re migrating to organizations like Apple. In books, they’re migrating to organizations like Amazon, with the Kindle. In many other areas of media, they’re migrating to Google: Google is becoming the world’s largest media company. … How many scientific publishers are as knowledgeable about technology as Steve Jobs, Sergey Brin, or Larry Page?

… Being wrong is a feature, not a bug, if it helps you evolve a model that works: you start out with an idea that’s just plain wrong, but that contains the seed of a better idea. You improve it, and you’re only somewhat wrong. You improve it again, and you end up the only game in town. Unfortunately, few scientific publishers are attempting to become technology-driven in this way. The only major examples I know of are Nature Publishing Group (with Nature.com) and the Public Library of Science. …

Opportunities

So far this essay has focused on the existing scientific publishers, and it’s been rather pessimistic. But of course that pessimism is just a tiny part of an exciting story about the opportunities we have to develop new ways of structuring and communicating scientific information. These opportunities can still be grasped by scientific publishers who are willing to let go and become technology-driven, even when that threatens to extinguish their old way of doing things. … Here’s a list of services I expect to see developed over the next few years. A few of these ideas are already under development, mostly by startups, but have yet to reach the quality level needed to become ubiquitous. The list could easily be continued ad nauseam – these are just a few of the more obvious things to do.

Personalized paper recommendations: Amazon.com has had this for books since the late 1990s. You go to the site and rate your favourite books. The system identifies people with similar taste, and automatically constructs a list of recommendations for you. This is not difficult to do: Amazon has published an early variant of its algorithm, and there’s an entire ecosystem of work, much of it public, stimulated by the Netflix Prize for movie recommendations. If you look in the original Google PageRank paper, you’ll discover that the paper describes a personalized version of PageRank, which can be used to build a personalized search and recommendation system. …
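To make the "people with similar taste" idea concrete, here is a minimal sketch of user-based collaborative filtering applied to papers. It is not Amazon's published algorithm or any Netflix Prize method; the reader names, paper IDs, ratings, and the function names `cosine_similarity` and `recommend` are all hypothetical, chosen only for illustration.

```python
from math import sqrt

# Hypothetical data: reader -> {paper_id: rating on a 1-5 scale}.
ratings = {
    "alice": {"p1": 5, "p2": 4, "p3": 1},
    "bob": {"p1": 5, "p2": 5, "p4": 4},
    "carol": {"p3": 5, "p5": 3},
}


def cosine_similarity(a, b):
    """Cosine similarity between two readers' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0  # no papers in common, so no evidence of shared taste
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(reader, ratings, top_n=3):
    """Rank papers the reader has not rated by similarity-weighted ratings."""
    own = ratings[reader]
    scores = {}   # paper -> sum of similarity * rating
    weights = {}  # paper -> sum of similarities, used to normalise
    for other, other_ratings in ratings.items():
        if other == reader:
            continue
        sim = cosine_similarity(own, other_ratings)
        if sim <= 0:
            continue  # skip readers with no overlap in taste
        for paper, rating in other_ratings.items():
            if paper in own:
                continue  # only recommend papers the reader has not rated
            scores[paper] = scores.get(paper, 0.0) + sim * rating
            weights[paper] = weights.get(paper, 0.0) + sim
    predicted = {p: scores[p] / weights[p] for p in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


print(recommend("alice", ratings))
```

A real service would need far more than this (implicit signals such as downloads and citations, handling of new papers with no ratings, and scaling to millions of readers), but the core step of weighting other readers' ratings by their similarity to you is this small.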

A great search engine for science: ISI’s Web of Knowledge, Elsevier’s Scopus and Google Scholar are remarkable tools, but there’s still huge scope to extend and improve scientific search engines [5]. With a few exceptions, they don’t do even basic things like automatic spelling correction, good relevancy ranking of papers (preferably personalized), automated translation, or decent alerting services. They certainly don’t do more advanced things, like providing social features, or strong automated tools for data mining. Why not have a public API [6] so people can build their own applications to extract value out of the scientific literature? Imagine using techniques from machine learning to automatically identify underappreciated papers, or to identify emerging areas of study.

High-quality tools for real-time collaboration by scientists: Look at services like the collaborative editor Etherpad, which lets multiple people edit a document, in real time, through the browser. They’re even developing a feature allowing you to play back the editing process. Or the similar service from Google, Google Docs, which also offers shared spreadsheets and presentations. Look at social version control systems like Git and Github. Or visualization tools which let you track different people’s contributions. …

Scientific blogging and wiki platforms: With the exception of Nature Publishing Group, why aren’t the scientific publishers developing high-quality scientific blogging and wiki platforms? … On a related note, publishers could also help preserve some of the important work now being done on scientific blogs and wikis…. The US Library of Congress has taken the initiative in preserving law blogs. Someone needs to step up and do the same for science blogs.

The data web: Where are the services making it as simple and easy for scientists to publish data as it is to publish a journal paper or start a blog? A few scientific publishers are taking steps in this direction. But it’s not enough to just dump data on the web. It needs to be organized and searchable, so people can find and use it. …


A month ago, Google announced that it has begun putting magazines in Google Books. In one way, this is a new direction for Google. But looked at broadly, it’s really not so new: Google has been putting old journals in Google Books for a long time. The basic difference between the newly announced “Google magazines” and Google’s “old journals,” of course, is the date of publication. The titles that are being treated as “magazines” were generally published in the last 50 years or so. But some of these also include much older issues, in some cases, such as Popular Science, going back to the 1800s. A bit of digging (searching for words in an article) finds a nice case of a title that’s in Google Books both ways, as a magazine and as an old journal. Snippets from the “About this book” and “About this magazine” pages below show the differences.

Old journals – The journal / book format

Old journals are given the same treatment as books, with each volume of the journal being considered a book. The record here is for volume 26 of Popular Science Monthly (the old name of Popular Science).

Old journals are scanned into Google Books by libraries, in the case shown here, Harvard University. As with other books scanned by libraries, the About page has a selection of thumbnail images, giving an idea of what sort of graphics are in the book. Also note the button to Download the entire volume in PDF format.

The Magazine format

In contrast to journal/book format, in which the volume (made up of several issues) is treated as the basic record unit, in magazines, the basic record unit is the issue. This record is for the Feb 1885 issue of Popular Science.

Compared with the journal/book format, the magazine record lacks thumbnail preview images and does not support downloading a PDF of the issue. It does, however, have one great advantage over the journal/book format: all issues are connected in the Browse all issues menu.

Google Books is full of surprises! In surveying medical journals in Google Books, I discovered that volumes of British Medical Journal circa 1880 scanned at Harvard have extensive sections devoted to advertisements. Most libraries, when they bind issues of journals and magazines into bound volumes, very reasonably remove pages that have only advertisements, to save space on the shelf. So it’s good to have a Harvard that can afford to save the rare gems of 19th century ads, so that they can be put online for the world to enjoy!

As fanciful as the ad shown here is (“Ask for Cadbury’s Pure Cocoa, makers to the Queen”), there is a wealth of more prosaic ads in the same volume, awaiting future medical historians, on subjects such as malted infant food, lactopeptine for indigestion, bronchitis & croup kettles, and state-of-the-art wheelchairs.

I found several other journals in Google Books from the same late-19th-century era that also have extensive ads. But British Medical Journal is the only one I found that has entire, separate volumes of advertising. Apparently there must have been separate supplements that were only ads (this was at the dawn of the age of mass advertising, and people, even including physicians, were actually GLAD to read ads!).

So, how searchable are the ads in Google Books? I tried a few examples and had mixed results. Searching for a phrase that’s in the Cadbury’s ad, “why does my doctor recommend Cadbury’s Cocoa”, was successful. But searching for a phrase in the ad that follows the Cadbury’s ad, for Anodyne Amyl Colloid, “in cases of neuralgia, sciatica, lumbago”, found the phrase in other ads for the same product, but not the one occurring in this instance.

Here are volumes of British Medical Journal that I found that are exclusively advertising (All of these were scanned at Harvard):

The list presented here has FULL VIEW (public domain, pre-1923) journals in Google Books. This is certainly NOT intended to be a complete list! There’s no easy way that I have found to limit a search in Google Books to journals, so I have found these titles by searching for appropriate words such as medical, dermatology, journal, archive, transactions. I have not included titles that have fewer than 5 volumes in Google Books. Unfortunately, there’s no way that I have found to sort the title searches chronologically, so to find a particular volume, it’s necessary to go through the results list. Each entry in the list below has links to the first and last volumes that I have found for each title; the run between these dates is not necessarily complete. For “contributing libraries,” examples are given if there is more than one contributing library.

This list grew a lot longer than I thought it would — I was surprised to find so many journals in Google Books! It was a tedious job compiling this, and I probably won’t try to keep it current, with new volumes being added all the time. If I get feedback :-) I’m more likely to put in more work on it, so please add a comment, or mail me at: eric[hyphen]rumsey AT uiowa[dot]edu