Today, scientific publishers are production companies, specializing in services like editorial, copyediting, and, in some cases, sales and marketing. My claim is that in ten to twenty years, scientific publishers will be technology companies [3]. By this, I don’t just mean that they’ll be heavy users of technology, or employ a large IT staff. I mean they’ll be technology-driven companies in a similar way to, say, Google or Apple. That is, their foundation will be technological innovation, and most key decision-makers will be people with deep technological expertise. Those publishers that don’t become technology driven will die off.
What I will do … is draw your attention to a striking difference between today’s scientific publishing landscape, and the landscape of ten years ago. What’s new today is the flourishing of an ecosystem of startups that are experimenting with new ways of communicating research, some radically different to conventional journals. Consider Chemspider, the excellent online database of more than 20 million molecules, …. Consider Mendeley, a platform for managing, filtering and searching scientific papers, …. Or consider startups like SciVee (YouTube for scientists), the Public Library of Science, the Journal of Visualized Experiments, vibrant community sites like OpenWetWare and the Alzheimer Research Forum, and dozens more. And then there are companies like WordPress, Friendfeed, and Wikimedia, that weren’t started with science in mind, but which are increasingly helping scientists communicate their research. This flourishing ecosystem is not too dissimilar from the sudden flourishing of online news services we saw over the period 2000 to 2005.
Let’s look up close at one element of this flourishing ecosystem: the gradual rise of science blogs as a serious medium for research. It’s easy to miss the impact of blogs on research, because most science blogs focus on outreach. But more and more blogs contain high quality research content. Look at Terry Tao’s wonderful series of posts explaining one of the biggest breakthroughs in recent mathematical history, the proof of the Poincare conjecture. Or Tim Gowers recent experiment in “massively collaborative mathematics”, using open source principles to successfully attack a significant mathematical problem. Or Richard Lipton’s excellent series of posts exploring his ideas for solving a major problem in computer science, namely, finding a fast algorithm for factoring large numbers. Scientific publishers should be terrified that some of the world’s best scientists, people at or near their research peak, people whose time is at a premium, are spending hundreds of hours each year creating original research content for their blogs, content that in many cases would be difficult or impossible to publish in a conventional journal. What we’re seeing here is a spectacular expansion in the range of the blog medium. By comparison, the journals are standing still.
This flourishing ecosystem of startups is just one sign that scientific publishing is moving from being a production industry to a technology industry. A second sign of this move is that the nature of information is changing. Until the late 20th century, information was a static entity. The natural way for publishers in all media to add value was through production and distribution, and so they employed people skilled in those tasks, and in supporting tasks like sales and marketing. But the cost of distributing information has now dropped almost to zero, and production and content costs have also dropped radically [4]. At the same time, the world’s information is now rapidly being put into a single, active network, where it can wake up and come alive. The result is that the people who add the most value to information are no longer the people who do production and distribution. Instead, it’s the technology people, the programmers.
If you doubt this, look at where the profits are migrating in other media industries. In music, they’re migrating to organizations like Apple. In books, they’re migrating to organizations like Amazon, with the Kindle. In many other areas of media, they’re migrating to Google: Google is becoming the world’s largest media company. … How many scientific publishers are as knowledgeable about technology as Steve Jobs, Sergey Brin, or Larry Page?
… Being wrong is a feature, not a bug, if it helps you evolve a model that works: you start out with an idea that’s just plain wrong, but that contains the seed of a better idea. You improve it, and you’re only somewhat wrong. You improve it again, and you end up the only game in town. Unfortunately, few scientific publishers are attempting to become technology-driven in this way. The only major examples I know of are Nature Publishing Group (with Nature.com) and the Public Library of Science. …
Opportunities
So far this essay has focused on the existing scientific publishers, and it’s been rather pessimistic. But of course that pessimism is just a tiny part of an exciting story about the opportunities we have to develop new ways of structuring and communicating scientific information. These opportunities can still be grasped by scientific publishers who are willing to let go and become technology-driven, even when that threatens to extinguish their old way of doing things. … Here’s a list of services I expect to see developed over the next few years. A few of these ideas are already under development, mostly by startups, but have yet to reach the quality level needed to become ubiquitous. The list could easily be continued ad nauseum – these are just a few of the more obvious things to do.
Personalized paper recommendations: Amazon.com has had this for books since the late 1990s. You go to the site and rate your favourite books. The system identifies people with similar taste, and automatically constructs a list of recommendations for you. This is not difficult to do: Amazon has published an early variant of its algorithm, and there’s an entire ecosystem of work, much of it public, stimulated by the Neflix Prize for movie recommendations. If you look in the original Google PageRank paper, you’ll discover that the paper describes a personalized version of PageRank, which can be used to build a personalized search and recommendation system. …
A great search engine for science: ISI’s Web of Knowledge, Elsevier’s Scopus and Google Scholar are remarkable tools, but there’s still huge scope to extend and improve scientific search engines [5]. With a few exceptions, they don’t do even basic things like automatic spelling correction, good relevancy ranking of papers (preferably personalized), automated translation, or decent alerting services. They certainly don’t do more advanced things, like providing social features, or strong automated tools for data mining. Why not have a public API [6] so people can build their own applications to extract value out of the scientific literature? Imagine using techniques from machine learning to automatically identify underappreciated papers, or to identify emerging areas of study.
High-quality tools for real-time collaboration by scientists: Look at services like the collaborative editor Etherpad, which lets multiple people edit a document, in real time, through the browser. They’re even developing a feature allowing you to play back the editing process. Or the similar service from Google, Google Docs, which also offers shared spreadsheets and presentations. Look at social version control systems like Git and Github. Or visualization tools which let you track different people’s contributions. …
Scientific blogging and wiki platforms: With the exception of Nature Publishing Group, why aren’t the scientific publishers developing high-quality scientific blogging and wiki platforms? … On a related note, publishers could also help preserve some of the important work now being done on scientific blogs and wikis…. The US Library of Congress has taken the initiative in preserving law blogs. Someone needs to step up and do the same for science blogs.
The data web: Where are the services making it as simple and easy for scientists to publish data as it to publish a journal paper or start a blog? A few scientific publishers are taking steps in this direction. But it’s not enough to just dump data on the web. It needs to be organized and searchable, so people can find and use it. …