I recently fell into a nice little example of how tweets “accumulate wisdom” as they get retweeted. It starts with a simple “link to a good site” sort of tweet; then someone finds an especially good page deep inside that site and retweets that; then the next retweeter sees an interesting angle on that page and adds a hashtag for it … The ball just rolls along until it eventually leads to a series of good comments on my blog, all because of the simple little tweet that started it rolling.

The ball started rolling when I saw this tweet, which links to the Genetic Science Learning Center home page at the University of Utah:

ettagirl: Learn.Genetics | Univ of Utah site about genetics, bioscience and health http://bit.ly/3fLrZu

I found a cool page on the Utah site that I thought would draw more interest than a link to the home page:

ericrumsey: Cell Size & Scale – Move Slider – WOW! (Univ Utah, via @ettagirl) – http://bit.ly/YwzA8

Hugo Buriel (@BurielWebwerx) found my tweet, and in retweeting it he made the perceptive connection to Seadragon (see my words about it below), which I hadn’t thought of:

BurielWebwerx: RT @ericrumsey Cell Size & Scale – Move Slider – WOW! (@ettagirl) – http://bit.ly/YwzA8 <– time for some #MooTools/#Seadragon

The Slider tool at Utah does indeed look like pages viewed with Seadragon, an innovative Microsoft technology for seamless zooming. I became interested in Seadragon a year ago and even wrote a posting about it, so I wrote a tweet linking to that posting …

Graham Storrs (@graywave) presumably saw that tweet on Twitter, and he went on to post his useful comments on the blog article.

Eric Rumsey is at: eric-rumsey AttSign uiowa dott edu and on Twitter @ericrumsey

John C. Abell, in his recent Wired article Steve Jobs’ Legacy Is the Missing Clue to the Apple Tablet, suggests that just as Jobs invigorated animated film with Pixar, the music industry with iTunes, and the mobile phone market with the iPhone, his next mission is to invigorate the publishing industry with the Tablet. Abell talks specifically about newspaper and magazine publishing, but I think his comments can easily be broadened to books as well, since he talks about making readers forget the printed page. I’m excerpting the article here because its words about publishing may be missed by many readers. These are short excerpts, but they hold considerably more valuable nuggets than will fit into a 140-character tweet:

If he is looking for One Last Thing, saving journalism would be the Holy Grail. … The device will have to make readers forget — really forget — the printed page. E-readers, for all that they do, don’t do this yet.

After detailing Jobs’ accomplishments in invigorating other industries, as mentioned above, Abell concludes with these words:

Even given this track record — and what we choose to believe is the all-trumping motivator of perfecting his legacy — a device-centric initiative that saves newspapers and magazines that seem to be in perpetual, some say irretrievable, decline, sounds next to impossible.

But is anybody seriously willing to bet against the house — of Jobs?

I love serendipity. I happened to see these two pieces on the same day recently, and couldn’t help putting them together. Is there a meaning somewhere here? …

Information on the Internet That Should Go Away, Roy Tennant

This is the kind of information I wish would disappear: old, outdated, in many cases downright misleading or incorrect. Now to only find the algorithm for determining these characteristics and nuking this dreck off the net! (boldface added here and below)

A case of great minds thinking alike? …

Google Announces Plan To Destroy All Information It Can’t Index, The Onion

MOUNTAIN VIEW, CA—Executives at Google, the rapidly growing online-search company that promises to “organize the world’s information,” announced Monday the latest step in their expansion effort: a far-reaching plan to destroy all the information it is unable to index. … “Our users want the world to be as simple, clean, and accessible as the Google home page itself,” said Google CEO Eric Schmidt at a press conference held in their corporate offices. “Soon, it will be.”

Fun kicker: My first idea for a title for this article was “… Trimming the Internet.” Then I had second thoughts, and googled “weeding the Internet” to see what might turn up. Sure enough, one of the handful of results with that phrase is a library handout from libraries.uc.edu, The Library vs The Internet, which sounds just like Roy: “No one’s weeding the Internet, and sites with seriously outdated information are still available.”

[This article accompanies previous article: Tagging in Hardin MD]

Soon after Hardin MD launched in 1996, we began adding keywords in the hidden META keywords field. (The first HMD pages in the Internet Archive, from Dec 1998, show them on all pages checked.) Around 2000 we began checking whether HMD pages were appearing in search engine results, and found that meta keywords didn’t seem to have much effect.

So, in late 2000, we began experimenting with putting keywords (aka tags*) at the bottom of the page, where most users wouldn’t notice them. At first, when we used the tags mostly for variant spellings or terminology (e.g. on the Hematology page: blood diseases, haematology), we didn’t see much effect in search engine results.

In 2001, as Google rose to prominence and search improved, we began using tools that showed the popularity of specific words (HitBox, ExtremeTracking, WordTracker). We learned that using misspelled word variants as tags worked very well in drawing search engine (SE) traffic. It was also during this time that links to pictures were being added to HMD, and we discovered the power of the word “pictures” in drawing SE traffic.
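The tagging approach described above can be sketched in a few lines. This is only an illustration, not the code Hardin MD actually used (the pages were presumably edited by hand): the helper name `tag_footer` and the bracketed footer layout are hypothetical, and the sample terms come from the Hematology example. It emits both the hidden META keywords field and a visible tag line for the bottom of the page, folding in variant spellings.

```python
from html import escape


def tag_footer(primary_terms, variants):
    """Build a hidden META keywords tag plus a visible tag line for the page bottom.

    primary_terms: the page's main subject words.
    variants: alternate spellings, misspellings, or synonyms that searchers type.
    (Hypothetical helper, for illustration only.)
    """
    # Keep first-seen order and drop duplicates, ignoring case.
    seen, terms = set(), []
    for term in list(primary_terms) + list(variants):
        key = term.lower()
        if key not in seen:
            seen.add(key)
            terms.append(term)

    meta = f'<meta name="keywords" content="{escape(", ".join(terms))}">'
    # Only the variants go in the visible line; the primary terms are already in the page text.
    footer = f"<p>[{escape(', '.join(variants))}]</p>"
    return meta, footer


meta, footer = tag_footer(["hematology"], ["blood diseases", "haematology"])
print(meta)
print(footer)
```

Putting the variants in visible page text, rather than only in the META field, mirrors the shift the post describes once meta keywords stopped having much effect.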

Time-line of tagging in Hardin MD

Based on invaluable help from the Internet Archive, starting from here: Internet Archive for Hardin MD, 1999+

The first HMD pages in the Internet Archive, from Dec 1998, have meta keywords but no tags on the page. Example of meta keywords (Hardin MD: Cardiology): health, medicine, medical, nursing, nurses, nurse, disease, diseases, best, list, lists, consumer, cardiology, cardiac, heart, stroke, cardiovascular, cardiothoracic, pacemaker, defibrillator, attack, arrest

Tagging for misspellings – Ophthalmology, I’m sure, was one of the first pages on which misspellings were used, and Internet Archive pages show clearly that the first implementation was in early November 2000:

Ophthalmology, Nov 7, 2000 – No misspellings in meta keywords, and no tags on the page.
Ophthalmology, Nov 15, 2000 – Misspellings in meta keywords and on the page: [ophthamology]

This fits my memory of events — I was especially motivated to look for ways to draw Web traffic, because Google was just becoming prominent, rationalizing the search process, and making it easier to predict the effects of changes on page traffic.

Other examples of pages with tags on the page, with variant spellings, from about the same time: Orthopedics Nov 16, 2000 [orthopaedics] and Hematology Nov 29, 2000 [blood diseases, haematology]

Use of the word “pictures,” in tagging and in page titles

First use: Genital Warts Jun 10, 2002

First widespread use – Several pages linked on Hardin MD Index page Sept 30, 2002


In his interesting book The Great Influenza (2004), on the 1918 flu epidemic, John M. Barry begins by giving the background and context of 19th-century medicine. He says that medicine during this time lagged behind other sciences, especially because doctors were slow to embrace the quantitative methods and tools that helped sciences like chemistry and physics make great advances. For example (amazingly), although thermometers had been invented 200 years earlier, it wasn’t until the 1820s that medical people in Europe first used them to measure body temperature. (The US was even slower to change, and thermometers were still rarely used in the Civil War.) In the 1840s and 1850s John Snow was the first to apply numeric methods to populations of patients, in his pioneering epidemiologic study of cholera in England. As Barry makes these fascinating observations about 19th-century medicine, he adds this footnote, lest we think we’ve completely escaped the innumerate medicine of the 1800s:

The effort to correlate treatments and results has not yet triumphed. A “new” movement called “evidence-based medicine” [boldface added] has emerged recently, which continues to try to determine the best treatments and communicate them to physicians. No good physician today would discard the value of statistics, of evidence accumulated systematically in careful studies. But individual doctors, convinced either by anecdotal evidence from their own personal experience or by tradition, still criticize the use of statistics and probabilities to determine treatments and accept conclusions only reluctantly. Despite convincing studies, for example, it took years before cancer surgeons stopped doing radical mastectomies for all breast cancers.*

My college training in the History of Science & Medicine taught me the subversive nature of the discipline: subversive because it forces us to realize that sometimes we’re not as far beyond ancient methods and ideas as we think we are. How much in contemporary medicine is still a vestige of the relatively recent past that Barry describes?

*Radical mastectomies for breast cancers: The definitive study that disproved the value of this is here.

Marybeth Peters, head of the US Copyright Office (part of the Library of Congress), said this in her testimony before Congress yesterday:

The Copyright Office has been following the Google Library Project since 2003 with great interest. We first learned about it when Google approached the Library of Congress, seeking to scan all of the Library’s books. At that time, we advised the Library on the copyright issues relevant to mass scanning, and the Library offered Google the more limited ability to scan books that are in the public domain. An agreement did not come to fruition because Google could not accept the terms.

As discussed in my article in June, it seems surprising that the Library of Congress has not taken a more active role in the kind of mass-scanning project that Google is doing. Peters’ words explain why: the copyright mess! If copyright gets fixed, LC might be doing the project instead of Google.

It’s encouraging that Peters has finally been given a platform to talk about the mess. She did talk about it at a Columbia University meeting in March, although it was not widely reported, and was apparently only recorded on a video which was not transcribed (see my transcription of a key passage here). At that conference, she’s reported to have said that Congress had shown no interest in hearing her testimony. Hopefully they’re ready to listen now.

Peters stresses in her testimony yesterday, and in her talk at Columbia, that Congress needs to be the one to fix copyright law. Letting the judicial branch speak through the Settlement, she says, is making an “end run around the legislative process” (her words in yesterday’s testimony). Brewster Kahle used the same words in April.

With the GBS settlement discussion heating up, it’s becoming increasingly clear to me that the root of the problem is US Copyright law. As Peters suggests, until copyright is fixed, mass-scanning of books is going to be problematic.

Steve Pociask wrote an article in Forbes last week, “Google’s One Million Books,” on the Google Book Search Settlement. There’s been a lot of commentary about GBS recently, as the October Settlement hearing approaches, and I doubted that tweeting this article with its forgettable title would get much attention.

Reading the article, though, I was struck by its lead sentence: “Imagine that your home and the homes of millions of your neighbors are burglarized.” Pociask suggests that the “burglar” metaphor might be a good fit for the Settlement. Hmm, I thought, surely someone will have picked up this bold, unique metaphor in a tweet. But a Twitter search showed that, surprisingly, no one had used it. Searching further, I found that the only tweets about the article just used its uninspiring title, and, not surprisingly, few of these had gotten any retweets. So I tweeted to bring out the “burglar” theme, and got two retweets by the end of the day. Here’s my tweet:

Google as Burglar of One Million books? – #GBS settlement, Steve Pociask, Amer Consumer Inst (Forbes) http://bit.ly/MqovK

I also added the author’s name and his connection with the Amer Consumer Inst (American Consumer Institute), which I think added interest to the tweet and which had gotten little attention in previous tweets about the article.

So, the simple lesson: when tweeting a link to an article, remember there’s no rule that you have to use the author’s title. If it’s boring and you think your followers’ eyes will glaze over reading it, use something else! READ THE ARTICLE, see if it has an interesting theme that the title doesn’t bring out, and base your tweet on that instead.
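The same advice can be turned into a tiny helper. This is a hypothetical sketch (the name `compose_tweet` and its parameters are mine, not part of any Twitter tool): it joins your own headline, an attribution, and a short link in the format of the tweet above, and checks the result against the 140-character limit. It counts characters with Python’s `len`, which only approximates how Twitter itself counted.

```python
TWEET_LIMIT = 140  # Twitter's per-tweet character limit at the time of writing


def compose_tweet(headline, attribution, short_url, limit=TWEET_LIMIT):
    """Join a custom headline, an attribution, and a shortened URL into one tweet.

    Raises ValueError if the result would exceed the character limit.
    """
    text = f"{headline} – {attribution} {short_url}"
    if len(text) > limit:
        raise ValueError(f"tweet is {len(text)} chars, over the {limit}-char limit")
    return text


tweet = compose_tweet(
    "Google as Burglar of One Million books?",
    "#GBS settlement, Steve Pociask, Amer Consumer Inst (Forbes)",
    "http://bit.ly/MqovK",
)
print(len(tweet), tweet)
```

The headline is a separate parameter on purpose: it is the part you write yourself after reading the article, rather than copying the author’s title.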

The list below is 50 consecutive random Wikipedia articles, generated with the Random Article link that appears on every article. As suggested in a recent study by Kittur, Chi & Suh (discussed below), I’ve divided these random articles into the top-level Wikipedia categories. More interesting than these categories are other broad subjects (as picked out by me) in the articles below: Sports (7 articles), Pop Music (6), Europe (5), Politics (4), India (3). These subjects, I think, give a good flavor of the sorts of articles in Wikipedia.

Beyond the categories and subcategories, though, the most striking thing about this random sample of Wikipedia articles is how narrow and limited the articles are: almost all of them are about things that No One Has Heard Of! It’s a great example of the Long Tail effect, only in this case it seems to be almost all Tail and very little Head. Obviously, there are thousands of Wikipedia articles on well-known subjects, which we read every day. But in terms of numbers, the articles on minor, unheard-of subjects vastly outnumber the popular ones.

[There's more commentary below following the list]

People
  • Brigadier General Anthony Stack
    Currently a Brigadier General in service of the Canadian Forces, 1 screen
  • Roy Orchard Woodruff
    Politician, soldier, printer and dentist from Michigan (1876 – 1953), 1 screen
  • Ed Bryant
    Former Republican member of the US House of Representatives from Tennessee, 1948- , 3 screens
  • Missy Higgins
    Australian singer-songwriter, 1983 – , 7 screens
  • Răzvan Sabău
    Romanian tennis player, 1977- , 1 screen
  • Mirza Rizvanović
    Bosnian football defender, 1 screen, Stub
  • Steve Byrne
    American stand-up comedian, 1974- , 1 screen, Stub
  • Joan Hambidge
    Afrikaans poet, literary theorist and academic, 1956- , 2 screens
  • Agim Kaba
    American-Albanian actor, writer, director, sound editor, dancer, and film producer, 1980- , 1 screen
  • Verda Welcome
    African-American teacher, civil rights leader, and Maryland state senator, 1907 – 1990, 2 screens
  • Harolyn Blackwell
    African-American lyric coloratura soprano, 1955- , 7 screens
  • Ron Sobieszczyk
    Retired American professional basketball player, 1 screen
  • John Cumberland
    Former Major League Baseball player and coach, 1947- , 1 screen, Stub

Geography
  • Cwmcarn Forest Drive
    Tourist attraction and scenic route in Cwmcarn, Crosskeys, Wales, 1 screen, Stub
  • Vila Chã
    Portuguese parish with 2,957 inhabitants and a total area of 5.49 km², 1 screen, Stub
  • Chojnowo
    Village in Krosno Odrzańskie County, Lubusz Voivodeship, western Poland, 1 screen, Stub
  • Interstate 17
    5 screens
  • Withee (Town) Wisconsin
    Town in Clark County in the US state of Wisconsin, with population of 885 at the 2000 census, 1 screen
  • Edson, Wisconsin
    Town in Chippewa County in the US state of Wisconsin, with population of 966 at the 2000 census, 1 screen

Science
  • Swannia
    Genus of moth in the family Geometridae, 1 screen, Stub
  • Proflazepam
    Drug which is a benzodiazepine derivative, 1 screen, Stub




I’m assuming that the Random Article link used to generate these links is truly random, i.e. that it gives a representative sample of all Wikipedia articles. Surprisingly, I have not been able to find a Wikipedia article on “Wikipedia Random Article,” or any other commentary on it that might shed light on this. I also have found no indication that anyone else has attempted to make a list of random Wikipedia articles, as presented here. Please let me know in a comment if I’m missing something!

The purpose of the Kittur, Chi & Suh paper (PDF) mentioned above was to map all Wikipedia articles to one of the top level Wikipedia categories. The articles in the list above fit their results fairly well for most of the categories. Here are their results and mine (in parentheses):

Culture: 30% (28%)
People: 15% (26%)
Geography: 14% (12%)
Society: 12% (8%)
History: 11% (10%)
Science: 9% (4%)
Technology: 4% (6%)
Religion: 2% (0%)
Health: 2% (0%)
Math: 1% (2%)
Philosophy: 1% (0%)

My assignment of categories is imprecise at best, so it’s not surprising that my findings and those of KCS don’t agree completely. It’s also possible that the real division of categories has changed since KCS collected data for their study in Jan 2008. Finally, one more bit on KCS: their paper has the same base title as this article (What’s in Wikipedia?). I actually thought of this title before I found their paper; in fact, I found their paper because I searched for the title after I thought of it for this article! So I don’t feel like I’m stealing their title ;-)
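Chance alone also accounts for some of the gap. Here is a rough check, assuming (as above) that Random Article draws articles uniformly and independently, and treating the KCS shares as the true proportions. Under a simple binomial model, a share p estimated from n = 50 articles has a standard error of about sqrt(p(1 − p)/n), roughly 5 percentage points for p = 15%. The sketch below is my own normal-approximation calculation, not anything from the KCS paper; it scores each category by how many standard errors the sample lies from KCS. Only People comes out more than two standard errors away (z ≈ +2.2). The approximation is crude for the 1 to 2% categories, where the expected count out of 50 is at most one article.

```python
import math

N = 50  # articles in the random sample

# Category shares in percent: (Kittur, Chi & Suh, this sample), as listed above.
SHARES = {
    "Culture": (30, 28), "People": (15, 26), "Geography": (14, 12),
    "Society": (12, 8), "History": (11, 10), "Science": (9, 4),
    "Technology": (4, 6), "Religion": (2, 0), "Health": (2, 0),
    "Math": (1, 2), "Philosophy": (1, 0),
}


def z_score(kcs_pct, sample_pct, n=N):
    """Standard errors between the sample share and the KCS share.

    Treats the KCS share as the true proportion p and the sample as n
    independent draws, so the standard error is sqrt(p * (1 - p) / n).
    """
    p = kcs_pct / 100
    se = math.sqrt(p * (1 - p) / n)
    return (sample_pct / 100 - p) / se


for category, (kcs, sample) in SHARES.items():
    z = z_score(kcs, sample)
    flag = "  <-- |z| > 2" if abs(z) > 2 else ""
    print(f"{category:<11} KCS {kcs:>2}%  sample {sample:>2}%  z = {z:+.1f}{flag}")
```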

Acknowledgement to my son David: The writing of this article is a long tale in itself! It arose from a (rather hare-brained, I now see) question I pondered: whether there’s a way to generate a “random Web page” from anywhere. (The answer, I think, is no, but that’s a separate discussion.) As I discussed this idea with David, he mentioned the Random Article link on Wikipedia articles. I had actually never noticed it before and found it quite interesting, which led to this article. David also confirmed from his younger perspective that the links in the sample above are indeed obscure!

Note: These random links were generated over three separate days in the last week.

Mark Rabnett, in his article Five ways to improve PubMed, says what many medical librarians are no doubt thinking. The Medical Subject Headings (MeSH) system, used by the National Library of Medicine to index articles in PubMed/Medline, is certainly one of the best indexing systems in the world. Unfortunately, the way it’s implemented in PubMed makes it difficult for users to appreciate its elegant features. Rabnett reports on a brainstorming session on improving PubMed at the recent annual Canadian Health Libraries Association conference. One of the five suggestions was to improve the MeSH database:

Where to start. The MeSH database is stiff and laboured, with occasional outbreaks of tumid extravagance. My group all agreed that we need clearer, more intuitive visual displays of the thesaurus and subheadings. The creation of a search statement using MeSH headings needs a complete rethink. … Even searching for MeSH headings is difficult and unpredictable. But worse, no one really understands it.  When I teach MeSH, my students glaze over as if I were lecturing on 12-tone music. The way PubMed presents MeSH is fussy and needlessly complex. We need a MeSH mashup.
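To make Rabnett’s point concrete, here is roughly what a MeSH search statement looks like when typed into PubMed. This is a minimal sketch with hypothetical helper names (`mesh_term`, `mesh_query`). The field tags it uses, [mh] for MeSH Terms and [majr] for MeSH Major Topic, and the “Heading/subheading” form are PubMed search syntax, but check PubMed’s own help before relying on the details. The example headings are illustrative only.

```python
def mesh_term(heading, subheading=None, major=False):
    """Format one MeSH heading as a PubMed search term.

    Uses the [mh] (MeSH Terms) tag, or [majr] (MeSH Major Topic) when
    major=True; an optional subheading is attached as "Heading/subheading".
    """
    term = f"{heading}/{subheading}" if subheading else heading
    tag = "majr" if major else "mh"
    return f'"{term}"[{tag}]'


def mesh_query(*terms, op="AND"):
    """Combine formatted MeSH terms with a single Boolean operator."""
    return f" {op} ".join(terms)


query = mesh_query(
    mesh_term("Breast Neoplasms", "surgery"),
    mesh_term("Mastectomy, Radical", major=True),
)
print(query)
```

Even this small example needs the exact heading wording, the subheading name, and the right field tag, which is the kind of hidden detail a clearer MeSH interface could surface for users.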

Hey medical librarians — Let’s help our users discover the buried treasure of MeSH!