Subreddit Algebra

photo credit: Laura Crossett

Yesterday, FiveThirtyEight featured a fantastic article by Trevor Martin, a Ph.D. student in Computational Biology at Stanford University. Martin’s piece, Dissecting Trump’s Most Rabid Online Following, looked at the toxic communities surrounding Donald Trump, notably r/The_Donald, using a machine learning technique called latent semantic analysis. LSA represents documents as vectors built from the words they contain, so that the closeness of two documents can be measured. Martin adapted this process to find the overlap between subreddits: two subreddits are more similar if the same users comment in both. He then went further with what he calls “subreddit algebra”. By adding or subtracting subreddits, other related subreddits can be revealed. For example, r/nba + r/minnesota = r/timberwolves. If you’re interested in semantic vector math, there’s a fun Twitter bot that does this algebra several times per day.
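To make the algebra concrete, here is a minimal Python sketch. It is not Martin's code. It assumes each subreddit has already been turned into a vector, the numbers are made up for illustration, and the helper names `combine` and `rank` are hypothetical. Each vector is scaled to unit length before adding or subtracting, and the candidates are then ranked by cosine similarity.

```python
import math

# Toy vectors with made-up values; these are NOT real Reddit data.
# Each position stands for overlap with some hypothetical reference subreddit.
vectors = {
    "nba":          [9.0, 1.0, 0.0],
    "minnesota":    [0.0, 2.0, 8.0],
    "timberwolves": [7.0, 1.0, 6.0],
    "cooking":      [0.0, 9.0, 1.0],
}


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Scale a vector to unit length so large subreddits do not dominate."""
    n = norm(v)
    return [x / n for x in v]


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm(a) * norm(b))


def combine(plus, minus=()):
    """Add the unit vectors in `plus` and subtract those in `minus`."""
    dims = len(next(iter(vectors.values())))
    result = [0.0] * dims
    for name in plus:
        result = [r + x for r, x in zip(result, normalize(vectors[name]))]
    for name in minus:
        result = [r - x for r, x in zip(result, normalize(vectors[name]))]
    return result


def rank(query, exclude=()):
    """Return (subreddit, similarity) pairs, most similar first."""
    scores = [
        (name, cosine(query, vec))
        for name, vec in vectors.items()
        if name not in exclude
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# r/nba + r/minnesota, leaving the inputs out of the candidate list
ranked = rank(combine(["nba", "minnesota"]), exclude=("nba", "minnesota"))
for name, score in ranked:
    print(name, round(score, 3))
```

With these toy numbers, timberwolves comes out on top because its vector points in roughly the same direction as the sum of the other two. Subtraction works the same way: pass the subreddit to remove in `minus`.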

As with all of its data stories, FiveThirtyEight makes the code freely available for readers to try out themselves. I thought it’d be interesting to take a peek at some subreddits that are a little closer to home (and a whole lot less racist and sexist). If you don’t want to run this yourself, feel free to skip to the results below.

The Setup

If you want to follow along, you’ll need some familiarity with the Google Cloud Platform since that’s where everything will be run. Specifically, you’ll be using their BigQuery service, which is a tool for working with massive datasets. You’ll also want to set up a bucket in Google Storage, because your outputs will be quite large and Google doesn’t allow you to export them directly to your local file system. Finally, you’ll need some basic familiarity with the R language and an environment to run R scripts. RStudio is a great tool for this.

First, from your Google Cloud console, create a new project to contain the various tables you’ll be generating. Next, head over to BigQuery and create a new dataset under your project. You could call this something like ‘reddit’. This will hold your results. You’ll be querying against the fh-bigquery:reddit_comments dataset, which is made available to you by default. Click on the Compose Query button and use this code from the fivethirtyeight GitHub repository. Change line 19 to the path of the dataset you just created.

Take the resulting dataset that this query generates and export it to the storage bucket you created. From there, you can download it as a CSV file.
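Conceptually, the table coming out of this step presumably records how many commenters each pair of subreddits shares. The real query is SQL and lives in the repository. The Python sketch below only illustrates the idea on a handful of invented (author, subreddit) records, and `count_overlaps` is a hypothetical helper, not something from the FiveThirtyEight code.

```python
from collections import defaultdict
from itertools import combinations

# Invented comment records for illustration; NOT real Reddit data.
comments = [
    ("alice", "IowaCity"), ("alice", "uiowa"), ("alice", "beer"),
    ("bob", "IowaCity"), ("bob", "uiowa"),
    ("carol", "IowaCity"), ("carol", "beer"),
    ("alice", "IowaCity"),  # a repeat comment; each author counts once per subreddit
]


def count_overlaps(records):
    """Count, for each pair of subreddits, how many authors commented in both."""
    subs_by_author = defaultdict(set)
    for author, subreddit in records:
        subs_by_author[author].add(subreddit)

    overlaps = defaultdict(int)
    for subs in subs_by_author.values():
        # sorted() gives each pair a stable (a, b) ordering
        for a, b in combinations(sorted(subs), 2):
            overlaps[(a, b)] += 1
    return dict(overlaps)


overlaps = count_overlaps(comments)
for pair, shared in sorted(overlaps.items()):
    print(pair, shared)
```

A row of these counts for one subreddit against all the others is the kind of vector the algebra works on.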

Now, in RStudio, load the vector analysis script from the repository. You’ll need to change the path to the CSV file on line 20 to your exported CSV. And, of course, change the various subreddits after line 59. Now the fun begins!

The Results

The first obvious search is for similar subreddits to r/IowaCity. What kinds of things do Iowa City folks post about? The higher the number, the more related the subreddits are.

Cedarrapids 0.4627451
Madisonwi 0.4278260
Uiowa 0.4216467
Milwaukee 0.4069844
Homebrewing 0.3992629
Beer 0.3941419
Chicago 0.3916151
Indianapolis 0.3868063
Iowa 0.3850677
Smoking 0.3823774

Ok, not surprising. Surrounding cities plus beer drinking and smoking meats. Iowa City redditors are a chill bunch. What about the uiowa subreddit?

IowaCity 0.4216467
Mazdaspeed6 0.2913548
Swimming 0.2766708
Projectcar 0.2719264
Madisonwi 0.2699070
Cartalk 0.2696891
College 0.2646985
Cars 0.2642775
Civilengineering 0.2637309
Milwaukee 0.2634588

I’ll admit, there is a surprising amount of car discussion going on. Perhaps it’s less surprising when you see some of the cars downtown.

What happens when we take the uiowa out of Iowa City? IowaCity – uiowa =

PoGoIC 0.2447359
Smoking 0.2135908
Homebrewing 0.2053004
BBQ 0.2028280
Grilling 0.1997918
Sousvide 0.1983743
Wine 0.1961068
Cedarrapids 0.1937385
Bourbon 0.1917187
Spicy 0.1895046

Iowa City likes to grill out and drink. And play Pokémon Go. Let’s see what librarians are up to. From r/Libraries:

Librarians 0.6681721
Teachers 0.6463503
Knitting 0.6231567
Parenting 0.6165957
Weddingplanning 0.6118699
Genealogy 0.6118073
Wedding 0.6039990
Femalefashionadvice 0.6024974
Crochet 0.6010991
Vegetarian 0.5975182

Congratulations, librarians, on your marriage and children! And your new fiber arts project. What happens when we remove the wedding planning from librarians’ reddit posts?

Corruption 0.3048685
HistoryofIdeas 0.2961678
CornbreadLiberals 0.2932469
TrueProgressive 0.2924358
Scifi 0.2919506
Media 0.2833257
WarOnComcast 0.2797392
TechNewsToday 0.2789546
InCaseYouMissedIt 0.2789388
Obama 0.2774487

What other interesting algebra problems could we think up? Send me an email and I’ll try to post a few next week. After all, it’s Friday and I’m off to drink beer, grill some vegetarian food, and read sci-fi after I’m done parenting for the day. This weekend might be a good time to pick up knitting.


DRP welcomes Rob Shepard!

Digital Research & Publishing is pleased to announce that Rob Shepard has accepted our offer to be the new Geospatial Information Systems (GIS) Librarian for the UI Libraries. Rob comes to us from the University of Nebraska – Lincoln where he is pursuing a Ph.D. in Geography.

University of Iowa campus map, ca. 1943

We at DRP are looking forward to the talents and experience Rob brings that will further enhance the accessibility and usability of geospatial resources (everything’s spatial!) in the Iowa Digital Library.  Rob will also be working on cross-campus coordination of GIS and support for faculty research and other Libraries partners.

Moving items into Main Library, the University of Iowa, 1951

Welcome, Rob!


A Monument Man at SUI

Two collections in the Iowa Digital Library, University of Iowa Alumni Publications and University of Iowa Yearbooks, include over 40,000 pages of campus history.  Locating a specific name or event would be a challenge, but Optical Character Recognition (OCR) technology makes the collections full-text searchable.

The name George Stout has been in the news a lot lately as the basis for the lead character in the movie The Monuments Men.  A 1921 graduate of what was then the State University of Iowa (SUI), he also makes several appearances in both the yearbooks and alumni publications.

George Stout, Hawkeye Yearbook, 1921

Stout is listed among the artists of the humor publication Frivol, which, while unfortunately not digitized, is available in the University Archives’ Student-produced Publications and Newsletters Collection.

Frivol, 1920

Frivol Staff, 1921

Stout is also mentioned in the March 1921 issue of the Iowa Alumnus for delivering a short address for Foundation Day, the UI’s 74th birthday.  While there’s no accompanying picture for this event, the IDL collection Iowa City Town and Campus Scenes includes several photographs from earlier Foundation Days.

Foundation Day speech, The University of Iowa, 1910s?

Finding information in Iowa Digital Library text collections is made simple through OCR and word highlighting.

Iowa Digital Library Image & Text Viewer

Enjoy more than a million digital objects created from the holdings of the University of Iowa Libraries and its campus partners. Included are illuminated manuscripts, historic maps, fine art, historic newspapers, scholarly works, and more. Digital collections are coordinated by Digital Research & Publishing.


Remembering the Gettysburg Address

Today is the 150th anniversary of Abraham Lincoln’s Gettysburg Address. The Iowa Digital Library includes over 1000 items digitized from the archives of Lincolniana collector James Wills Bollinger.

View additional items from the Bollinger-Lincoln digital collection.

This is Abraham Lincoln, 1941, Page 14 | The James W. Bollinger Digital Collection

This is Abraham Lincoln, 1941, Page 15 | The James W. Bollinger Digital Collection


Lincoln, a story in poster stamps, 1939 | The James W. Bollinger Digital Collection









The Gettysburg Speech, Bernard Wall etching, 1924 | The James W. Bollinger Digital Collection.


Bon Voyage, Christine!

We in Digital Research & Publishing sadly bid a fond farewell to Christine Tade. Christine’s involvement in DRP extends back almost to the beginning of the department, to a 2006 professional development internship, where she learned the ins and outs of applying descriptive metadata to Iowa Digital Library materials. Afterward, Christine was the point person for digital collection metadata in the Cataloging department, training and supervising staff there, finding ways to bend the software to her will, and making more archival collections usable online.

"A thoroughbred" 1907

Christine officially joined Digital Research & Publishing in 2012, six months after the launch of DIYHistory, the Libraries’ crowdsourced transcription project. While continuing her digital collection work, Christine transitioned into the role of chief correspondent with transcribing participants, answering questions and also transcribing and reviewing many manuscripts herself. In July, DIYHistory reached a major milestone: 35,000 pages transcribed.

Automobile crossing a bridge on a dirt road, Iowa, 1922

Christine has contributed greatly to the success of many projects and collection initiatives. We wish her the very best in her retirement!


Winet new director of Digital Studio for Public Humanities

Jon Winet, Director of the Digital Studio for Public Humanities at the University of Iowa

Jon Winet has been named the inaugural director of the Digital Studio for Public Humanities at the University of Iowa.

The new Studio is a campus-wide initiative based in the Main Library that will encourage and support public digital humanities research and scholarship by faculty, staff, and students, including those involved in “Public Humanities in a Digital World,” one of the interdisciplinary faculty “clusters” that have been established so far under the UI Cluster Hire Initiative.

Provost P. Barry Butler stated in a note to faculty late last week:

“Winet has long been a strong advocate and practitioner of public digital humanities and art.  Many of you may know him as one of the driving forces behind the online art and literature project The Daily Palette.  He directs The University of Iowa UNESCO City of Literature Mobile Application Development Team, which last fall launched ‘City of Lit,’ an iPhone app that highlights Iowa City’s rich literary history.  He has engaged in a series of collaborative projects around politics, art, language, and image in the Information Age, including ‘Novel Iowa City,’ an experimental community writing project created and presented via Twitter during the 2011 Iowa City Book Festival.  He is currently in pre-production on ‘First in the Nation,’ a New Media documentary project on the run-up to the 2012 Iowa Caucuses.  In 2007, he received the UI President’s Award  for State Outreach and Public Engagement.”

The Libraries is excited to have the Digital Studio located on the first floor of Main Library and we look forward to partnering with Jon and others on this exciting initiative. You will hear more about the Digital Studio in the months ahead, as it gets up and running under Jon’s leadership. Welcome, Jon!