When it comes to corpus analysis, scholars have tended to focus on stylistic or linguistic patterns in an author’s work. Punctuation is often excluded from these conversations, though it is not entirely clear why. Periods, commas, hyphens, and the like are meaningful units of expression, and they can serve as a kind of signature by which to identify an author’s more nuanced expression. They can also tell us important things about the social, cultural, and historical conditions under and through which a text was produced.
All too often, though, these elements of language are the first things to go when using digital software to analyze big data. The tutorials I have found tend to treat punctuation as entirely expendable. The same is true of stopwords, those elements of language that most programs categorize as “nonessential.” Here I am thinking of the way word clouds are generated from a “weighted” vocabulary, with the program deeming some words more important than others. The same appears true of most sentiment analyses, which pull data from a lexicon of “significant” words with clear positive and negative connotations.
For these last few weeks of the summer fellowship, I have been exploring those marginal aspects of language that are often either forgotten or intentionally excluded from most datasets. My feeling was that a computational analysis of Whitman’s punctuation could offer important insights into the ways different media formats influenced his work. The comma, for example, serves many functions: it organizes the parts of a list, joins ideas together, and can even act as a surrogate for other words.
As revealed in the graph, the comma is among Whitman’s most-used punctuation marks. By itself this is perhaps not a revelatory statement; the comma is probably the most-used punctuation mark in the entire English language. But how exactly Whitman employs it in his poetry, prose, and correspondence is worth investigating further, and the same goes for other punctuation marks. In Whitman’s correspondence, an increase in em dash usage is particularly noteworthy. Working as a research assistant for the Walt Whitman Archive, much of what I do consists of transcribing and encoding these messages. What I have found is that postal cards in the nineteenth century, even more so than traditional “letters,” contain a tremendous number of em dashes. A number of reasons could explain this, but the most compelling to me is the smaller physical size of these messages: the materiality of the message itself is just as important as the content these writers are attempting to communicate. In Whitman’s prose, too, we begin to see an increase in em dash usage. It would be interesting to see whether the emergence of the postal card in the mid-to-late nineteenth century had any significant impact on Whitman’s postbellum writing. Continuing this project, I will try to incorporate a temporal dimension that could help track such developments.
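For readers curious how such counts are produced, here is a minimal sketch using Python’s standard-library Counter. The sample line and the particular set of marks are my own choices for illustration; the actual analysis would run over the full text of Whitman’s poetry, prose, and letters.

```python
from collections import Counter

# Marks to tally; listing the em dash (—) separately from the hyphen
# keeps the two from being conflated.
MARKS = set(",.;:!?—-\"'()")

def punctuation_counts(text):
    """Count how often each punctuation mark appears in a text."""
    return Counter(ch for ch in text if ch in MARKS)

sample = "O Captain! my Captain! our fearful trip is done,"
counts = punctuation_counts(sample)
print(counts.most_common())  # → [('!', 2), (',', 1)]
```

Running the same function over each text in a corpus, and grouping the results by genre or by date, is all it takes to begin charting the kinds of shifts described above.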
This summer the Studio will pilot a new fellowship program with the help of the University of Iowa Graduate College and the Studio Steering Committee. Nine current graduate students have been named Summer Studio Fellows. The students will soon take part in an 8-week course that provides mentored digital scholarship experience, as well as training in skills and tools they might use as they pursue innovative ways of thinking about and sharing their creative endeavors. Below you can read more about the new fellows and descriptions of their proposed projects.
Hayder Alalwan, PhD student, Chemical and Biochemical Engineering Department Currently working on a PhD in the Chemical and Biochemical Engineering department, Hayder Alalwan will continue work on a project started in the spring of 2014. He will explore the creation of a website to publicly share information on chemical looping combustion (CLC). That process uses the lattice oxygen of metal oxides, rather than air (i.e., N2 and O2), to decompose the gas, which minimizes the formation of pollutant byproducts such as NO2, N2O, and NO that form when the reaction occurs in air. In addition, the CLC process is highly efficient at decomposing gas with little to no side reaction. Hayder’s work will help bring his research findings to a broader public as part of his work in science communication.
Alexander Ashland, PhD student, English Department Alexander Ashland plans to expand on his work on Mapping Whitman’s Correspondence, integrating new data into an existing database, dedicating time to revisiting the existing prototype, and exploring the possibilities for implementing crucial features, such as search functionality, timescale manipulation, dynamic proportional symbols, and filterable keywords. Ashland’s current data has been gathered from the Civil War, Reconstruction (1867-1876), Post-Reconstruction (1877-1887), and Old Age (1888-1892) eras.
Sonia Farmer, MFA student, Center for the Book Sonia Farmer plans to launch a podcast that shares the rich world of Caribbean literature. The podcast will provide Caribbean writers with a platform to share their writing and grant people easy access to a multitude of voices. Farmer comes to us from the UI’s Center for the Book to hone her digital editing skills and develop the platform.
Andrea Lakiotis, MFA student, Literary Translation Program
Andrea Lakiotis will explore online digital publishing while engaging with translation theory and practice. She brings experience in digitizing data, mapping, and code to the digital translation work she will be doing with the Studio.
Caitlin Marley, PhD student, Classics Department Classics student Caitlin Marley plans to analyze Marcus Tullius Cicero’s corpus with computational algorithms, focusing on his orations and social network. With this information she will map the “emotional plot” of the orations as well as his networks across space and time.
Ben J. Miller, PhD student, Psychological and Quantitative Foundations Department Ben J. Miller studies the educational needs of pediatric patients and their families. Efficient and effective education plays a large part in their care. This summer, Ben will refine his digital design skills in service of educating parents on using distraction to help their children cope during painful medical procedures. Ben is designing an infographic for use in pediatric waiting rooms that demonstrates how parents can harness the power of their smartphones and tablets for distraction.
Arianna Russ, MFA student, Dance Department As an MFA student in Dance Performance, Arianna Russ explores the integration of digital media into her artistic work. In collaboration with Dance and Theatre Arts Assistant Professor Dan Fine, Arianna will deepen her understanding of motion capture and digital artistic practice.
Katherine Wetzel, PhD student, English Department As a doctoral candidate in the Department of English, Katherine Wetzel plans to continue work on Met-Memory, a project she is currently constructing as part of her Studio Scholars Initiative. This project examines the tensions within local, national, and global expressions of Britishness as they occur in late-Victorian literature. The summer fellowship will also provide her with opportunities to explore the place of theory within the digital humanities.
Mary Wise, PhD student, History Department A PhD candidate in the History Department, Mary Wise plans to construct an interactive and publicly accessible map that examines the American Indian earthwork excavations in the Upper Midwest between 1890 and 1930. With training and support from Studio staff, she sees this project leading to the creation of an all-digital history dissertation.
In a blog post last week, I addressed Endangered Data Week and the history of political parties hiding, removing, or altogether abolishing public access to government documents. However, my post wasn’t alone in trying to shed light on this serious issue. In schools, universities, libraries, and classrooms across the world, hundreds of concerned people came together to bring awareness to the issue of endangered and disappearing data. And while Endangered Data Week is now over, the threat is not. So this week, I teamed up with the Digital Scholarship & Publishing Studio to highlight some of the excellent work currently being done by digital humanists and to provide some advice on how to get involved.
First, I visited with Tom Keegan, Head of the Digital Scholarship & Publishing Studio, and Matt Butler, the Studio’s Senior Developer, to discuss the services university libraries offer to keep scholarly data safe. They stressed the importance of digital institutional repositories in helping scholars maintain their own data and make it accessible to others free of charge. The University of Iowa’s institutional repository, Iowa Research Online, houses an array of faculty, graduate, and undergraduate work. Librarians work closely with faculty, staff, and students to ensure these materials are properly archived and made available according to agreed-upon standards. As I have pointed out before, non-university repositories like Academia.edu are for-profit and will indeed use your data to make money.
Profit is a big factor to consider when thinking about where to put data. As Eric Kansa, founder of Open Context emphasized to me: “We need to maintain nonprofit (civil society) infrastructure to help maintain data (and backup internationally) during political crises. Organizations like the Internet Archive, and other libraries (including university libraries) are critical, because they have the expertise and infrastructure needed to maintain public records.” Kansa rightfully points out that libraries are integral to this fight, but notes that individuals need to know more about the vulnerability of data as well.
So, what do we do about all the government data (e.g. climate data) that is currently being pulled from government websites? This was just one question addressed by the group behind the formation of Endangered Data Week. Like most DH projects, EDW was forged by proactive academics who wanted to make a difference by using the biggest megaphone in the world: The Web. Michigan State University professor and digital humanist Brandon Locke, in collaboration with Jason A. Heppler, Bethany Nowviskie, and Wayne Graham, designed EDW on the model provided by Banned Books Week and Open Access Week. From there they brought the project to the Digital Library Federation’s new interest group on Government Records Transparency/Accountability, directed by Rachel Mattson.
In order to find out more about this initiative and the problems they are addressing, I spoke to Bethany Nowviskie, Director of the Digital Library Federation (DLF) at CLIR and a Research Associate Professor of Digital Humanities at UVa. Prof. Nowviskie was kind enough to answer a number of questions I had about endangered data and how to get more involved in the fight to save it:
SB: Who owns federal data? In other words, should data be available to us because we pay taxes and fund data-producing institutions like HUD? The EPA? Why is the Executive in control of so much of this open data?
BN: Except where issues of personal privacy and cultural sensitivity are involved, data collected or produced by taxpayer-funded agencies of the federal government should be openly available to everyone. It’s a matter of transparency for the health of the republic — sunlight being, as they say, the best disinfectant — and of accountability of the government to its people. These are our datasets, and we should have the ability to analyze and build on them — using them to understand our world better, as it is, and to be able to *make it better.*
SB: How do we create a more centralized, non-profit infrastructure that can maintain data during political crises?
BN: Most pieces of our needed infrastructure are already in place. We call them libraries. The DLF will join a large number of allied groups in early May, convened by DataRefuge (our Endangered Data Week partner) and the Association of Research Libraries, to discuss a new “Libraries+ Network,” to take on exactly this issue: https://libraries.network/about/. Some questions that will motivate us: how can we create greater coherence among the many governmental, non-profit, and even commercial groups with longstanding commitments and expertise in particular areas of the data preservation enterprise? Might we re-energize and re-imagine something like the Federal Depository Library program for the digital age? What would it take for governmental agencies to implement data management plans for the full lifecycle of their information, just as researchers who receive federal funds are now typically required to do?
SB: What can regular non-specialists do to contribute?
BN: This is one reason DLF jumped at the chance to support grassroots efforts to organize the first annual Endangered Data Week. The goals expressed and audiences implied in our capsule summary (“raising awareness of threats to publicly available data; exploring the power dynamics of data creation, sharing, and retention; and teaching ways to make endangered data more accessible and secure”) go far beyond the professional research data management and data stewardship community. Probably the most useful thing a non-specialist can do is to educate herself on the issues and represent the value of open data legislation and the advances in open government we saw under the Obama administration to her representatives. We also need to urge follow-through on past bipartisan commitments in this sphere, such as the OPEN Government Data Act: https://www.datacoalition.org/open-government-data-act/
SB: Can you give some examples of digital projects or initiatives that depend on federal data to reveal racial inequity (e.g. redlining projects), bias, or certain dangers (e.g. lead poisoning)?
BN: Well, FOIA requests played an important role [in the Flint water crisis]— as they have done in Title IX enforcement on college campuses. In this sphere, I also think it’s worth mentioning that identical bills were recently introduced in both the House and Senate that would prohibit federal funds from being “used to design, build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.” [House Bill, Senate Bill]. They went nowhere, and were ostensibly meant to “protect local zoning decisions,” but *what is up with that?* This is the kind of thing that should energize non-specialist readers.
SB: How can we have trust in the integrity of datasets that have been given over to private institutions or saved by non-federal entities? In other words, who will hold the “control” copy (e.g. like a seed bank) that can assure us that datasets that have been saved were not then tampered with?
BN: So, there’s a huge professional community — many of them are DLF members or members of the National Digital Stewardship Alliance which we host — whose whole focus is on questions like this, and there are excellent protocols and procedures for ensuring data integrity. I’m not familiar enough with the ins and outs to give you a good quote, but it’s not a new problem, for sure, and methods for auditing and certifying digital repositories and verifying the integrity and security of the data within them are well established. As always, matters of policy, funding, and the professional development and nurturing of the communities who do the work are a bigger challenge than the technology!
Bethany’s comments above echo what others on campuses across the US are saying: data is a resource. Like water or electricity, access to it ought not be taken for granted. We must continue to be vigilant in the face of lazy and aggressive attitudes, alike. Libraries and library associations remain a big part of the fight to preserve this data, but all of us can play a part by being more aware, spreading the word, and getting involved in the movement.
As you may know, April is National Poetry Month, an annual series of events by the Academy of American Poets to help support the appreciation of American poetry. If you’re looking for great book-length collections of poems, you might be interested in the Iowa Poetry Prize winners. Many of the previous years’ winners are available in PDF form at Iowa Research Online. What you may not know is that April is also National Poetry Generation Month, an annual tradition in which programmers and creative coders spend the month writing code that generates poetry.
In honor of this time of year, I thought I’d take a look at the Iowa Poetry Prize winners through code. There are many methods for analyzing and generating natural language, but one system that has received a lot of attention recently is the neural network. A neural network is a large collection of artificial neurons based very loosely on a biological brain. These neurons exist in layers that perform statistical calculations and affect the state of other connected neurons. It differs from other computational models in that no knowledge is hard-coded and controlled by elaborate conditional statements (if this, then that). Rather, neural networks learn to solve tasks by observing data and producing functions that will produce similar outputs given new data they have never seen before. The uses for such a system include image and speech recognition, classification problems, and many forms of prediction and decision making. For example, a neural net could be trained to detect images of cats by observing tens of thousands of labeled images of cats. Google has recently launched a new project that uses this technique to match your doodles with professional drawings.
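To make the “learning from data” idea concrete, here is a toy sketch of my own (not part of the Iowa project): a single artificial neuron that learns the logical OR function purely from labeled examples, with no hand-coded rules. It is as small as a “network” can get, but the mechanism — adjust the weights a little whenever the prediction is wrong — is the same one driving the much larger models discussed below.

```python
import math
import random

# Training data: inputs and the desired OR output, with no if/then rules.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def sigmoid(z):
    """Squash a number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

random.seed(42)
w = [random.uniform(-1, 1), random.uniform(-1, 1)]  # two input weights
b = 0.0                                             # bias term
rate = 1.0                                          # learning rate

# Repeated passes over the examples; each wrong prediction nudges the
# weights in the direction that reduces the error (gradient descent).
for _ in range(1000):
    for (x1, x2), target in data:
        out = sigmoid(w[0] * x1 + w[1] * x2 + b)
        err = out - target
        w[0] -= rate * err * x1
        w[1] -= rate * err * x2
        b -= rate * err

for (x1, x2), target in data:
    pred = sigmoid(w[0] * x1 + w[1] * x2 + b)
    print((x1, x2), round(pred))  # rounded predictions match the OR table
```

The character-level model used in this post works the same way in principle, just with millions of weights and characters instead of 0s and 1s.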
What happens when we train an artificial intelligence to write English having read only the Iowa Poetry Prize winners? Let’s find out!
To start, I downloaded all of the IPP winners from Iowa Research Online, extracted the poems as plain text, and concatenated them into a single text file named poems.txt. This served as the training set. Next, I set up this Torch-based Docker container implementation of a recurrent neural network based on work by Andrej Karpathy and Justin Johnson. It was tempting to spin up a Google Cloud VM with an attached GPU, since these types of machine learning tasks run much faster on a graphics processing unit with CUDA, but that’s also quite expensive at 75 cents per hour. Once I had it working, I started the preprocessing and training, which took about 16 hours to complete.
After a lot of experimentation to create some useful training models and keep the network from overfitting or underfitting the data, I had something acceptable and began sampling output. One parameter of sampling that was fun to play with was the “temperature” of the sample. A lower temperature produced output that was much more predictable and less error-prone, while a higher temperature was much more inventive but riddled with mistakes. I decided to split the difference and start at 0.5. Here’s the first poem.
Speritas Of The Stars
Morning comes of the sun
to the thin world is a star of her light.
The sheet and the body of parts
of the flame is a light, the body
sees of the wars beautiful on the street.
The sun, the stars of the sound, and desire,
and a man could love the streets.
The single shiller of light,
and the single stranger falls countal.
Father and she were the sutters of the body
instraining to the complete
window of light, still.
You’ll notice a few words in this poem that don’t actually exist in English. That’s because this RNN operates at the character level, not the word level. It has to learn, from scratch, how to write English. It starts with random strings of letters and slowly, after many iterations, learns about spaces, proper punctuation, and finally readable words. The higher the sampling temperature, the more invented words. Let’s look at a “hot” poem.
Pelies, One Yighter
The shadows just plance croved
I am one
its funlet from the wind
staskaccus, gring of detches of hearts face eashog
what wing to the streed in the resert of change, a glince
She read on his fill bathered, a hand the
with beautiful, casty, stery, kooms, in one father
something the mouth cold leaves.
A night and no one is a woman; you green her
My spere would must not the look teering mower
I see itselfor.
At that sign they thought the remelled the mum,
but like an wait they mite of ammiral
after things of the body
which children would love
the forest flowers and hark a path.
The shawr rate in a ruched parts in humstily
his poom her as of the trabs conterlity.
Much more Jabberwockyesque. If we ease up just a little on this, we get:
A Badicar Flower
The watcher blue says
they would have shapes,
the night dreaming,
a painted nother
tricks me, the wind,
the dayed from the boging feeling
of the histance in his everyness.
What do you think — poetry prize worthy? While writing poetry is fun, there are, of course, practical applications too. I’m currently working with faculty member Mariola Espinosa on a HathiTrust project called Fighting Fever in the Caribbean: Medicine and Empire, 1650-1902. We have 9.3 million pages of medical journals and need to find references to yellow fever in multiple languages. A trained neural network could look through these quickly and find references that a human might miss. I’m also working on another project with Heidi Renee Aijala looking for references to coal smoke in Victorian literature. Perhaps a neural net could be trained to look for non-keyword references.
While I’m probably not going to put a poet out of work any time soon, you can imagine many real-world uses. There is a tremendous potential for neural networks and other types of machine learning to caption images, transcribe handwriting, translate documents, understand the spoken word, and play chess at the international master level. Perhaps someday it might also write a meaningful poem.
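For anyone who wants a feel for the “temperature” knob described above, here is a rough Python sketch of temperature-scaled sampling. The three-character distribution is invented for illustration; in the real model the network supplies a probability for every character it knows, one step at a time.

```python
import random

def sample_char(probs, temperature):
    """Sample one character after rescaling probabilities by temperature.

    Low temperature sharpens the distribution (safer, more predictable
    text); high temperature flattens it (more inventive, more mistakes).
    """
    chars = list(probs)
    weights = [probs[c] ** (1.0 / temperature) for c in chars]
    return random.choices(chars, weights=weights)[0]

# A toy next-character distribution (invented, not from the real model).
probs = {"e": 0.5, "a": 0.3, "z": 0.2}

random.seed(0)
cold = [sample_char(probs, 0.2) for _ in range(1000)]
hot = [sample_char(probs, 2.0) for _ in range(1000)]
print(cold.count("e"), hot.count("e"))  # the cold run picks "e" far more often
```

At a temperature of 0.2 the most likely character dominates almost completely, while at 2.0 the rare “z” gets picked often — which is exactly why the hotter poems above are so full of invented words.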
Yesterday, FiveThirtyEight featured a fantastic article by Trevor Martin, a Ph.D. student in Computational Biology at Stanford University. Martin’s piece, Dissecting Trump’s Most Rabid Online Following, looked at the toxic communities surrounding Donald Trump, notably r/The_Donald, using a machine learning technique called latent semantic analysis. LSA compares the words and concepts in two sets of documents to show how closely the sets are related. Martin used this process to find the overlap between different subreddits; two subreddits are more similar if the same users comment in both. He then goes further with what he calls “subreddit algebra”: by adding or subtracting subreddits, other related subreddits can be revealed. For example, r/nba + r/minnesota = r/timberwolves. If you’re interested in semantic vector math, there’s a fun Twitter bot that does this algebra several times per day.
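The arithmetic behind “subreddit algebra” can be sketched in a few lines of Python. The commenter-overlap vectors below are made up for illustration; the real analysis derives them from hundreds of millions of Reddit comments.

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors; each position represents a pool of shared commenters.
subreddits = {
    "nba":          [0.9, 0.1, 0.0],
    "minnesota":    [0.1, 0.9, 0.0],
    "timberwolves": [0.6, 0.6, 0.0],
    "knitting":     [0.0, 0.1, 0.9],
}

# "Subreddit algebra": add two subreddit vectors together...
combo = [a + b for a, b in zip(subreddits["nba"], subreddits["minnesota"])]

# ...then find the nearest remaining subreddit by cosine similarity.
best = max((s for s in subreddits if s not in ("nba", "minnesota")),
           key=lambda s: cosine(combo, subreddits[s]))
print(best)  # → timberwolves
```

Subtraction works the same way with a minus sign, which is how combinations like “Trump minus politics” get computed in the article.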
As with all FiveThirtyEight’s data stories, they make their code freely available for readers to try out themselves. I thought it’d be interesting to take a peek at some subreddits that are a little closer to home (and a whole lot less racist and sexist). If you don’t want to run this yourself, feel free to skip to the results below.
If you want to follow along, you’ll need some familiarity with the Google Cloud Platform since that’s where everything will be run. Specifically, you’ll be using their BigQuery service, which is a tool for working with massive datasets. You’ll also want to set up a bucket in Google Storage. Your outputs will be quite large and they don’t allow you to export directly to your local file system. Finally, you’ll need some basic familiarity with the R language and an environment to run R scripts. RStudio is a great tool for this.
First, from your Google Cloud console, create a new project to contain the various tables you’ll be generating. Next, head over to BigQuery and create a new dataset under your project. You could call this something like ‘reddit’. This will hold your results. You’ll be querying against the fh-bigquery:reddit_comments dataset, which is made available to you by default. Click on the Compose Query button and use this code from the fivethirtyeight GitHub repository. Change line 19 to the path of the dataset you just created.
Take the resulting dataset that this query generates and export it to the storage bucket you created. From there, you can download it as a CSV file.
Now, in RStudio, load the vector analysis script from the repository. You’ll need to change the path to the CSV file on line 20 to your exported CSV. And, of course, change the various subreddits after line 59. Now the fun begins!
The first obvious search is for similar subreddits to r/IowaCity. What kinds of things do Iowa City folks post about? The higher the number, the more related the subreddits are.
What other interesting algebra problems could we think up? Send me an email and I’ll try to post a few next week. After all, it’s Friday and I’m off to drink beer, grill some vegetarian food, and read sci-fi after I’m done parenting for the day. This weekend might be a good time to pick up knitting.
Zachary Turpin, a PhD candidate in English at the University of Houston (who made international headlines in April 2016 with his discovery of a previously unknown journalistic series by the poet Walt Whitman entitled “Manly Health and Training”), has made another major find: a long-lost, secret novella, also authored by Whitman, entitled Life and Adventures of Jack Engle: an Auto-Biography (A Story of New York at the Present Time). Totaling about 36,000 words, Jack Engle was published anonymously as a work of serial fiction in six installments in 1852 in the New York Sunday Dispatch, a weekly newspaper edited by Amor J. Williamson and William Burns that regularly published works of serial fiction. Turpin’s incredible find means that Jack Engle, a novella that Whitman never talked about and that no one knew he had written, can now be republished for the first time since 1852 and confidently attributed to Whitman for the first time ever. In the current issue of the online, open-access Walt Whitman Quarterly Review (WWQR), Editor Ed Folsom and Managing Editor Stefan Schöberlein have published Jack Engle in full, and the 150-page journal issue also includes an essential and substantial scholarly introduction by Turpin. The University of Iowa Press collaborated with Turpin and the Walt Whitman Quarterly Review to publish a print edition of Jack Engle, including an introduction by Turpin. Hardback and paperback copies of the novel are now available for purchase on the University of Iowa Press website.
Although the publication of Jack Engle today will certainly receive national and international media attention, its appearance in 1852 was barely advertised, and, as far as we know, there were no reviews of the novella and no other responses from newspaper readers. On March 13, 1852, the day before Jack Engle began its run in the Dispatch, a brief and unremarkable literary notice for the novella was published in three New York newspapers: the Tribune, the Herald, and the Daily Times. The notice promised readers that the following day, the Dispatch would begin publishing the Life and Adventures of Jack Engle, an Auto-Biography, a novella praised in the notice as “A Rich Revelation,” that would deal with “the Philosophy, Philanthropy, Pauperism, Law, Crime, Love, Matrimony, Morals, &c., which are characteristic of this great city at the present time.”
This new work of fiction would be printed over the next month from March 14 to April 18, 1852, with a few chapters appearing each week. As Turpin points out in his beautifully written and informative introduction, the novella ran without a byline, and there were no additional advertisements or notices. This is especially striking because Whitman had been a successful fiction writer throughout much of the 1840s. In fact, by 1852 he had published at least 26 short stories, and several of them had appeared in the Democratic Review, one of the era’s most prestigious magazines. He had also written a temperance novel, Franklin Evans, or the Inebriate, a Tale of the Times that had sold 20,000 copies–more than anything else Whitman would publish in his lifetime, including Leaves of Grass. It was only later that Whitman, then known primarily as a novelist and popular fiction writer, would become one of the nation’s favorite poets, and it took nearly 165 years after the original publication of Jack Engle for Turpin to discover the novella and prove conclusively that Whitman authored it.
Jack Engle, which Turpin describes as “a story of coincidence, adventure, and the incompatibility of love and greed,” stands as a historic and incredibly important new find. According to University of Iowa Professor and Walt Whitman expert Ed Folsom, Jack Engle is a “momentous” find that “makes us rethink everything we thought we knew about Whitman’s fiction.” After all, it has long been believed that Whitman’s fiction career spanned only seven years–from the publication of his first short story “Death in the School-Room” in the Democratic Review in August 1841 until the printing of “The Shadow and the Light of a Young Man’s Soul” in the Union Magazine in June 1848. After this, Whitman was thought to have given up fiction writing for good and turned to composing poems–a puzzling career move given the success and widespread circulation of his earlier fiction. The publication of Jack Engle in 1852 offers evidence that Whitman did not make such a definitive transition from writing short stories to crafting poems with long, prose-like lines. Instead, he continued writing fiction at least into the early 1850s, which nearly coincides with the time he began working on the poems that would later appear in the first edition of Leaves of Grass, published in 1855. As Turpin puts it in his introduction, “Whitman did not give up but began again,” seemingly returning to novel-writing, while also drafting plot outlines and prose fragments. Because of Jack Engle, therefore, instead of seeing Whitman as a fiction writer abruptly shifting to poetry, scholars and readers alike must now imagine him as a writer of poetry and fiction for newspapers and magazines who was not yet sure what shape Leaves of Grass or even the next few years of his writing career would or should take.
Whitman has not traditionally been praised among scholars for his fiction writing ability. Thomas Brasher, editor of The Early Poems and the Fiction, a volume of the Collected Writings of Walt Whitman, said of the early stories that “Whitman had no talent for fiction,” while Emory Holloway, an early Whitman biographer, wrote that many of the poet’s early and melodramatic stories “deserved to die in the age of sighs that gave them birth.” Jack Engle stands in sharp contrast to Whitman’s early fiction even though it clearly drew from some of that writing. As Turpin points out in his introduction, Jack Engle is “some of the better fiction Whitman produced.” The plot and characters in this novella were clearly composed and sketched by a more mature Whitman; he was a far more experienced writer of newspaper fiction by 1852 than he had been when he wrote his short stories a decade earlier. According to Turpin, Whitman drew on several genres of popular fiction while writing Jack Engle, including “sentimentalism, sensationalism, adventure fiction, [and] reform literature,” among other genres. To this list, I would add detective and mystery fiction, as well as epistolary fiction given the number of texts–ranging from letters and wills to gravestones and prison narratives–that advance the central plot. By drawing on all of these genres, Whitman creates a novella that Turpin has called “Dickens Light” and that might also be described as a tale of corruption, misogyny, class division, religion, romance, and male friendship.
Jack Engle follows the adventures of a young orphan in New York who is adopted by Ephraim Foster, a milkman and “purveyor of pork and sausage” near the Bowery, and his wife Violet, a woman Whitman describes as having the “breadth of a good sized man” and no knowledge of “what are now called Women’s Rights.” Ephraim urges young Jack to pursue the study of law, which Jack undertakes largely for the sake of pleasing his adoptive father. Soon Jack finds himself employed by “Mr. Covert,” a corrupt and greedy lawyer who “had among the forms of his selfishness, some political ambition.” Covert is also the guardian of a young woman named Martha, and he is constantly scheming to cheat Martha and other beneficiaries out of the inheritance left for them by her imprisoned and dying father. When Jack and Martha meet, they realize they are actually old acquaintances; they become fast friends once again, and Jack longs to help the unhappy young woman. Can Jack and Martha thwart Covert’s attempts to steal her inheritance? Even more importantly, will they be able to keep Martha from falling victim to the lawyer’s sinister and “licentious passions”? Will the identity of Jack Engle’s parents and his family history ever be revealed? The answers to these key questions and many more can only be found by reading the novella in full. Along the way, readers will meet numerous other odd and eccentric characters, including a dancing girl, several individuals of the Quaker faith, some law clerks (one of whom doubles as a detective), a series of night watchmen policing the boundaries of the city, and a dog that, oddly enough, is also called “Jack.” In order to learn the fate of Jack Engle and Martha, readers must follow the action of the story as it moves from a sausage vendor’s shop to the law offices and from the dancing girl’s home to the boat docks under the cover of darkness.
The tale’s numerous mysteries are finally resolved and, at the end, the characters can begin to look forward to the future rather than back on their respective pasts.
Even though the air of mystery, the humor, and the impressive cast of characters set Jack Engle apart from Whitman’s earlier fiction, it is hard not to see some similarities between it and his previous writings. Much like the novella’s protagonist Jack Engle, the main characters of Whitman’s short story “The Love of the Four Students” are studying law under the guidance of a lawyer. The character of Violet Foster, Jack’s adoptive mother, is actually taken from “The Fireman’s Dream,” an unfinished piece of fiction that Whitman published in 1844 in the New York Sunday Times and Noah’s Weekly Messenger. Only two chapters of “The Fireman’s Dream” were ever published, but the description of Violet in Jack Engle is taken, nearly verbatim, from the characterization of Violet Boanes in the earlier work. As Turpin also points out, the villainous lawyer “Mr. Covert” shares his name with an earlier character, “Adam Covert,” another evil lawyer who is murdered as a result of his scheming in “Revenge and Requital,” a tale that was published in 1845, seven years before Jack Engle.
These connections to Whitman’s earlier fiction, combined with both the improved quality of the writing in Jack Engle and the previous success of Franklin Evans, make Whitman’s decision to publish the novella anonymously seem especially perplexing. It would have made sense, after all, for Whitman to attempt to capitalize on the sales figures of Franklin Evans by including his name in a byline with Jack Engle. At the same time, Whitman was not typically silent about the success of his fiction. He once bragged to the editor of the Boston Miscellany that his short stories had been reprinted frequently by newspapers and magazines all over the country, and he was right in his assessment of the widespread circulation of those stories. Even when Whitman distanced himself from his novel and short stories late in his life, he was particularly vocal in his dismissals of those early works. In 1882, Whitman wrote that he sincerely wished “all those crude and boyish pieces” of fiction he wrote in his youth would drop into “oblivion,” and in 1888 he reportedly referred to Franklin Evans as “damned rot.” But as far as we know he never mentioned Jack Engle at all. There is no known evidence that Whitman ever claimed it or disavowed it. He does not seem to have spoken about it–not even to his family or closest friends–and the only time he wrote about it might well have been the plot outline, including character names, that he recorded in his schoolmaster notebook, the document that led Turpin to this amazing find.
Whitman’s silence on the subject of Jack Engle–save for the notes about the plot–certainly made Turpin’s work more challenging. But Turpin’s research methodology is worthy of note here because he was able to move between digital and archival (print) collections to make the find. While prose fragments are not necessarily unusual in Whitman Studies, long lists of plot events and the actions of individual characters are rare. Scholars have long been aware of the schoolmaster notebook and a plot outline that includes the name “Jack Engle,” among others from the novella. Turpin followed the sparse evidence from digitized images of the notebook pages to newspaper databases, where he encountered an announcement for the publication of a novella that promised to detail the life and adventures of “Jack Engle.” Turpin was able to confirm his discovery–matching the plot Whitman outlines to the events of the full novella–with the support of the English Department at the University of Houston and research assistance from the Library of Congress, which holds one of the only–if not the only–series of extant issues of the Sunday Dispatch that printed the installments of Jack Engle. In collaboration with the staff of the Walt Whitman Quarterly Review, Turpin then transcribed and edited the novella for the journal and for the University of Iowa Press’s newly released print edition. With an open access (free) digital edition and a print edition of Jack Engle now available, the novella will certainly have a new life as readers and fans of Whitman examine this long-lost work for the first time. The very existence of Jack Engle also suggests that there may be more works of fiction by Whitman waiting to be discovered, and it is my sincere hope that many more rich revelations about the novella and about all of Whitman’s fiction will soon follow.
On February 12, 2016, the Digital Scholarship and Publishing Studio hosted the second DH Salon event of the semester—a collaborative presentation highlighting the Walt Whitman Archive’s Correspondence project. Presenters included Ed Folsom (Roy J. Carver Professor of English and Co-Director, Walt Whitman Archive), Stephanie Blalock (Digital Humanities Librarian & Associate Editor, Walt Whitman Archive), Stefan Schoeberlein (Managing Editor, Walt Whitman Quarterly Review & Graduate Research Assistant, Walt Whitman Archive), Alex Ashland (Graduate Research Assistant, Walt Whitman Archive), and Ryan Furlong (Graduate Research Assistant, Walt Whitman Archive).
The presentation was accompanied by an exhibit featuring three letters written by Walt Whitman in the 1870s and 1880s. These letters are among the many books and Whitman-related items that are held by Special Collections at the University of Iowa Libraries.
During their presentation, the Whitman Archive Correspondence team shared the digital edition of Whitman’s incoming and outgoing correspondence that they are currently building and gave the audience a behind-the-scenes look at the faculty, staff and student collaborations that make this digital project possible. All of the Correspondence team members at Iowa had the opportunity to share their work and research with the audience. In keeping with the collaborative spirit of the DH Salon, this post, like the presentation itself, reveals the roles of each member of the Whitman Archive Correspondence project team and explains how we make Whitman’s letters available to Archive users.
During our talk, I discussed my role as the current project manager for the Correspondence project and my efforts to write our grant proposals, design our workflows, and train our staff, including three graduate assistants here at the University of Iowa and an additional graduate research assistant at the University of Nebraska-Lincoln. I outlined the need for a digital edition of Whitman’s correspondence and the advantages our edition offers over earlier printed collections. I pointed out that our digital edition not only includes the outgoing and the incoming correspondence, but it is also integrated into the overall search functionality of the Whitman Archive. I also emphasized that our digital edition can easily accommodate newly discovered letters and that it is correctable; as a result, our users often become our collaborators by helping us to catch stray errors in transcription or by providing additional information on Whitman’s correspondents.
For my part of the presentation, I gave a numerical overview of Whitman’s two-way correspondence and gave some examples of how the Correspondence team is beginning to explore the letters through data visualization and topic modeling. With over 3,775 letters encoded (and most of them already published on our website), we can see some interesting trends emerge from this body of texts when we analyze it.
Besides noting an increase in extant Whitman letters from the 1860s to the 1880s, we also find that the ratio of surviving letters to and from Whitman shifts (from the former to the latter), allowing us to trace the poet’s rise to celebrity and, hence, the collectability of his letters. I also presented some numerical visualizations of the correspondence and showed that Whitman’s letters address topics ranging from the publication of his various editions of Leaves of Grass to his declining health in his final years. I emphasized that the vast amount of information now available to us and our users makes it clear that we–the Walt Whitman Archive and the Walt Whitman Quarterly Review–want to engage (and encourage others to engage) with the two-way correspondence not just as individual texts and exchanges but as data.
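To illustrate the kind of counting behind these trends, here is a minimal sketch in Python. The records and the helper function are hypothetical (the Archive's real metadata is far richer), but tallying surviving letters by decade and direction is enough to surface a to/from ratio shift of the sort described above.

```python
from collections import Counter

# Hypothetical metadata records for encoded letters: (year, direction),
# where "to" marks letters sent to Whitman and "from" marks his outgoing ones.
letters = [
    (1863, "from"), (1868, "from"), (1871, "to"),
    (1879, "to"), (1885, "to"), (1888, "to"),
]

def decade_counts(records):
    """Tally surviving letters per decade, split by direction."""
    counts = Counter()
    for year, direction in records:
        counts[(year // 10 * 10, direction)] += 1
    return counts

counts = decade_counts(letters)
# In this toy sample, incoming letters come to outnumber outgoing ones
print(counts[(1880, "to")], counts[(1880, "from")])
```

Plotting such per-decade tallies over the full 3,775-letter corpus is what produces the visualizations mentioned in the presentation.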
I outlined some of the major challenges of the Correspondence project that I have encountered during the two years I have spent working with the letters. One of the most challenging aspects of creating a digital edition of the two-way correspondence is that the Archive staff must edit letters by a variety of authors writing from a variety of backgrounds, all with differing degrees of competency and literacy. Whitman received letters from doctors, university professors, book editors, and publishers, but also from insane asylum patients, wounded soldiers, and the occasional obsessed fan. As a result, author style, syntax, and overall legibility vary drastically from one writer to the next, and the team has to work together to ensure that, in the process of editing, elements like words, punctuation, and paragraph divisions remain consistent with the written letter.
I also pointed out that despite these challenges the Whitman Archive thrives precisely because of the unique opportunities it affords archivists and users alike. Because the digital platform and the infrastructure around which it is built are so adaptive and open to immediate revision, users are always encouraged to contact and interact with members of the Correspondence project team. And while the various personnel at Iowa have developed a rigorous process of transcribing, encoding, and checking letters, the Archive allows for a level of user interactivity that is rarely seen in other digital archives.
For my part of the presentation, I discussed the work I have done as a first-year graduate research assistant for the Walt Whitman Archive. My primary responsibilities have included transcribing, encoding, and verifying Walt Whitman’s incoming and outgoing correspondence for 1887 and 1888. I demonstrated how this three-step process ensures that precise and legible transcriptions of manuscript images are displayed for Whitman Archive readers in a user-friendly format, and how I work to include pertinent biographical information, annotations, and references to other correspondence, as well as the content of the letter itself. Ultimately, I showed how my position as a transcriber and encoder serves as a critical first step in accurately recording the contents of Whitman’s (and others’) correspondence before other members of the Archive team process and review the letters for publication.
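As a rough illustration of what encoding makes possible downstream, the sketch below parses a deliberately simplified, hypothetical letter record with Python's standard library. The element names here are assumptions for illustration only, not the Archive's actual schema (which follows richer TEI conventions).

```python
import xml.etree.ElementTree as ET

# A hypothetical, heavily simplified encoding of a letter; the real
# Whitman Archive markup is far richer than this sketch.
LETTER_XML = """
<letter>
  <sender>Walt Whitman</sender>
  <recipient>William D. O'Connor</recipient>
  <date when="1888-03-02"/>
  <body>Thanks for the papers ...</body>
</letter>
"""

def letter_summary(xml_text):
    """Pull correspondent names and the machine-readable date out of an encoded letter."""
    root = ET.fromstring(xml_text)
    return {
        "sender": root.findtext("sender"),
        "recipient": root.findtext("recipient"),
        "date": root.find("date").get("when"),
    }

summary = letter_summary(LETTER_XML)
```

Once letters are encoded this way, the same structured fields drive the Archive's search, the per-decade tallies, and the correspondent annotations described above.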
I reviewed the overall workflow for letters, starting with transcription and encoding and extending through first and second checks before coming to me for a final “blessing,” which often involves adding further annotations, clarifying information about correspondents, and correcting transcription errors and typos that have made it through the first checks. I emphasized that the “blessing” by one of the directors serves as the final confirmation of scholarly and editorial accuracy, one we can stand behind and stake our reputations as Whitman scholars on. I also noted that a long “workflow” precedes the first transcription work: the brainstorming at our annual Whitman Archive full staff meeting in Lincoln each summer. There we decide which projects to focus on, strategize about grant applications, and determine whether Iowa or UNL (or somewhere else) will take the lead. We then work on a grant proposal, the preparation for which often involves the first real steps in the project (generating a list of letters, deciding how many we can promise to the granting agency, and so on).
What our presentation revealed, and what I would like to emphasize here, is that the Whitman Archive Correspondence project is growing in exciting new ways. We are expanding our research to include topic modeling and data visualization, and an undergraduate intern has recently joined our team. The Correspondence team would also like to become a site where graduate students enrolled in the Public Digital Humanities certificate can complete their Capstone experiences.
Finally, the Walt Whitman Archive, which turns twenty-one this year, is one of the oldest and most comprehensive digital projects in existence, and collaboration between faculty, staff, and students has been one of the keys to its success. As digital humanists often claim, the digital humanities is collaborative in theory and in practice. The Whitman Archive Correspondence project team is one of many living embodiments of that statement on our campus, as the creation of a digital edition of Whitman’s letters depends on the combined efforts of the faculty, staff, and students who were a part of the DH Salon presentation. The Correspondence team believes that when we fail to collaborate and communicate with each other, with the library, with our institutional partners, and even with our users, the people who suffer most are those who depend on our site for their research, teaching, and pleasure reading, as well as the graduate and undergraduate students we are mentoring toward various career options. These are the people who least deserve to bear those consequences. At the end of the day, the Correspondence team firmly believes that one of our greatest strengths is that we are a faculty, staff, and student collaboration.
A few weeks ago, I presented an exciting new prototype program I’ve been keeping my eye on: VSim, a program out of UCLA. I had the pleasure of demoing it for about 20 of my colleagues from across campus, and we discussed the possible applications of a full-fledged VSim program. We would be able to publish 3D digital objects that include bibliographical data (great for researchers), metadata (a favorite of librarians), and paradata, or data about the data (important for scholars wishing to emulate or study the models). VSim not only has the potential to package this data together, but it is also meant to function as a presentation tool, with the ability to link outward to the internet. It was built primarily for teaching scholars. This is exciting because it is, to my knowledge, the only attempt at nesting contextual information directly in a 3D model. It may be a way to begin thinking about how we will preserve 3D in the future. Currently, we attempt to preserve the 3D model, but it is typically separate from the narrative or publication that appears in more traditional formats such as a journal article or monograph. What’s the point of creating 3D when you’re just going to present it in a 2D medium? Hint: there is none.
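To make the packaging idea concrete, here is a hedged sketch of what bundling a 3D object with its bibliography, metadata, and paradata might look like as a simple JSON manifest. Every field name below is an assumption for illustration; it is not VSim's actual file format.

```python
import json

# A hypothetical manifest bundling a 3D model file with the three layers
# of contextual information discussed above. All keys are illustrative.
manifest = {
    "model": "reconstruction.dae",
    "bibliography": ["Author, Title, Journal, Year"],
    "metadata": {"creator": "Example Lab", "created": "2016-02-01", "format": "COLLADA"},
    "paradata": {"sources": ["site photographs"], "assumptions": ["roof height estimated"]},
    "links": ["https://example.org/related-article"],
}

# Serialize for dissemination, then read it back, as a preservation
# workflow or presentation tool might.
serialized = json.dumps(manifest, indent=2)
restored = json.loads(serialized)
```

A sidecar manifest like this is one answer to the preservation worry above: even if the viewer software disappears, the model file and its context travel together in plain text.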
The discussion moved quickly past the oohs and ahhs (as is wont with such a gathering) to the technicalities. How would we support the new file type VSim creates? Why stray from the standard-not-so-standard .stl or .dae file formats? They work well enough, no? Or, how much would something like this cost to support should it go to production? There was an assumption that this tech would not be free to use. How would it all be packaged and disseminated?
For as much potential as those questions had to totally kill the buzz, we quickly returned to: “This is really cool. Can I play with it?!”
Digital Research & Publishing is pleased to announce that Rob Shepard has accepted our offer to be the new Geospatial Information Systems (GIS) Librarian for the UI Libraries. Rob comes to us from the University of Nebraska-Lincoln, where he is pursuing a Ph.D. in Geography.
We at DRP are looking forward to the talents and experience Rob brings that will further enhance the accessibility and usability of geospatial resources (everything’s spatial!) in the Iowa Digital Library. Rob will also be working on cross-campus coordination of GIS and support for faculty research and other Libraries partners.
The name George Stout has been in the news a lot lately as the basis for the lead character in the movie Monuments Men. A 1921 graduate of what was then the State University of Iowa (SUI), he also makes several other appearances in both the yearbooks and alumni publications.
Stout is also mentioned in the March 1921 issue of the Iowa Alumnus for delivering a short address for Foundation Day, the UI’s 74th birthday. While there’s no accompanying picture for this event, the IDL collection Iowa City Town and Campus Scenes includes several photographs from earlier Foundation Days.
Finding information in Iowa Digital Library text collections is made simple through OCR and word highlighting.
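As a rough illustration of the word-highlighting half of that pipeline, the sketch below wraps case-insensitive matches of a search term in `<mark>` tags, assuming plain-text OCR output. A production system would also map hits back to coordinates on the scanned page image; the function and sample text here are hypothetical.

```python
import re

def highlight(text, term):
    """Wrap case-insensitive matches of `term` in <mark> tags,
    preserving the original casing of each match."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", text)

# Hypothetical OCR output from a digitized page
page = "George Stout delivered a short address for Foundation Day."
print(highlight(page, "foundation day"))
```

Because the OCR text is plain searchable text, the same approach extends to any term a reader types into the collection's search box.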