Prior to Samuel Morse’s invention of the telegraph in the first half of the nineteenth century, communication technology was chiefly limited to oral or textual messages delivered by a messenger. British sci-fi savant Arthur C. Clarke expands upon this fact, stating that “When Queen Victoria came to the throne in 1837, she had no swifter means of sending messages to the far parts of her empire than had Julius Caesar—or, for that matter, Moses.” My dissertation, “Messengers and Messages in Middle English Literature,” examines the under-explored role of messengers in fourteenth-century English romances, where they often prove to be crucial elements of the plot or interesting stand-ins for an authorial or narrative function.
While each chapter of my dissertation focuses upon an analytical close reading of a specific medieval text or texts, such as The Canterbury Tales, The Death of Arthur, and Richard the Lionheart, I realized early in the project that I would also need a broader perspective on how medieval authors utilize messengers and messages throughout the corpus of the Middle English literary canon. To that end, my work this summer will be to perform what scholar Franco Moretti has dubbed “distant reading,” the process of “understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data.” Because I am, unfortunately, not able to read a corpus of 300 medieval texts over the summer, I will be using Python scripts to “read” the texts for me and to extract data on keywords pertaining to messengers, which I will then be able to interpret and incorporate into my more traditional dissertation work.
This is a particularly challenging undertaking given the lack of any spelling standard in Middle English, which makes the number of possible search terms for any keyword positively daunting. For instance, according to the Middle English Dictionary, the word “messenger” is most often written as “messā̆ǧē̆r” in Middle English, but it also appears as messagere, messagier, missanger, mansonger, and at least 30 other variant spellings. This is further complicated by the myriad synonyms for the word messenger in Middle English, each with its own spelling eccentricities. Navigating through this linguistic labyrinth will, I hope, eventually result in a chapter of the dissertation displaying the value of a Digital Humanities approach to Middle English literature, complete with data visualizations and discussion of process, while also allowing me to support my own literary analysis with the data I’ve collected from the textual analysis project.
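To make the spelling problem concrete, here is a minimal sketch of how a variant-aware keyword search might look in Python. It is an illustration rather than the project's actual script: the variant list contains only the spellings named above (plus "messager", which I am assuming as the dictionary headword with its diacritics stripped), the function names are hypothetical, and the sample sentence is invented for the demonstration rather than taken from the corpus.

```python
import re
from collections import Counter

# Only the spellings mentioned in the paragraph above; a working list would
# need the 30-odd remaining variants plus the synonyms. "messager" is an
# assumption: the headword with its diacritics removed.
MESSENGER_VARIANTS = ["messager", "messagere", "messagier", "missanger", "mansonger"]


def build_variant_pattern(variants):
    """Compile one case-insensitive regex that matches any listed spelling."""
    # Longest spellings first, so "messagere" is tried before "messager".
    ordered = sorted(set(variants), key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in ordered)
    # \b restricts matches to whole words.
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def count_variants(text, pattern):
    """Return how often each spelling occurs, keyed by its lowercase form."""
    return Counter(match.group(0).lower() for match in pattern.finditer(text))


# Invented sample text, used only to exercise the functions.
sample = "The messager rood faste; a Messagere cam to the kyng, and the missanger spak."
counts = count_variants(sample, build_variant_pattern(MESSENGER_VARIANTS))
print(counts)
```

Because the pattern matches whole words only, inflected forms such as plurals would be missed unless they are added to the list as well, which is part of why the number of search terms grows so quickly.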
This may all seem a bit “high-tech” and futuristic, but much of the work is decidedly unexciting. With the help of my Studio contact, Nikki White, I’ve acquired a corpus of 300 Middle English text files. These files are in XML format, a markup language designed to be both human- and machine-readable, so I’ve been able to write scripts to extract valuable metadata from the files, such as the title of each text, the author, and when the text was written (if these things are known; the most common medieval author is “Anonymous”). This metadata has allowed me to build an index which will support the queries that guide my textual analysis. Before the fun part can begin, however, I have to “clean” the data from the raw XML files. Cleaning in this case doesn’t involve a bucket and mop but is instead the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled and, since the raw XML files from the Middle English Dictionary have been aggregated from numerous sources, there is an abundance of duplication and mislabeling to be found. I haven’t done this much cleaning during the summer since the time I talked back to my mother between 4th and 5th grade, but I am hopeful the results will be worth it.
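For readers curious what the metadata extraction and duplicate removal might look like, here is a rough sketch using Python's standard xml.etree.ElementTree module. The element names (title, author, date), the helper functions, and the three tiny sample documents are all assumptions made for illustration; the real corpus files use their own schema, and the sample values are placeholders rather than data from the corpus.

```python
import xml.etree.ElementTree as ET


def extract_metadata(xml_string):
    """Pull title, author, and date out of one XML document.

    The tag names are hypothetical; a real corpus schema will differ.
    """
    root = ET.fromstring(xml_string)

    def text_of(tag, default):
        node = root.find(f".//{tag}")
        if node is None or node.text is None or not node.text.strip():
            return default
        # Collapse stray whitespace so near-identical titles compare equal.
        return " ".join(node.text.split())

    return {
        "title": text_of("title", "Untitled"),
        "author": text_of("author", "Anonymous"),
        "date": text_of("date", "unknown"),
    }


def deduplicate(records):
    """Keep the first record for each (title, author) pair, ignoring case."""
    seen = set()
    unique = []
    for record in records:
        key = (record["title"].casefold(), record["author"].casefold())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Placeholder documents: the second duplicates the first with messier formatting.
docs = [
    "<text><header><title>Sir Orfeo</title><date>c1330</date></header><body/></text>",
    "<text><header><title>  sir  orfeo </title><author></author></header><body/></text>",
    "<text><header><title>Troilus and Criseyde</title><author>Geoffrey Chaucer</author></header><body/></text>",
]
index = deduplicate([extract_metadata(doc) for doc in docs])
for entry in index:
    print(entry)
```

Matching on a normalized title and author is only one possible definition of a duplicate. Mislabeled files, where the same text appears under different titles, would need a comparison of the texts themselves.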