When it comes to corpus analysis, scholars have tended to focus on stylistic or linguistic patterns in an author’s work. Punctuation is often excluded from these conversations, yet it is not entirely clear as to why this is the case. Periods, commas, hyphens, etc., are meaningful units of expression, and they can typically serve as a kind of signature by which to identify an author’s more nuanced expression. Not only that, but they can tell us important things about the social, cultural, and historical conditions under and through which a text was produced.
All too often, though, these elements of language are the first things to go when using digital software to analyze big data. The tutorials that I have found often view punctuation as entirely expendable units of expression. The same is true of stopwords, or those elements of language which most programs seem to categorize as “nonessential.” Here, I am thinking of the ways in which word clouds are generated according to a “weighted” vocabulary the program deems more important than others. Moreover, the same appears true in most sentiment analyses which pull data from a lexicon containing “significant” words with clear positive and negative connotations.
For these last few weeks of the summer fellowship, I have been exploring those marginal aspects of language which are often either forgotten or intentionally excluded from most datasets. It was my feeling that a computational analysis of Whitman’s punctuation offers important insights into the ways in which different media formats influenced his work. The comma, for example, serves many functions: it organizes parts of a list, joins together different ideas, and it can even act as a surrogate for other words.
As revealed in the graph, the comma is among Whitman’s most-used punctuation. By itself this is perhaps not a revelatory statement. I’d bet that the comma is probably the most-used punctuation mark in the entire English language. How exactly Whitman employs it in his poetry, prose, and correspondence is worth investigating further. The same goes for other punctuation marks. In Whitman’s correspondence, an increase in em dash usage is particularly noteworthy. Working as a research assistant for the Walt Whitman Archive, much of what I do consists of transcribing and encoding these messages. What I have found is that postal cards in the nineteenth century, even more so than traditional “letters,” contain a tremendous amount of em dashes. A number of reasons can explain this, but the most compelling to me is the smaller physical size of these messages. The materiality of the message itself is just as important as the content these writers are attempting to communicate. In Whitman’s prose, too, we begin to see an increase in em dash usage. It would be interesting to see whether or not the emergence of the postal card in the mid to late nineteenth century had any significant impact on Whitman’s postbellum writing. Continuing this project, I will try to incorporate some temporal dimension that could help track such developments.
In recent years, digital humanists have been at the forefront of challenging data’s supposed neutrality. Lisa Gitelman and Virginia Jackson have suggested that the discourse of objectivity that often surrounds conversations about data-drive research is not only reductive, but also unlikely to encourage future scholarship and more rigorous debate. They suggest instead that data be thought of as “situated and historically specific,” and that we recognize that “it comes from somewhere and is the result of ongoing changes to . . . conditions that are at once material, social, and ethical” (4). Indeed, just as words encode strings of meaning which can be ambiguous and open to interpretation, so too are numbers and databases invested with a rhetorical significance that must be tested and scrutinized. Information and the means by which it is assembled, organized, and presented must not be thought of in terms of self-evidence or a set of givens; rather, knowledge and the methods by which it is retrieved and made accessible should be open to interpretation.
Only recently have scholars begun to seriously explore the assumptions embedded in data visualization and graphical display. Johanna Drucker invites digital humanists to “take on the challenge of developing graphical expressions rooted in and appropriate to interpretative activity.” She takes issue with “realist” models of data visualization which appear to be motivated by the assumption that graphical displays and user interfaces show the phenomenon itself, rather than an interpretable representation of it. “Data,” she writes, “pass themselves off as mere descriptions of a priori conditions,” in turn foreclosing important discussions of ambiguity and uncertainty that could open up meaningful scholarly debate. Writing in 2011, Drucker felt that data visualizations concealed the very phenomena they were meant to expose. Meaningful insights were hidden behind an apparently “objective” digital representation, and more than five years later, I am left wondering the extent to which scholars of the digital have seriously confronted these problems.
At the same time, I’m equally interested in thinking about what existing visualizations can reveal about my own data. Rather than build a project from the ground up, how can I leverage, hijack, or appropriate the already-important work done by others in an effort to make it “fit” with my own data. Of course, such an open source or peer-to-peer mentality depends upon a number of important factors, not least among them being the ability to negotiate permissions for modifying someone else’s code. If all goes to plan, though, fitting data into existing visualizations can actually reveal unknowns without sacrificing the kind of interrogations Drucker images as being so essential to Digital Humanities Scholarship. At the very least, it has taught me how to creatively redeploy existing technologies and manipulate graphical representations for my own ends.
My project, Mapping Whitman’s Correspondence, is in many ways concerned with visualizing Whitman’s social network as it emerged in place and time. Importantly, the code I used to animate the trajectory of sent and received messages was adapted from a radically different project motivated by very different research objectives. University of Iowa professor Caglar Koylu actually wrote the code as part of his own scholarship on immigration patterns in Europe and the United States, and he was gracious enough to allow me to tinker with it in order to see what the code might reveal about my own data.
While my initial prototype utilized geocoding methods in order to visualize the flow of letters, much important information was nevertheless lost in the process. The correspondents themselves are most obviously absent from the visualization, and more nuanced data like the content of the letters, as well as frequent topics of conversation, were subordinated to an aesthetically pleasing though informationally limited animation.
Only recently have I started to focus more of my attention on thematic concerns, such as the often unacknowledged intersection of the poet’s public and private writing activity. What types of connections would I find between his letters and his most famous work, like Leaves of Grass. Are the boundaries separating public and private as rigidly maintained as we often believe? To explore these questions further, I began identifying all letters which contained references to Whitman’s poetry, prose, or journalistic activity. The more I worked with these documents, and the more I sensed the interconnections between them, it became clear to me that a more interactive interface was essential for helping users navigate the often overwhelming amount of data contained within the correspondence network.
I opted for what’s known as a concept map. In this iteration, the object consists of three parts: a table of “names” in the center, a list of “themes” on the left, and a cluster of “perspectives” on the right. Initially, I positioned individual correspondents in the middle, and the nodes which extended on either side contained references to Whitman’s writings. At first, this made the most sense to me in that the project is about the correspondence, so why not highlight the actual writers by situating them in the center of the “map.” After having met with digital humanities librarian Stephanie Blalock, though, it became clear to me that, while useful, what I had proposed was essentially a glorified finding aid. For example, users could very easily see what Whitman’s doctor friend, Richard Maurice Bucke, was most interested in talking about, but such discoveries revealed little else about the correspondence network.
What if I changed things up a bit? What would the concept map look like if I foregrounded gender as opposed to individual writers? What could this reveal about the nature of Whitman’s correspondence? Not only that, but what could this reveal about archival practice more generally, a potentially generative inquiry seeing as how all of my data is assembled from The Walt Whitman Archive. Perhaps I could begin to make inferences about the gendered nature of archival research, curation, and preservation in Whitman studies.
In the image above, you can see that I repositioned “themes” and “perspectives” in the middle of the map, with female correspondents now located on the left, and male correspondents on the right. While informative, doing so is not without its problems, and it is here where I return to Drucker’s observation regarding the assumptions embedded in visualizations themselves. Indeed, while the concept map does inherently preserve the notion of gender as a binary construct, it is also useful for conceptualizing the gendered identities that undergird epistolary activity. But whereas Drucker might consider digital repackaging to be fundamentally “at odds with humanistic method[s]” of interpretation and analysis, I see it as opening a dialogue where none might otherwise exist. If, as I believe, Whitman’s poetry and prose can be thought of as existing within a shared network of public and private activity, then it is important to consider the ways in which factors such as race, gender, and class contribute to such production.
Drucker, Johanna. “Humanities Approaches to Graphical Display.” Digital HumanitiesQuarterly, vol.
5, no. 1, 2011. Web. 3 July 2017.
Gitelman, Lisa and Virginia Jackson. “Introduction.” “Raw Data” Is an Oxymoron. Cambridge: MIT