{"id":5342,"date":"2017-08-08T22:53:35","date_gmt":"2017-08-09T03:53:35","guid":{"rendered":"https:\/\/blog.lib.uiowa.edu\/studio\/?p=5342"},"modified":"2018-10-31T13:49:49","modified_gmt":"2018-10-31T18:49:49","slug":"periodizing-big-data-reintroducing-punctuation-back-into-corpus-analysis","status":"publish","type":"post","link":"https:\/\/blog.lib.uiowa.edu\/studio\/2017\/08\/08\/periodizing-big-data-reintroducing-punctuation-back-into-corpus-analysis\/","title":{"rendered":"Periodizing Big Data: Reintroducing Punctuation Back Into Corpus Analysis"},"content":{"rendered":"<figure id=\"attachment_5345\" aria-describedby=\"caption-attachment-5345\" style=\"width: 191px\" class=\"wp-caption alignright\"><a href=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2017\/08\/letterspunct.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-5345\" src=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2017\/08\/letterspunct-155x300.png\" alt=\"\" width=\"191\" height=\"371\" srcset=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2017\/08\/letterspunct-155x300.png 155w, https:\/\/blog.lib.uiowa.edu\/studio\/files\/2017\/08\/letterspunct.png 366w\" sizes=\"(max-width: 191px) 100vw, 191px\" \/><\/a><figcaption id=\"caption-attachment-5345\" class=\"wp-caption-text\">Here is a snapshot of Whitman\u2019s correspondence as it is represented by punctuation only. To achieve this effect, I ran a Python script that moved all of these elements into a separate .txt file. This was an important step when considering the character count of these documents. Word counts alone could exceed 300,000, and Microsoft Word was simply not equipped to handle calculations on such a large scale. 
Removing just the commas (18,000 in the correspondence alone) would take nearly 30 minutes, sometimes longer.<\/figcaption><\/figure>\n<p>When it comes to corpus analysis, scholars have tended to focus on stylistic or linguistic patterns in an author\u2019s work.\u00a0 Punctuation is often excluded from these conversations, yet it is not entirely clear why this is the case.\u00a0 Periods, commas, hyphens, etc., are meaningful units of expression, and they can typically serve as a kind of signature by which to identify an author\u2019s more nuanced expression.\u00a0 Not only that, but they can tell us important things about the social, cultural, and historical conditions under and through which a text was produced.<\/p>\n<p>All too often, though, these elements of language are the first things to go when scholars use digital software to analyze big data.\u00a0 The tutorials that I have found often treat punctuation marks as entirely expendable units of expression.\u00a0 The same is true of stopwords, or those elements of language which most programs seem to categorize as \u201cnonessential.\u201d\u00a0 Here, I am thinking of the ways in which word clouds are generated according to a \u201cweighted\u201d vocabulary of words the program deems more important than others.\u00a0 The same appears true of most sentiment analyses, which pull data from a lexicon containing \u201csignificant\u201d words with clear positive and negative connotations.<\/p>\n<p>For these last few weeks of the summer fellowship, I have been exploring those marginal aspects of language which are often either forgotten or intentionally excluded from most datasets.\u00a0 It is my feeling that a computational analysis of Whitman\u2019s punctuation offers important insights into the ways in which different media formats influenced his work.\u00a0 The comma, for example, serves many functions: it organizes parts of a list, joins together different ideas, and can even act as a surrogate for other 
words.<\/p>\n<p><a href=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2017\/08\/Dashboard-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-5344 alignleft\" src=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2017\/08\/Dashboard-1-300x196.png\" alt=\"\" width=\"429\" height=\"280\" srcset=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2017\/08\/Dashboard-1-300x196.png 300w, https:\/\/blog.lib.uiowa.edu\/studio\/files\/2017\/08\/Dashboard-1-768x502.png 768w, https:\/\/blog.lib.uiowa.edu\/studio\/files\/2017\/08\/Dashboard-1-1024x670.png 1024w, https:\/\/blog.lib.uiowa.edu\/studio\/files\/2017\/08\/Dashboard-1.png 1608w\" sizes=\"(max-width: 429px) 100vw, 429px\" \/><\/a>As revealed in the graph, the comma is among Whitman\u2019s most-used punctuation marks.\u00a0 By itself this is perhaps not a revelatory statement.\u00a0 I\u2019d bet that the comma is the most-used punctuation mark in the entire English language.\u00a0 How exactly Whitman employs it in his poetry, prose, and correspondence, however, is worth investigating further.\u00a0 The same goes for other punctuation marks.\u00a0 In Whitman\u2019s correspondence, an increase in em dash usage is particularly noteworthy.\u00a0 As a research assistant for the <em>Walt Whitman Archive<\/em>, I spend much of my time transcribing and encoding these messages.\u00a0 What I have found is that postal cards in the nineteenth century, even more so than traditional \u201cletters,\u201d contain a tremendous number of em dashes.\u00a0 A number of reasons could explain this, but the most compelling to me is the smaller physical size of these messages.\u00a0 The materiality of the message itself is just as important as the content these writers are attempting to communicate.\u00a0 In Whitman\u2019s prose, too, we begin to see an increase in em dash usage. 
\u00a0It would be interesting to see whether the emergence of the postal card in the mid to late nineteenth century had any significant impact on Whitman\u2019s postbellum writing.\u00a0 As I continue this project, I will try to incorporate a temporal dimension that could help track such developments.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When it comes to corpus analysis, scholars have tended to focus on stylistic or linguistic patterns in an author\u2019s work.\u00a0 Punctuation is often excluded from these conversations, yet it is not entirely clear as to why this is the case.\u00a0 Periods, commas, hyphens, etc., are meaningful units of expression, and they can typically serve as<a class=\"more-link\" href=\"https:\/\/blog.lib.uiowa.edu\/studio\/2017\/08\/08\/periodizing-big-data-reintroducing-punctuation-back-into-corpus-analysis\/\">Continue reading <span class=\"screen-reader-text\">&#8220;Periodizing Big Data: Reintroducing Punctuation Back Into Corpus Analysis&#8221;<\/span><\/a><\/p>\n","protected":false},"author":210,"featured_media":5134,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,32],"tags":[],"syndication":[30,21],"_links":{"self":[{"href":"https:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/posts\/5342"}],"collection":[{"href":"https:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/users\/210"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/comments?post=5342"}],"version-history":[{"count":5,"href":"https:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/posts\/5342\/revisions"}],"predecessor-version":[{"id":5349,"href":"https:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/posts\/5342\/revisions\/5349"}
],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/media\/5134"}],"wp:attachment":[{"href":"https:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/media?parent=5342"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/categories?post=5342"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/tags?post=5342"},{"taxonomy":"syndication","embeddable":true,"href":"https:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/syndication?post=5342"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}