{"id":5653,"date":"2018-07-24T21:38:37","date_gmt":"2018-07-25T02:38:37","guid":{"rendered":"https:\/\/blog.lib.uiowa.edu\/studio\/?p=5653"},"modified":"2018-07-25T09:11:08","modified_gmt":"2018-07-25T14:11:08","slug":"woah-data-prep-for-network-analysis-of-research-topics","status":"publish","type":"post","link":"http:\/\/blog.lib.uiowa.edu\/studio\/2018\/07\/24\/woah-data-prep-for-network-analysis-of-research-topics\/","title":{"rendered":"WOAH &#8211; Data Prep for Network Analysis of Research Topics"},"content":{"rendered":"<p><span style=\"font-weight: 400\">Some thoughts on methods and tools.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">Here is the process that I have used to analyze the WOAH database as a network. Let&#8217;s start with a sample entry:<\/span><\/p>\n<p><a href=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/Aneilya_Barnes.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-5657 size-full\" src=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/Aneilya_Barnes.jpg\" alt=\"\" width=\"930\" height=\"124\" srcset=\"http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/Aneilya_Barnes.jpg 930w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/Aneilya_Barnes-300x40.jpg 300w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/Aneilya_Barnes-768x102.jpg 768w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/Aneilya_Barnes-640x85.jpg 640w\" sizes=\"(max-width: 930px) 100vw, 930px\" \/><\/a><\/p>\n<p><a href=\"http:\/\/woah.lib.uiowa.edu\/explore\/\"><span style=\"font-weight: 400\">http:\/\/woah.lib.uiowa.edu\/explore\/<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">The data collection and conversion process begins with the 5th field, which in this entry reads \u201cWomen in the\u2026\u201d. Entries for this field vary in length from 0 to 238 words, in at least 3 languages (mostly English). 
I supplement these words with the \u201cAssociated Subjects\u201d list from each woman\u2019s WorldCat Identities entry. Here is Dr. Barnes\u2019s list:<\/span><\/p>\n<p><a href=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Barnes_WI.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-5655 size-full\" src=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Barnes_WI.jpg\" alt=\"\" width=\"960\" height=\"85\" srcset=\"http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Barnes_WI.jpg 960w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Barnes_WI-300x27.jpg 300w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Barnes_WI-768x68.jpg 768w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Barnes_WI-640x57.jpg 640w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/www.worldcat.org\/identities\/lccn-no2017130793\/\"><span style=\"font-weight: 400\">https:\/\/www.worldcat.org\/identities\/lccn-no2017130793\/<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">This step aims both to add points of contact between scholars and to fill in detail on more specialized topics. These additions are particularly helpful when a scholar has nothing in the \u201cResearch Interest\u201d field but many publications. Of course, plenty of women don\u2019t have WorldCat Identities entries yet, and the number of subject words tends to track how many publications a scholar has. Still, between the original entry and the WorldCat supplements, each woman had some words to work with.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">It is worth noting at this stage that neither the \u201cResearch Interest\u201d field nor the \u201cAssociated Subjects\u201d list is definitive, uniform, or authoritative. Common names seem to confuse WorldCat. Take Dr. 
Anna Clark of Christ Church, Oxford:<\/span><\/p>\n<p><span style=\"font-weight: 400\">WOAH<\/span><\/p>\n<p><a href=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/Anna_Clark.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-5658 size-full\" src=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/Anna_Clark.jpg\" alt=\"\" width=\"934\" height=\"78\" srcset=\"http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/Anna_Clark.jpg 934w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/Anna_Clark-300x25.jpg 300w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/Anna_Clark-768x64.jpg 768w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/Anna_Clark-640x53.jpg 640w\" sizes=\"(max-width: 934px) 100vw, 934px\" \/><\/a><\/p>\n<p><a href=\"http:\/\/woah.lib.uiowa.edu\/explore\/\"><span style=\"font-weight: 400\">http:\/\/woah.lib.uiowa.edu\/explore\/<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">WorldCat Identities<\/span><\/p>\n<p><a href=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Clark_WI.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-5656 size-full\" src=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Clark_WI.jpg\" alt=\"\" width=\"952\" height=\"203\" srcset=\"http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Clark_WI.jpg 952w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Clark_WI-300x64.jpg 300w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Clark_WI-768x164.jpg 768w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Clark_WI-640x136.jpg 640w\" sizes=\"(max-width: 952px) 100vw, 952px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/www.worldcat.org\/identities\/lccn-nr2005018755\/\"><span style=\"font-weight: 400\">https:\/\/www.worldcat.org\/identities\/lccn-nr2005018755\/<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">This word list appears to include topics found in books by 3 or 4 entirely 
different authors with very similar names. I have not yet cleaned out these irregularities, in the hope that stray words like \u201cFlorida\u201d will not connect to any other entry. As WOAH grows and expands beyond historians of the ancient Mediterranean, more careful cleaning will be necessary. In this case, WorldCat adds a lot of noise or misinformation (Dr. Clark may form a number of false connections through \u201cMilitary\u201d, \u201cArmy\u201d, and \u201cSoldiers\u201d) in exchange for \u201cCults\u201d, \u201cHistoriography\u201d, \u201cRome\u201d, and \u201cEmpire\u201d (and even \u201cEmpire\u201d may be misleading, given her focus on the late Roman Republic).<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">With these problems in mind, let\u2019s move on. I used R\u2019s tm package to remove stop words and stem the remaining words: <\/span><\/p>\n<p><span style=\"font-weight: 400\">(Dr. Barnes again)<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">Women in the early church, material culture, topography, gender, late antiquity, Roman domestic space, Roman games, identity and empire, religion, Christianization of Rome Mediterranean Region Signs and symbols Symbolism<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">-becomes-<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">women earli church materi cultur topographi gender late antiqu roman domest space roman game ident empir religi christian rome mediterranean region sign symbol symbol<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">I then used OpenRefine to reshape the data into an edge list and convert it to numerical form for Gephi:<\/span><\/p>\n<p><a href=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Barnes_GS.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-5654\" src=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Barnes_GS-300x259.jpg\" alt=\"\" 
width=\"300\" height=\"259\" srcset=\"http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Barnes_GS-300x259.jpg 300w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/A_Barnes_GS.jpg 602w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400\">Gephi\u2019s Modularity tool uses a version of the Louvain algorithm to find \u2018communities\u2019, or clusters, in a network. These clusters are the categories displayed in the map below.<\/span><\/p>\n<p><a href=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/WOAH_map_713.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-5659\" src=\"https:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/WOAH_map_713-300x159.jpg\" alt=\"\" width=\"300\" height=\"159\" srcset=\"http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/WOAH_map_713-300x159.jpg 300w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/WOAH_map_713-768x406.jpg 768w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/WOAH_map_713-640x338.jpg 640w, http:\/\/blog.lib.uiowa.edu\/studio\/files\/2018\/07\/WOAH_map_713.jpg 933w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/geog3540.github.io\/woah\/testing\/clusterpies\/713.html\"><span style=\"font-weight: 400\">https:\/\/geog3540.github.io\/woah\/testing\/clusterpies\/713.html<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">Tomorrow, I will describe where the cluster labels come from, some challenges with and experiments on clustering algorithms, and the use of topic modeling on the stemmed, stop-word-filtered texts.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">Feedback is always welcome,<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">Ed Keogh<\/span><\/p>\n<p><span style=\"font-weight: 400\">edward-keogh@uiowa.edu<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Some thoughts on methods and tools.
&nbsp; Here is the process that I have used to analyze the WOAH database as a network. Let&#8217;s start with a sample entry: http:\/\/woah.lib.uiowa.edu\/explore\/ &nbsp; I point to the 5th field, \u201cWomen in the\u2026\u201d and begin the data collection and conversion process there. Entries for this field vary in<a class=\"more-link\" href=\"http:\/\/blog.lib.uiowa.edu\/studio\/2018\/07\/24\/woah-data-prep-for-network-analysis-of-research-topics\/\">Continue reading <span class=\"screen-reader-text\">&#8220;WOAH &#8211; Data Prep for Network Analysis of Research Topics&#8221;<\/span><\/a><\/p>\n","protected":false},"author":38,"featured_media":5479,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,32],"tags":[],"syndication":[30,21],"_links":{"self":[{"href":"http:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/posts\/5653"}],"collection":[{"href":"http:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/users\/38"}],"replies":[{"embeddable":true,"href":"http:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/comments?post=5653"}],"version-history":[{"count":1,"href":"http:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/posts\/5653\/revisions"}],"predecessor-version":[{"id":5660,"href":"http:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/posts\/5653\/revisions\/5660"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/media\/5479"}],"wp:attachment":[{"href":"http:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/media?parent=5653"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/categories?post=5653"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/blog.lib.uiowa.edu\
/studio\/wp-json\/wp\/v2\/tags?post=5653"},{"taxonomy":"syndication","embeddable":true,"href":"http:\/\/blog.lib.uiowa.edu\/studio\/wp-json\/wp\/v2\/syndication?post=5653"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}