My digital capstone has been a learning experience in the best possible way. That is to say, while I didn’t accomplish nearly as much as I had hoped, I learned a great deal that will make my digital humanities work more effective and efficient in the future. Cleaning the data for my upcoming textual analysis proved to be a bigger ordeal than I had anticipated, and we determined pretty quickly that I needed to boost my coding skills before moving on to the next phase of the project.
Early in the project, we decided that I should create an index of all the files that make up the corpus I will soon be analyzing. Given the scope of the project, Nikki and Matthew recommended using Python scripts to extract the relevant metadata from the files in my corpus. As a coding novice, I found this a bit daunting, but I relished the task because it felt like my first foray into “doing digital humanities.” Initially, I struggled a good deal with designing the script. Following the friendly advice of my Studio contacts, I searched several online code repositories for scripts that would extract the metadata I wanted and output it to a .csv file, which could then be opened as a Microsoft Excel spreadsheet. After several unsuccessful attempts at modifying other people’s code to work on my files, I became frustrated and decided to write my own script from the ground up. Unfortunately, my coding skills were not equal to my ambitions, and I failed again.
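To give a sense of what I was aiming for, here is a minimal sketch of that kind of script, assuming a folder of XML files with simple title and author tags. The folder name, output file, and field names here are hypothetical stand-ins, not my actual corpus:

```python
import csv
import pathlib
import xml.etree.ElementTree as ET

CORPUS_DIR = pathlib.Path("corpus")   # hypothetical folder of XML files
OUTPUT_CSV = "corpus_index.csv"       # hypothetical output name

rows = []
for path in sorted(CORPUS_DIR.glob("*.xml")):
    root = ET.parse(path).getroot()
    # The {*} wildcard (Python 3.8+) matches a tag regardless of XML
    # namespace, and findtext falls back to the default when a tag is
    # missing, so sparsely tagged files don't crash the script.
    rows.append({
        "filename": path.name,
        "title": root.findtext(".//{*}title", default=""),
        "author": root.findtext(".//{*}author", default=""),
    })

with open(OUTPUT_CSV, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["filename", "title", "author"])
    writer.writeheader()
    writer.writerows(rows)
```

The appeal of going through .csv is that the resulting index opens directly in Excel, which is all the spreadsheet step requires.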
In a moment of weakness, I turned to the dark side and asked OpenAI’s ChatGPT what exactly was wrong with my code. After a few hours of dialogue with the notorious artificial intelligence interface, my code issue was fixed. Or so it seemed. My script worked, but only on about half of the files in my corpus; the other half turned out to be in a vastly different XML format. It seemed easy enough to modify the script ChatGPT had provided me. Surely just a little nip and tuck would see me across the finish line of this crucial first step. Unfortunately, this was not the case, as neither I nor my AI compatriot seemed up to the task of extracting metadata from these strangely formatted files. After an unsuccessful attempt at workshopping my code with Nikki, we jointly decided it would be in my best interest to say goodbye to my AI buddy and learn the basics of coding in Python myself. It felt a bit like starting over, but a fresh start was exactly what I needed.
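For readers wondering what handling two XML formats even looks like: one common approach (not necessarily what my final script does) is to inspect each file’s root element and branch to format-specific logic. A minimal sketch, with made-up tag names standing in for my corpus’s two schemas:

```python
import xml.etree.ElementTree as ET

def extract_title(path):
    """Pull a title from either of two XML layouts (illustrative names)."""
    root = ET.parse(path).getroot()
    # Namespaced tags look like "{http://...}TEI"; strip the namespace
    # so we can tell which format this file uses.
    root_tag = root.tag.rsplit("}", 1)[-1]
    if root_tag == "TEI":
        # Format A: metadata nested inside a header element.
        return root.findtext(".//{*}titleStmt/{*}title", default="")
    # Format B: metadata stored as attributes on the root element itself.
    return root.get("title", "")
```

The hard part, of course, is knowing where each format actually stores its metadata, which is exactly where my AI compatriot and I got stuck.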
I enrolled in coding tutorials through LinkedIn Learning after balking at the cost of classes through Codecademy. These tutorials have helped me acquire the basic vocabulary and proficiencies I will need for the project ahead. Eventually, Nikki was able to solve my scripting issue, and my index began to take shape. Currently, I am manually adding data to that index that cannot be extracted by the script, such as genre and the approximate dates when the texts were written. These categories do not exist in the metadata, and they are notoriously difficult to define for medieval texts, which often fit into multiple genres and have no exact “publication date.” Despite my seeming lack of success this semester, I am extremely optimistic about the future of this project. I was fortunate to be chosen for a Digital Scholarship & Publishing Summer Fellowship, where I will continue the work I’ve done this semester and, hopefully, move on to the textual analysis and data visualization portions of the project.