In a blog post last week, I addressed Endangered Data Week and the history of political parties hiding, removing, or altogether abolishing public access to government documents. However, my post wasn’t alone in trying to shed light on this serious issue. In schools, universities, libraries, and classrooms across the world, hundreds of concerned people came together to bring awareness to the issue of endangered and disappearing data. And while Endangered Data Week is now over, the threat is not. So this week, I teamed up with the Digital Scholarship & Publishing Studio to highlight some of the excellent work currently being done by digital humanists and to provide some advice on how to get involved.
First, I visited with Tom Keegan, Head of the Digital Scholarship & Publishing Studio, and Matt Butler, the Studio’s Senior Developer, to discuss the services offered by university libraries to keep scholarly data safe. They stressed the importance of digital institutional repositories in helping scholars maintain their own data and make it accessible to others free of charge. The University of Iowa’s institutional repository, Iowa Research Online, houses an array of faculty, graduate, and undergraduate work. Librarians work closely with faculty, staff, and students to ensure these materials are properly archived and made available according to agreed-upon standards. As I have pointed out before, non-university repositories like Academia.edu are for-profit and will indeed use your data to make money.
Profit is a big factor to consider when thinking about where to put data. As Eric Kansa, founder of Open Context, emphasized to me: “We need to maintain nonprofit (civil society) infrastructure to help maintain data (and backup internationally) during political crises. Organizations like the Internet Archive, and other libraries (including university libraries) are critical, because they have the expertise and infrastructure needed to maintain public records.” Kansa rightly points out that libraries are integral to this fight, but notes that individuals need to know more about the vulnerability of data as well.
So, what do we do about all the government data (e.g., climate data) that is currently being pulled from government websites? This was just one question addressed by the group behind the formation of Endangered Data Week. Like most DH projects, EDW was forged by proactive academics who wanted to make a difference by using the biggest megaphone in the world: the Web. Michigan State University professor and digital humanist Brandon Locke, in collaboration with Jason A. Heppler, Bethany Nowviskie, and Wayne Graham, designed EDW on the model provided by Banned Books Week and Open Access Week. From there they brought the project to the Digital Library Federation’s new interest group on Government Records Transparency/Accountability, directed by Rachel Mattson.
In order to find out more about this initiative and the problems they are addressing, I spoke to Bethany Nowviskie, Director of the Digital Library Federation (DLF) at CLIR and a Research Associate Professor of Digital Humanities, UVa. Prof. Nowviskie was kind enough to answer a number of questions I had about endangered data and how to get more involved in the fight to save it:
SB: Who owns federal data? In other words, should data be available to us because we pay taxes and fund data-producing institutions like HUD? The EPA? Why is the Executive in control of so much of this open data?
BN: Except where issues of personal privacy and cultural sensitivity are involved, data collected or produced by taxpayer-funded agencies of the federal government should be openly available to everyone. It’s a matter of transparency for the health of the republic — sunlight being, as they say, the best disinfectant — and of accountability of the government to its people. These are our datasets, and we should have the ability to analyze and build on them — using them to understand our world better, as it is, and to be able to *make it better.*
SB: How do we create a more centralized, non-profit infrastructure that can maintain data during political crises?
BN: Most pieces of our needed infrastructure are already in place. We call them libraries. The DLF will join a large number of allied groups in early May, convened by DataRefuge (our Endangered Data Week partner) and the Association of Research Libraries, to discuss a new “Libraries+ Network,” to take on exactly this issue: https://libraries.network/about/ Some questions that will motivate us: how can we create greater coherence among the many governmental, non-profit, and even commercial groups with longstanding commitments and expertise in particular areas of the data preservation enterprise? Might we re-energize and re-imagine something like the Federal Depository Library program for the digital age? What would it take for governmental agencies to implement data management plans for the full lifecycle of their information, just as researchers who receive federal funds are now typically required to do?
SB: What can regular non-specialists do to contribute?
BN: This is one reason DLF jumped at the chance to support grassroots efforts to organize the first annual Endangered Data Week. The goals expressed and audiences implied in our capsule summary (“raising awareness of threats to publicly available data; exploring the power dynamics of data creation, sharing, and retention; and teaching ways to make endangered data more accessible and secure”) go far beyond the professional research data management and data stewardship community. Probably the most useful thing a non-specialist can do is to educate herself on the issues and represent the value of open data legislation and the advances in open government we saw under the Obama administration to her representatives. We also need to urge follow-through on past bipartisan commitments in this sphere, such as the OPEN Government Data Act: https://www.datacoalition.org/open-government-data-act/
SB: Can you give some examples of digital projects or initiatives that depend on federal data to reveal racial inequity (e.g. redlining projects), bias, or certain dangers (e.g. lead poisoning)?
BN: Well, FOIA requests played an important role [in the Flint water crisis]— as they have done in Title IX enforcement on college campuses. In this sphere, I also think it’s worth mentioning that identical bills were recently introduced in both the House and Senate that would prohibit federal funds from being “used to design, build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.” [House Bill, Senate Bill]. They went nowhere, and were ostensibly meant to “protect local zoning decisions,” but *what is up with that?* This is the kind of thing that should energize non-specialist readers.
SB: How can we have trust in the integrity of datasets that have been given over to private institutions or saved by non-federal entities? In other words, who will hold the “control” copy (e.g. like a seed bank) that can assure us that datasets that have been saved were not then tampered with?
BN: So, there’s a huge professional community — many of them are DLF members or members of the National Digital Stewardship Alliance which we host — whose whole focus is on questions like this, and there are excellent protocols and procedures for ensuring data integrity. I’m not familiar enough with the ins and outs to give you a good quote, but it’s not a new problem, for sure, and methods for auditing and certifying digital repositories and verifying the integrity and security of the data within them are well established. As always, matters of policy, funding, and the professional development and nurturing of the communities who do the work are a bigger challenge than the technology!
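For readers curious what verifying integrity can look like in practice, here is a minimal sketch of one common building block: recording a checksum for every file in a saved dataset, then recomputing those checksums later to see whether anything has changed or gone missing. This is my own illustration, not a description of how DLF or NDSA members actually work. The function names (`sha256_of`, `build_manifest`, `verify_manifest`) are hypothetical, and a real repository audit involves far more than this.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(folder, manifest):
    """Return the sorted relative paths that are missing or no longer match."""
    root = Path(folder)
    problems = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return sorted(problems)
```

The manifest only helps if it is kept somewhere other than alongside the copy it describes, since anyone able to alter the files could otherwise alter the recorded checksums too. That is one small way of thinking about the “control” copy question above.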
Bethany’s comments above echo what others on campuses across the US are saying: data is a resource. Like water or electricity, access to it ought not be taken for granted. We must continue to be vigilant in the face of lazy and aggressive attitudes alike. Libraries and library associations remain a big part of the fight to preserve this data, but all of us can play a part by being more aware, spreading the word, and getting involved in the movement.