Beginning in January 2023, the National Institutes of Health (NIH) will require data management and sharing plans for all awards that generate scientific data. Research Data Services at the University of Iowa Libraries has been supporting university researchers as they prepare for this change, with guidance on our website, informational workshops, and individual consultations.
With this new requirement, the NIH shows its strong support of data sharing, though the inevitable questions of “why?” and “what’s the point?” have been top of mind for many. So, what are the benefits of sharing data?
Research shows that data sharing has many advantages – for researchers, for the patients whose data is being shared, and for society more generally.
- For researchers: A 2020 study published in PLOS ONE shows a significant citation advantage when data is shared in open access repositories. The study authors reviewed well over 500,000 articles published by PLOS and BMC and found an increase of up to 25.36% in citation counts when the associated data was fully open access – i.e., in a repository without restrictions on its use [1]. Sharing data also offers exciting opportunities for new research studies. For instance, analyzing old data with contemporary methods has led to novel “remixes,” such as a study that used weather reports in 19th-century naval logbooks to model climate change, or one that analyzed text comments from patients in online forums to aid clinical discovery [2].
- For patients: Many have argued that researchers have an ethical obligation to make full use of clinical trial data, given that patients have donated their time and information in the pursuit of treatments and therapies [3]. Sharing clinical data widely could lead to the design of new trials, the development of predictive models, and the creation of simulation tools [3]. Cancer researchers, specifically, have argued that because precision oncology is so heterogeneous, no single research center can possibly produce enough data to adequately use models for prognosis and prediction. Data simply must be shared so there is enough of it to train the models and ultimately enhance patient outcomes [4].
- For society at large: When those in developed nations with robust research infrastructure share their data openly, it allows those with limited resources – especially those in low-income countries – to use the data to investigate questions that are relevant to their own communities’ needs [2] [5]. Plus, there are major economic benefits to open data. A 2018 study by the European Commission found an estimated annual cost of €10.2 billion to the European economy when data is not shared [6]. The costs derive from inefficiency (especially wasted time), paying for extra licenses to access data, and storage costs.
Care must certainly be taken to minimize the risk of reidentification and other risks when sharing human subject data. Deidentifying data and sharing it through restricted-access repositories are two ways to address these risks.
In fact, participants in clinical trials are strong advocates for data sharing. A 2018 survey of participants indicated that they perceive the benefits of data sharing to outweigh the downsides, and they noted that the most important advantage of data sharing was “making sure people’s participation in clinical trials leads to the most benefit possible” [7].
This belief in the value of data sharing actually extends beyond clinical trial participants. Pew Research found most Americans (57%) say they would trust research findings more if researchers made their data publicly available [8].
Ultimately, many stakeholders can expect to derive real benefits from the new NIH policy and increased data sharing, and researchers’ data management and sharing plans can lead to valuable contributions to the advancement of science and health outcomes.
If you’re tasked with writing your own data management and sharing plan, identifying a repository for sharing your data, or curating a dataset, Research Data Services can help. We have guidance on our website, offer workshops and trainings, do one-on-one consultations, and can answer your questions by email, as well.
References:
[1] Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K., and McGillivray, B. (2020). “The citation advantage of linking publications to research data,” PLOS ONE, vol. 15, no. 4, pp. 1–18, doi: 10.1371/journal.pone.0230416.
[2] Voytek, B. (2016). “The virtuous cycle of a data ecosystem,” PLOS Computational Biology, vol. 12, no. 8, doi: 10.1371/journal.pcbi.1005037.
[3] Konkol, M., Nüst, D., and Goulier, L. (2020). “Publishing computational research – a review of infrastructures for reproducible and transparent scholarly communication,” Research Integrity and Peer Review, vol. 5, no. 1, p. 10, doi: 10.1186/s41073-020-00095-y.
[4] Vesteghem, C. et al. (2020). “Implementing the FAIR Data Principles in precision oncology: Review of supporting initiatives,” Briefings in Bioinformatics, vol. 21, no. 3, pp. 936–945, doi: 10.1093/bib/bbz044.
[5] American Psychological Association (2015). “Data Sharing: Principles and Considerations for Policy Development.” https://www.apa.org/science/leadership/bsa/data-sharing-report
[6] PwC EU Services (2018). “Cost of not having FAIR research data: Cost-benefit analysis for FAIR research data,” doi: 10.2777/02999.
[7] Mello, M. M., Lieou, V., and Goodman, S. N. (2018). “Clinical trial participants’ views of the risks and benefits of data sharing,” New England Journal of Medicine, vol. 378, no. 23, pp. 2202–2211, doi: 10.1056/NEJMsa1713258.
[8] Funk, C., Hefferon, M., Kennedy, B., and Johnson, C. (2019). “Americans say open access to data and independent review inspire more trust in research findings,” Pew Research Center, https://www.pewresearch.org/science/2019/08/02/americans-say-open-access-to-data-and-independent-review-inspire-more-trust-in-research-findings/