On October 10, the National Institutes of Health (NIH) published a Request for Information (RFI) seeking public input on Proposed Provisions for a Draft Data Management and Sharing Policy for NIH Funded or Supported Research. According to the NIH, it will take into account the feedback received as it develops a draft of a new NIH policy for data management and sharing. That draft will also be open for public comment at a future date. Comments on this first stage of policymaking, the proposed provisions, are due December 10, 2018.
The document provides key definitions and outlines the purpose and scope of any new NIH policies on data management and sharing, with the bulk of the document dedicated to outlining potential requirements for data management and sharing plans that would have to be included with all applications and proposals for NIH-funded or -supported research projects. The NIH is specifically seeking feedback on the following three areas (though it also welcomes comments on any aspect of the proposal): 1) its definition of “scientific data” for the purposes of a sharing policy; 2) the requirements for data management and sharing plans; and 3) the optimal timing for implementation of a new NIH policy in this area.
On December 3, 2018, PRIM&R submitted comments in response to NIH’s RFI, expressing that we fully support initiatives that promote broad data sharing for many of the same reasons the NIH articulates. Data sharing can optimize the use of scarce research resources and accelerate science and its application to human health. Sharing data from human subjects research not only honors those subjects’ contributions by maximizing the value of their involvement, it also helps minimize risks to research subjects by leading to better-designed and safer future research.
However, we note that while data sharing has many benefits, it also inherently involves risks, most notably to privacy and confidentiality. Though data sets may include only “de-identified” personal data, research often involves aggregating multiple data sets, thereby increasing the chances that individuals will be inadvertently re-identified. Most experts agree that at present no data should be considered permanently de-identified.
We believe the NIH’s draft provisions fall short on acknowledging and grappling with these realities. In our comments, we urge the NIH to continue to lead the way on responsible data sharing by articulating the tradeoffs between maximizing the value of scientific data and protecting the rights and interests of research subjects, and by providing guidance on best practices for responsible data sharing given those tradeoffs—both of which may in turn enhance public trust in the data sharing enterprise.
In light of this general point, our comments highlight specific areas for NIH to address as it drafts data sharing and management policies. We comment on both the proposed definition of scientific data and the proposed requirements for data management and sharing plans, and then raise several additional points for the NIH to consider.
- The definition of “scientific data”
With respect to the proposed definition of “scientific data,” we suggest the definition requires clarification on three fronts: 1) whether it is meant to apply to qualitative as well as quantitative data, and the implications of a one-size-fits-all model given the unique challenges associated with qualitative methods; 2) whether and how the definition is meant to cover new data analysis that is generated after the end of the NIH grant; and 3) whether the definition, which is very broad, allows for multiple interpretations, thereby increasing the likelihood that more identifiable information will be shared than is intended by the policy.
- The requirements for data management and sharing plans
The NIH draft provisions make clear that data management and sharing plans should provide for the broadest use of data, “consistent with privacy, security, informed consent, and proprietary issues.” However, the agency provides little detail about what those issues are or how they should be addressed. We ask the NIH to provide more guidance for the community on how to craft data management and sharing plans that adequately address such concerns.
For example, the NIH should provide guidance for IRBs and other stakeholders on how to determine what retrospective uses of existing data, including data sharing, would be ethically appropriate when consent is not specific about, or is silent on, future uses. We suggest in such guidance NIH be explicit about considerations that should come into play when making these decisions, such as the characteristics of the study population, the sensitivity of the data, the likelihood of re-identification, and the scientific utility of the data itself. We also suggest NIH provide more guidance on best practices for sharing data in ways that are consistent with privacy and confidentiality standards, including guidance on when it is reasonable to place restrictions on data use and sharing. And we urge NIH to encourage, if not require, data management and sharing plans to include provisions regarding how research subjects will be informed about the limitations of current technologies to completely de-identify their data while preserving that data’s utility for research.
- Other considerations
Additionally, we note a few concerns about the draft provisions. First, we ask NIH to consider how it can ensure that institutions with fewer resources are able to follow through and execute their data management and sharing plans. We also suggest that future policies expand on the rather thin compliance and enforcement provisions proposed. We note that, as data sharing becomes more prevalent, the public will increasingly demand consequences (beyond just the agency rescinding funding) when their data are not shared with adequate attention to protections. Finally, we urge the NIH to use its policies to encourage standardization across data repositories.
We conclude our comments by pointing out that, as a community, we still have a lot to learn about data sharing. Until we fully understand the risks and benefits, we urge the NIH to continue to monitor the successes and failures of methods used to protect people’s privacy and enhance their welfare, and to incorporate what is learned into its own policies.
What do you think of the NIH’s draft provisions for future data management and sharing policies? This is an important opportunity for the community to weigh in on the future direction of NIH policies in this area, so I encourage you to consider submitting your own comments. Feel free to cite PRIM&R’s comments or borrow any of the points we make, if that would be useful to you. Please let us know in the comment section below if you take such an approach. You can submit comments by going to NIH’s portal and either responding to the three prompts, or uploading a document.
As others have observed, once data is released it cannot be taken back. The proposed NIH policy speaks to the importance of making scientific data accessible, but it is effectively silent on the implications of aggregation across data sets, which may allow re-identification and the generation of new information about individuals. I don’t believe there is a single approach to address this concern, but I agree with PRIM&R’s comments that it is naive not to explicitly address the issue.