By James M. DuBois, DSc, PhD
There is currently nothing to prevent qualitative researchers from dumping data that are sensitive and potentially identifiable in an open access data repository. Doing so comes with rewards: An instant DOI proving data have been shared, thus satisfying the requirements of many journals and funding agencies and eliminating delays in publication. In contrast, carefully de-identifying qualitative data and executing restricted use agreements can take significant amounts of time.
In 2023, the National Institutes of Health (NIH) mandated broader data sharing. For the first time, this included qualitative research data, for example, data from interviews and focus groups. These data are often quite sensitive. Qualitative research is conducted to gain in-depth knowledge of personal experiences. Qualitative research data may include recordings or transcripts of conversations about health, sex practices, illegal behaviors, and third parties (e.g., family members, partners, or service providers). De-identifying such data can be a challenge.
In 2017, our team received an NIH R01 grant (R01HG009351) to develop resources to support responsible qualitative data sharing. We worked with 28 researchers to assist them in de-identifying and depositing data with the ICPSR repository. Drawing on lessons learned from this project, in this brief article we share a few questions that IRB office and board members may want to consider as they review qualitative research proposals in the era of data sharing mandates. It is important for IRBs to provide leadership in this area to protect both research participants and institutions. Our survey of over 400 qualitative researchers found that almost no researchers in the U.S. have experience sharing qualitative data. In another study, we found that very few data repositories have guidelines for depositing qualitative data. Researchers who wish to share qualitative data in a responsible manner may require institutional guidance. The following questions will help IRBs to consider some of the most crucial issues that need to be addressed at the outset of a study.
- Who is funding the study, and where will findings be published? NIH currently requires qualitative data to be shared, and a recent White House memorandum to all federal funding agencies states the expectation that they too will require immediate availability of federally funded research data by the end of 2025. Many journals also require that data be deposited prior to publication. Whereas qualitative data were once often given a free pass, this is less common today. Nevertheless, the issue of data sharing sharply divides qualitative researchers: Nearly half support data sharing, while about the same proportion oppose it. Some oppose data sharing so strongly that they say they will not accept funding from agencies or publish in venues that require it.
- What data will be shared? Ordinarily, we recommend sharing basic study materials (such as protocols and interview guides), codebooks, and transcriptions of recordings. We generally recommend against sharing audio or video recordings, unless there is consent to do so and the data are not likely to cause harm, because voiceprints and facial images are HIPAA safe harbor identifiers. Moreover, most researchers analyze transcribed data; thus, sharing data in this form easily permits verification of reported findings and secondary data analysis. Some researchers may attempt to share only excerpts from transcripts, namely the key quotes that they coded. However, doing so fulfills neither of the two main purposes of data sharing: It does not permit others to verify the findings or to conduct secondary research with the data.
- What does the consent form disclose regarding data sharing? In the past, consent forms for qualitative research studies often sought to reassure participants and IRBs with language such as this: “No one outside of the research team will see your data. Interviews will be transcribed and then recordings will be erased.” Such language of course presents an obstacle to data sharing. Such reassurance is also unnecessary: Our own interviews with participants, and our review of the literature, have found that most participants support sharing de-identified data with other researchers. But it may be important to tell participants that data will be de-identified and whether access to data will be restricted (e.g., to researchers with an approved study). IRBs may want to consider whether their informed consent form templates provide researchers with adequate discretion to express such details.
- What counts as de-identified data? The HIPAA Privacy Rule offers two standards for de-identifying data. The first is the safe harbor approach, which involves removing all instances of specific variables, such as full names, birth dates, voiceprints, email addresses, addresses, and phone numbers. Qualitative data rarely include such information, and when such information appears it is usually fairly easy to identify and remove. However, this standard may not suffice to genuinely de-identify qualitative data. Qualitative data may include statements that directly identify an individual, such as “I was CEO of Company X in 2019…” Alternatively, by combining the study site with an individual’s sex or gender, ethnicity, and profession, one might easily infer the identity of a participant (e.g., if a participant is the only male, Hispanic, psychiatric nurse in town). We therefore encourage the use of both the HIPAA safe harbor approach and the second standard, the expert determination approach, which requires an expert (perhaps the researchers themselves) to make a determination “that the risk is very small that the information could be used, alone, or in combination with other reasonably available information, … to identify an individual who is a subject of the information.” Our project developed software that assists researchers in flagging and reviewing HIPAA safe harbor and other potentially identifying variables. At least two data repositories, the Qualitative Data Repository (QDR) at Syracuse University and the ICPSR at the University of Michigan, have data curators who review the quality of data de-identification. They may also help to ensure that information is not unnecessarily removed from transcripts, since excessive removal may reduce the value of the data for secondary analysis. [Disclosure: The author was a developer of the QuaDS software (now called De-ID) and may receive royalties from licensing.]
- What protections will the data repository offer? Not all data repositories offer the same services. The most basic services involve permitting researchers to upload data to secure servers. Some repositories offer data curation, which may involve, among other things, assessing how sensitive and how well de-identified the data are, generating keywords to enhance discoverability, and ensuring data are formatted and labeled in ways that support secondary use. Other repositories offer restricted access, which commonly requires secondary data users to apply for access to the data. The secondary use application may involve providing credentials, IRB approval, and a data protection plan. Many repositories are not set up to provide these services to qualitative researchers, yet these services can provide valuable protection of sensitive data.
- Are there community partners who will have a say in data sharing plans? Qualitative research sometimes involves partnerships with communities. Although data typically belong to institutions that fund, or receive funding for, research projects, communities may insist on some level of control over how data are used and shared. In some cases, e.g., research with American Indian/Native American tribes, data may belong to the tribe or be subject to tribal oversight. NIH acknowledges these tribal rights and has issued guidance for researchers. Per guidance issued thus far, NIH appears supportive of restricting access to data as an additional form of protection but appears less willing to waive data sharing requirements.
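To make the de-identification question above more concrete, the following is a minimal Python sketch of how a transcript might be screened for a few safe harbor identifiers. It is an illustration only and does not represent the De-ID (QuaDS) software or any repository's workflow. The names `PATTERNS`, `Flag`, and `flag_identifiers`, along with the regular expressions and the sample text, are hypothetical and far from exhaustive.

```python
import re
from typing import NamedTuple

# Hypothetical, non-exhaustive patterns for a few HIPAA safe harbor categories.
# Real transcripts would need many more patterns plus human review.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


class Flag(NamedTuple):
    line_no: int   # 1-based line number in the transcript
    category: str  # which pattern matched
    text: str      # the matched text, for a human reviewer to inspect


def flag_identifiers(transcript: str) -> list[Flag]:
    """Return candidate identifiers for review; nothing is removed automatically."""
    flags = []
    for line_no, line in enumerate(transcript.splitlines(), start=1):
        for category, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                flags.append(Flag(line_no, category, match.group()))
    return flags


# Made-up sample transcript for illustration.
sample = (
    "Call me at 314-555-0123.\n"
    "My email is jane.doe@example.org\n"
    "I was CEO of Company X in 2019."
)

for flag in flag_identifiers(sample):
    print(flag.line_no, flag.category, flag.text)
```

In this sample, the phone number and email address are flagged, but the statement about being CEO of Company X is not. Pattern matching of this kind addresses only the safe harbor list, which is why a human judgment about indirect identifiers, in the spirit of expert determination, is still needed.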
In our experience, IRBs are rarely directly involved in the oversight of data sharing: Data sharing is commonly mentioned in consent forms, and data are normally de-identified prior to sharing, rendering the data no longer human subjects data. As the questions above illustrate, things are not as straightforward with qualitative data sharing. Some researchers may have thoughtful data sharing plans—written for funding agencies—that they can share; but at this point in time, many will not. Hence, asking these questions is important to ensure that consent processes, de-identification strategies, and deposit plans adequately protect participants in qualitative research.
Our team has produced a toolkit that supports responsible qualitative data sharing: https://qdstoolkit.org. This toolkit includes references to our published findings, guidance on data sharing, and information about software we developed to assist researchers in data sharing. Our recently published article in the Proceedings of the National Academy of Sciences provides further information on some of the most common challenges researchers face in sharing qualitative data.
James M. DuBois, DSc, PhD is the Executive Director of the Bioethics Research Center at the Washington University School of Medicine.