Big Data and Research Ethics: Reflections on The Petrie-Flom Center’s Conference, “Big Data, Health Law, and Bioethics”

In May, members of PRIM&R’s staff attended The Petrie-Flom Center’s Conference, “Big Data, Health Law, and Bioethics” at Harvard Law School. Several of the panel presentations addressed what the move towards big data means for the research ethics community. To recap the event for those unable to attend, we begin with a summary of some of the privacy concerns about big data, and then highlight several panel presentations that explored what these concerns mean for human subjects and IRBs.

Catherine M. Hammack of the Program for Empirical Bioethics at Duke University’s School of Medicine presented preliminary findings from a study of thought leaders’ attitudes toward the risks and harms involved in precision medicine research. The Duke researchers developed a “hypothetical big data study” similar to the Precision Medicine Initiative, in that it involved whole genome sequencing, pulling information from electronic health records, and collecting data from mobile devices. They conducted 45 interviews, including with individuals in the human subjects protection field, in a living-room setting so that subjects would not feel constrained by their regulatory and IRB experiences when analyzing the hypothetical study.

The interview subjects identified improving public health as the key benefit of such a study. However, many were worried about the risk that subjects could be re-identified from the study’s data. They were concerned that the hypothetical database might be hacked or that subjects might be re-identified through triangulation. For example, a subject might be re-identified if a third party obtained information such as a zip code and a medical device number from other data sources.
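
To make the triangulation concern concrete, the sketch below (ours, not the presenters’) shows how a de-identified study record could be linked back to a name if a third party held an outside data source sharing a zip code and device number. All column names and values are hypothetical.

```python
# A minimal sketch of a "triangulation" re-identification attack, assuming
# hypothetical column names; neither dataset is from the actual Duke study.
import pandas as pd

# De-identified study data: direct identifiers removed, but quasi-identifiers remain.
study = pd.DataFrame({
    "zip_code": ["02138", "27701"],
    "device_serial": ["DVC-1001", "DVC-2002"],
    "genome_variant": ["BRCA1+", "APOE e4"],
})

# An outside data source a third party might hold (e.g., a device registry).
registry = pd.DataFrame({
    "name": ["A. Smith", "B. Jones"],
    "zip_code": ["02138", "27701"],
    "device_serial": ["DVC-1001", "DVC-2002"],
})

# Joining on the shared quasi-identifiers links names back to sensitive records.
reidentified = study.merge(registry, on=["zip_code", "device_serial"])
print(reidentified[["name", "genome_variant"]])
```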

Ameet Sarpatwari of Harvard Medical School and Brigham and Women’s Hospital argued that de-identification concerns should be considered “in context.” He pointed out that data de-identified in accordance with the Health Insurance Portability and Accountability Act (HIPAA) is less susceptible to attacks and re-identification. (The HIPAA Privacy Rule’s Safe Harbor pathway considers Protected Health Information (PHI) to be de-identified once 18 categories of identifiers are removed.) For example, in a challenge issued by the Department of Health and Human Services (DHHS), members of the public were given 15,000 HIPAA de-identified records, and only two people were re-identified.
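
As a rough illustration of the Safe Harbor approach, the snippet below drops a handful of direct identifiers and generalizes ZIP codes and birth dates. It is a simplified sketch with hypothetical field names, not a full implementation of the rule’s 18 identifier categories.

```python
# A simplified sketch of Safe Harbor-style de-identification, assuming a
# hypothetical schema; the real rule covers 18 categories of identifiers
# (names, most geographic and date detail, contact information, SSNs, etc.).
import pandas as pd

IDENTIFIER_COLUMNS = ["name", "street_address", "phone", "ssn", "email"]

def safe_harbor_deidentify(records: pd.DataFrame) -> pd.DataFrame:
    # Drop the direct identifiers that are present in this hypothetical schema.
    deid = records.drop(columns=IDENTIFIER_COLUMNS, errors="ignore")
    # Generalize detail the rule treats as identifying (simplified here):
    # truncate ZIP codes to three digits and reduce dates of birth to a year.
    if "zip_code" in deid:
        deid["zip_code"] = deid["zip_code"].str[:3]
    if "birth_date" in deid:
        deid["birth_year"] = pd.to_datetime(deid["birth_date"]).dt.year
        deid = deid.drop(columns=["birth_date"])
    return deid
```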

Sarpatwari also discussed the benefits of reviewing big data for post-approval drug and device research. Pre-approval studies may exclude key populations such as children, women, and ethnic minorities. In addition, pre-approval studies may not reveal all the adverse events associated with a drug or device. However, if researchers can review big data after a drug or device has been approved and marketed, they may be able to see how it works in those excluded populations or detect rare but serious adverse events. One way researchers might review big health data is to link insurance claims with electronic medical records. Many of the owners of such data are covered by HIPAA, meaning the de-identification standards would apply.
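
One way such a linkage is sometimes done, sketched below under assumed field names, is for each data holder to compute a keyed hash token from patient identifiers so that claims and EHR records can be matched without the identifiers themselves changing hands. This is our illustration, not a method described at the conference.

```python
# A rough sketch of hashed-token record linkage, assuming hypothetical field
# names: both data holders derive the same keyed hash from patient identifiers,
# so records can be joined without exchanging the identifiers directly.
import hashlib
import hmac

SHARED_KEY = b"agreed-upon-secret"  # hypothetically exchanged under a data use agreement

def linkage_token(first: str, last: str, dob: str) -> str:
    msg = f"{first.lower()}|{last.lower()}|{dob}".encode()
    return hmac.new(SHARED_KEY, msg, hashlib.sha256).hexdigest()

# The insurer and the hospital each compute tokens locally...
claims_token = linkage_token("Ada", "Lovelace", "1990-12-10")
ehr_token = linkage_token("Ada", "Lovelace", "1990-12-10")

# ...and the researcher joins claims to EHR records where the tokens match.
assert claims_token == ehr_token
```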

As discussed above, there is a very small risk that even HIPAA de-identified data may be re-identifiable. Thus, Sarpatwari suggested that we need to determine what risk of re-identification we are comfortable with, given the potential benefits of research with large data sets. Sarpatwari and his co-authors recommend that “experts should select a risk threshold proportional to potential harm of re-identification.” Experts should also take into account whether the data is of a particularly sensitive nature. Requiring data recipients to sign a data use agreement may be a safeguard against leaks of sensitive data if recipients face legal liability for disclosure.
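
To show how such a proportional threshold might be operationalized, the sketch below estimates re-identification risk as the fraction of records that are unique on a set of quasi-identifiers and compares it against thresholds that tighten as potential harm grows. The risk metric and the threshold values are hypothetical illustrations, not figures endorsed by Sarpatwari and his co-authors.

```python
# An illustrative sketch of comparing an estimated re-identification risk
# against a harm-proportional threshold; all numbers are hypothetical.
import pandas as pd

def reidentification_risk(data: pd.DataFrame, quasi_identifiers: list[str]) -> float:
    # Fraction of records that are unique on the quasi-identifiers: a crude
    # proxy for how easily individuals could be singled out by triangulation.
    group_sizes = data.groupby(quasi_identifiers).size()
    return float((group_sizes == 1).sum() / len(data))

def within_threshold(risk: float, potential_harm: str) -> bool:
    # Hypothetical thresholds: more sensitive data tolerates less risk.
    thresholds = {"low": 0.20, "moderate": 0.09, "high": 0.01}
    return risk <= thresholds[potential_harm]

# Example: three-digit ZIP plus birth year as quasi-identifiers in a toy data set.
df = pd.DataFrame({
    "zip3": ["021", "021", "277"],
    "birth_year": [1980, 1980, 1975],
    "diagnosis": ["A", "B", "C"],
})
risk = reidentification_risk(df, ["zip3", "birth_year"])
print(risk, within_threshold(risk, "high"))
```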

Laura Odwazny, a senior attorney for the Office of the General Counsel at DHHS (speaking on behalf of herself and not DHHS), pointed out that one “big data” concern for IRBs is privacy. She noted that most “big data” research is secondary research, in which data is used for a purpose other than the one for which it was collected. The Common Rule does not apply to secondary research if the private information is no longer individually identifiable. However, the increasing ability to triangulate different data sets and possibly re-identify a subject is a concern for IRBs. She noted that anecdotal evidence suggests IRBs struggle to assess the risks of big data health research.

Odwazny also pointed out that the Common Rule’s minimal risk standard may allow IRBs to determine that big data health research presents “no more than minimal risk,” giving them the flexibility to waive the informed consent requirement. She concluded that “an IRB may reasonably determine that big data health research presents no more than minimal risk to subjects under several conceptions of the daily life risks minimal risk standard.” However, she also suggested that federal guidance in this area could provide additional clarity. Odwazny pointed out that the Secretary’s Advisory Committee on Human Research Protections (SACHRP) is also calling for more guidance for IRBs in their review of protocols involving big data.

To assist IRBs in their deliberations, SACHRP recommended that the Office for Human Research Protections clarify how the Common Rule’s minimal risk standard applies to big data research. SACHRP also recommended that any federal guidance should suggest that an IRB may conclude big data research presents only a minimal risk to privacy if “data privacy and security regulations” are followed.

In all, the day made clear that while research with big data serves important public health purposes, experts are still struggling with what level of risk subjects should be asked to absorb. To resolve these concerns, the government may need to clarify how current federal regulations apply to big data research. IRBs reviewing protocols involving big data would certainly welcome the clarity.