Article Review: Does Deidentification of Data from Wearable Devices Give us a False Sense of Security?

Editor’s Note: We rely on experts in the PRIM&R community to provide insight and commentary on substantial issues and changes within the research ethics and oversight field. If you would like to volunteer to be on our source list for future articles, please fill out this form.

Wearable digital devices, which measure a broad range of human activities including steps, heart rate, breathing, pulse, and brain activity, are part of a rapidly growing market of tools that collect massive amounts of personal health data. The results of a systematic review published in the April 2023 edition of the Lancet, “Does deidentification of data from wearable devices give us a false sense of security?,” indicate that merely deidentifying or anonymizing digital data from wearable devices and sensors is inadequate to protect the privacy of individuals whose data are included in the datasets. One of the review’s major findings was that recordings of extremely short duration, between one and 300 seconds, were sufficient to enable reidentification.

“This discovery is concerning since publicly available data is becoming increasingly abundant, especially given data sharing advocacy and policy by influential bodies, such as the FDA and NIH,” the article says.

The NIH has adopted policies encouraging extensive data-sharing practices, most recently the NIH-wide Data Management and Sharing Policy, which went into effect in January of this year. But, according to the Lancet authors, “Although data sharing provides tremendous benefits, it also poses many crucial questions around privacy risks to patients and study participants that remain unanswered.”

“For example, could machine-learning algorithms be applied to public datasets or data shared through third-party data-sharing agreements to enable reidentification? Is there an opportunity for data misuse by governments, corporations, or individuals? If so, how significant is this risk, and is there a way to mitigate it?”

The authors go on to say, “Advances in machine learning have made it possible to infer sensitive information about individuals, such as their medical diagnoses, mental health, personality traits, and emotions, thus making it possible to learn information that an individual has not directly shared.”

PRIM&R reached out to experts throughout the country to get their take on the Lancet article.

“Once again, we are reminded that de-identification isn’t a magic wand that makes a study ethically responsible. The participants who provide data must retain some sense of control over the process. They need to see the research as legitimate and beneficial,” said Jon Herington, PhD, Assistant Professor of Philosophy and of Health Humanities and Bioethics at the University of Rochester, when asked for his thoughts on the issues raised in the article.

The wearable device industry is expected to more than double in the next three years, raising substantial questions about how the data these devices generate should be collected, stored, and used.

The Road Ahead

Megan Doerr, LGC, MS, is the Director of Applied Ethical, Legal, and Social Research at Sage Bionetworks, a Seattle-based non-profit organization that develops, builds, and shares tools to conduct dynamic, large-scale, collaborative biomedical research. Doerr sat down with PRIM&R earlier this month to discuss the article. She is not surprised by the findings and says that we, as a research community, have ritualized a “theater of anonymity” around participant data, whose futility this study clearly communicates.

“Let’s stop pretending about anonymity,” she said.

Instead, Doerr recommends focusing on how to solve health problems with data and addressing valid concerns about data usage.

“We are in an arms race. There will always be a ‘new’ data type or computational approach that dismantles our notions of privacy. We develop privacy-preserving data governance approaches in response, for example, synthetic data or ‘walled gardens,’” a walled garden being a closed environment in which a provider restricts access to its user data.

“But these approaches come at a price: each limits the science that can be done in some important way,” Doerr said. “Which begs the question: under what circumstances and for what reasons might individuals (and communities) want to set aside the standard of re-identifiability? We, as a regulatory and ethics community, have a lot of work to do here.”

Doerr said we must engage the public and decide on the range of acceptable uses of health data. Then, we can develop policies to implement that vision. Some communities, she said, will be willing to sacrifice privacy for the sake of advancing health. For other communities or uses, it should be clear that data is off limits. For example, Doerr said, data should not be used to interfere with employment, access to insurance, or access to care.

“We are the right people to have these conversations,” she said of the PRIM&R community. “We, as a research ethics community, should be engaging with these questions. We should be leading these conversations, rather than hiding from them. I don’t think we have really engaged with the community for a long time. And we need to. There is no other choice.”

As artificial intelligence (AI) is deployed more broadly, this engagement becomes an urgent need. Doerr believes a decision needs to be made about “what should we point AI toward?” She said we need to ask, “What does benefit to individuals and communities really look like?”

“We want to advance our understanding of human health. This is the way,” she said. “We must focus on rebalancing acceptable data use for benefit, rather than just for privacy.” 

This Risk Is Real

“This study is yet further confirmation that this privacy risk is real, and of particular concern in the context of wearable technologies that record movements, bodily functions and highly sensitive health information,” Dr. Herington said in his comments to PRIM&R.

“We’ve long known that purportedly de-identified data can be used to re-identify most people in the US,” Dr. Herington said, adding that a 2019 study found that more than 99% of people can be uniquely identified using just 15 pieces of information that are often left in “anonymized” datasets.

Jonathan Beever, PhD, Associate Professor of Ethics and Digital Culture, and Director and Co-Founder of the Center for Ethics at the University of Central Florida, shared with PRIM&R, “De-identification no longer, if it ever really did, protects research subjects’ anonymity. But there are tensions emerging here: how many research subjects, living in this big and fast data reality, really want their data identities protected?”

“The last Common Rule revision process, which concluded in 2018, proposed including de-identified biospecimens under the rule, but that proposal was rejected due to these same tensions. Given the big data landscape across disciplines, I do not find the Lancet results surprising at all; in fact, I see no clear way to resolve the tension between our traditional sense of privacy and our emerging big data markets,” Dr. Beever said.

“Our responses will be a combination of reimagining privacy in this new big and fast data world, and regulating research and commercial practices to recognize the most significant harms that might arise from re-identification,” Dr. Beever said.

This article was originally published in the May 2023 PRIM&R Member Newsletter. Click here to become a PRIM&R member and join our supportive membership community that provides resources and connections with colleagues from more than 1,000 institutions in more than 40 countries.