AI and the “downstream” risks of research

This is part one of a two-part post on the ethical and practical considerations of risk in AI research. Read part two here.

I was lucky to participate in a panel on the ethical implications of big data and artificial intelligence (AI) in research at AAHRPP’s 2022 annual conference this past May, alongside Elizabeth Buchanan, CIP, PhD, Director of the Office of Research Support Services and Senior Research Scientist at Marshfield Clinic Research Institute, and David Strauss, MD, Independent Consultant and Special Lecturer at Columbia University. While preparing for and participating in the panel, I began to think about how the conversations about the ethics of AI research might force us to rethink our approach to the oversight of all research.

When we talk about AI research, we are mainly talking about research that seeks to develop tools that will replace human decision-making (though there are also other uses of AI within research, such as for patient recruitment). The development of AI typically involves the collection and use of huge amounts of data to train an algorithm to make decisions or predictions within some domain. The algorithm is then tested and validated based on how accurate, or otherwise fit for purpose, its decisions or predictions are. The goal is then to apply the AI model to new data in the real world: to allocate emergency room beds, diagnose suicide risk based on social media posts, or manage workplace stress through remote sensing technologies, to take just three examples.
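For readers who want to see what that pipeline looks like in practice, here is a minimal, generic sketch in Python using scikit-learn and purely synthetic data. It illustrates the train/validate/deploy pattern described above; it is not any particular real-world system.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Collect (here: simulate) a large labeled dataset.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

# 2. Train an algorithm on most of the data to make a prediction...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# 3. ...then test and validate it on data held back from training.
print(f"Held-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")

# 4. "Deployment" means applying the trained model to new, real-world cases --
#    the step where the downstream risks discussed in this post arise.
new_cases = X_test[:5]            # stand-in for data the model has never seen before
predictions = model.predict(new_cases)
```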

As discussed at length during recent meetings of SACHRP, which has taken up the issue of AI in human subjects research over the last year, much AI research falls outside the human subjects research oversight framework. There are three main reasons for this. One is that the data involved are often collected, owned, and used by commercial entities, placing much of this activity outside the reach of the federal regulations. A second is that AI research depends on the collection of huge amounts of data from sources such as social media, apps, internet browsing histories, consumer and wearable devices, and electronic health records; although these are data from and about humans, they are often either deidentified or, as in the case of social media data, already in the public domain, and so largely exempt from IRB review.

A third reason has to do with what Arvind Narayanan, PhD, of Princeton University has called the specific “landscape of harms” involved in AI research. While there may be risks to people whose data are included in the large data sets used to train algorithms—reidentification being the most obvious—those risks are considered rather low. (It’s worth mentioning that there are other important ethical questions about whether, how, and to what extent we ought to be informed about, asked for our permission regarding, or even given control over, the uses of our data, including in the creation of AI algorithms—that’s a topic for another day).

Instead, the most salient and serious risk of harm in AI research is to those on whom the AI is applied in the real world. One such harm is “algorithmic bias.” Algorithmic bias arises when the data on which an AI algorithm is trained are incomplete, unrepresentative, or themselves biased, leading to inequities or bias in the ways the AI is deployed in the real world. To take a well-known example, in 2019 Ziad Obermeyer, PhD, and colleagues found that a health care risk-prediction algorithm widely used by hospitals and insurance companies to predict which patients would benefit from more intensive care demonstrated racial bias—specifically, it led to Black patients receiving lower-quality care than non-Black patients. This was because the algorithm’s designers used patients’ health care spending as a proxy for medical need.

But it turns out that spending is a poor proxy for need. Black people as a group have fewer resources and less access to the health care system, so of course they spend less on health care. But the algorithm identified them as having less need for high-risk care management, despite the fact that Black individuals have the highest cancer rates, are five times as likely to die of pregnancy-related causes, and are 60 percent more likely to be diagnosed with diabetes than white individuals. Thus, an algorithm that was supposed to streamline, and make more accurate, decisions about directing care where it is most needed instead perpetuated a system in which Black people receive fewer health care resources and worse care. There are many other such examples.
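To make the proxy problem concrete, here is a short, purely hypothetical Python sketch (synthetic data and scikit-learn; it is not the actual algorithm studied by Obermeyer and colleagues and uses no real patient data). Two groups have identical distributions of true medical need, but one group’s spending systematically understates its need because of reduced access to care; a model trained to predict spending then flags that group for extra care far less often.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical synthetic population (purely illustrative, not real data).
low_access = rng.integers(0, 2, size=n)            # 1 = group with reduced access to care
need = rng.gamma(shape=4.0, scale=1.0, size=n)     # "true" medical need, same distribution in both groups
# Spending tracks need, but is systematically depressed when access is limited.
spending = need * np.where(low_access == 1, 0.6, 1.0) + rng.normal(0, 0.3, n)

# Train on observable utilization history (itself shaped by access), with
# spending as the label -- the proxy choice at the heart of the example.
past_utilization = spending + rng.normal(0, 0.3, n)
model = LinearRegression().fit(past_utilization.reshape(-1, 1), spending)
risk_score = model.predict(past_utilization.reshape(-1, 1))

# Flag the top 10% of scores for extra care, then compare outcomes by group.
threshold = np.quantile(risk_score, 0.9)
flagged = risk_score >= threshold
for group, name in [(0, "full access"), (1, "reduced access")]:
    share = flagged[low_access == group].mean()
    avg_need_flagged = need[(low_access == group) & flagged].mean()
    print(f"{name}: {share:.1%} flagged; mean true need of flagged = {avg_need_flagged:.2f}")
# Despite identical need distributions, the reduced-access group is flagged far
# less often, and only its very sickest members clear the spending-based bar.
```

Note that in this sketch the group label is never used as a feature; the disparity enters entirely through the choice of spending as the training label.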

Algorithmic bias is not the only example of the downstream harms of AI research; there is also concern about more intentional misuses of AI algorithms. An infamous 2017 study claimed to have developed a facial recognition algorithm that could identify sexual orientation, trained on images of people who were openly out on social media. Scholars criticized the research, raising concerns about how such technologies could be used in repressive regimes where homosexuality is stigmatized, dangerous, or even illegal.

The Common Rule and the FDA regulations tell IRBs explicitly that they “should not consider possible long-range effects of applying knowledge gained in the research (e.g., the possible effects of the research on public policy) as among those research risks that fall within the purview of its responsibility” [emphasis added; 45 CFR 46.111(a)(2)]. In other words, the human subjects protection regulations are clear that while the promise of general “knowledge to be gained” should be factored into thinking about the potential benefits of research, concerns about algorithmic bias or about unscrupulous or careless uses of AI tools should not be factored into the risks that are to be weighed against those benefits. In next week’s Ampersand post, I will explore why I think this is a problem and why AI research presses important questions about whether and how to pay attention to downstream harms.

Elisa A. Hurley, PhD, is the Executive Director of PRIM&R.