Is your data sharing consent language transparent and machine-readable?

Sharing participant research data is critical to supporting reproducibility and collaboration in the scientific community, and leads to more reliable, effective results for research and eventual clinical application. But honesty and transparency are also critical when communicating with participants about these activities. New technologies allow researchers to ensure data are shared in a manner that respects participant expectations, even as those data are made accessible around the world. The Global Alliance for Genomics and Health (GA4GH) is looking to partner with IRBs, for which ethical participant communication is an important concern, to encourage the adoption of these technologies.

GA4GH has released the Data Use Ontology (DUO), which allows researchers and other data stewards to semantically tag datasets in accordance with how they may (and may not) be shared and reused. The goal of this resource is to allow genomics researchers to communicate responsibly using a standard terminology when describing data use conditions. This helps researchers search for data that they are allowed to use in their studies without being waylaid by applications for access to data that is not available to them.

Concretely, researchers can query any database that has implemented DUO and only receive data that matches their intended use and authorization level by matching restrictions with requests. For example, an industry researcher working on cancer research could search for any dataset that is allowed “for commercial use” and “for cancer research.” It also helps data access committees decide when to provide access to data in a manner that respects commitments made to participants and patients and could even be used to automate data access decisions.
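The matching of restrictions with requests described above can be sketched in a few lines of Python. This is a simplified illustration, not the DUO specification or any GA4GH reference implementation: the `Dataset` and `Request` classes, the `is_permitted` and `search` functions, and the catalog entries are all hypothetical. The abbreviations `GRU` (general research use), `HMB` (health/medical/biomedical research), `DS` (disease-specific research), and `NCU` (non-commercial use only) are intended to follow DUO's shorthand and should be checked against the current ontology release. Disease matching here is an exact string comparison, whereas a real implementation would likely use a disease ontology hierarchy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dataset:
    """A dataset tagged with one permission and optional modifiers (hypothetical model)."""
    name: str
    permission: str                     # "GRU", "HMB" or "DS"
    disease: Optional[str] = None       # only meaningful when permission == "DS"
    modifiers: frozenset = frozenset()  # e.g. frozenset({"NCU"})


@dataclass(frozen=True)
class Request:
    """A researcher's intended use (hypothetical model)."""
    purpose: str                        # "general", "health" or "disease"
    disease: Optional[str] = None
    commercial: bool = False


def is_permitted(dataset: Dataset, request: Request) -> bool:
    """Return True if the request is compatible with the dataset's tags."""
    # Modifier check: non-commercial-only data is closed to commercial users.
    if "NCU" in dataset.modifiers and request.commercial:
        return False
    # Permission check, from broadest to narrowest.
    if dataset.permission == "GRU":
        return True
    if dataset.permission == "HMB":
        return request.purpose in {"health", "disease"}
    if dataset.permission == "DS":
        return request.purpose == "disease" and request.disease == dataset.disease
    # Unknown tags are treated as not permitted (a conservative assumption).
    return False


def search(datasets: List[Dataset], request: Request) -> List[str]:
    """Return the names of all datasets the request may use."""
    return [d.name for d in datasets if is_permitted(d, request)]


CATALOG = [
    Dataset("cohort-a", "GRU"),
    Dataset("cohort-b", "DS", disease="cancer"),
    Dataset("cohort-c", "DS", disease="cancer", modifiers=frozenset({"NCU"})),
    Dataset("cohort-d", "DS", disease="diabetes"),
    Dataset("cohort-e", "HMB"),
]

# The industry cancer researcher from the example above.
industry_cancer = Request(purpose="disease", disease="cancer", commercial=True)
print(search(CATALOG, industry_cancer))  # ['cohort-a', 'cohort-b', 'cohort-e']
```

In this sketch the commercial cancer researcher never sees `cohort-c` (non-commercial only) or `cohort-d` (restricted to diabetes research), so no access application is wasted on them.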

There is a global appetite among regulators and publics for more transparency on how data is being shared. DUO harmonizes the language spoken by researchers in order to maximize responsible data sharing across jurisdictions and frameworks. It also aims to ensure accountability and promote transparency of data sharing by making it accessible and readable. Once implemented, it is a very simple tool to use.

However, the potential of this useful tool is limited when consent language relating to data sharing is unclear or incomplete in the first place. Unique, ambiguous, and diverse language in consent forms needs to be interpreted—often many years after the forms have been collected—by researchers and data access committees. This leads to difficulties and delays in determining what kinds of data sharing and secondary use are allowed or not allowed. Thus, the process to request access to human data is time-consuming. The length and complexity of the process can inappropriately hinder data reuse, and ultimately research that has the potential to benefit human health. 

Misinterpretation of a participant’s or patient’s consent can also lead to severe consequences (e.g., leaking of data that was not meant to be shared, blocking of data that was meant to be shared, or reputational costs for the institution/organization responsible for the database). These gaps in understanding generate risks for patients and participants with regard to potential infringement on their privacy. Furthermore, some patients and participants agree to share their data in a research context so that they can contribute to the advancement of science, and we want to avoid breaking that promise by failing to make their data available as they intended.

How you can help
In order to improve the transparency of consent forms, and to ensure they can be dependably communicated and respected across the community, the GA4GH is currently seeking to partner with IRBs and funders to update existing consent templates shared with researchers. The fastest solution to implement, and the one with the most potential for near-term impact, is to work with IRBs to produce consent form templates with clauses that map unambiguously to DUO, which IRBs can then make publicly available on their websites for use by researchers.

The ethical benefits of adopting DUO-compatible consent language are numerous, not only for researchers, but also for patients and participants:

  1. It encourages planning for future data sharing;
  2. It supports responsible data sharing that respects commitments made to participants;
  3. It supports consistent interpretation and communication.

The GA4GH proposes using standard data use categories and also recommends including an appendix in consent forms that maps the consent language to the GA4GH standard categories. This will help to ensure that data sharing conditions are clearly communicated and respected across international research networks.
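As an illustration of what such an appendix could look like in machine-readable form, the sketch below stores each consent clause alongside the DUO term it maps to. The clause numbers, clause wording, field names, and helper functions are hypothetical and are not taken from any GA4GH template. The DUO identifiers shown are believed to correspond to "disease specific research" and "non-commercial use only", but should be verified against the current DUO release before use.

```python
import json
from typing import Dict, List

# Hypothetical appendix: one entry per data sharing clause in the consent form.
CONSENT_APPENDIX: List[Dict[str, str]] = [
    {
        "clause": "4.1",
        "text": "Your data may be used for research on cancer.",
        "duo_id": "DUO:0000007",   # assumed: disease specific research
        "duo_label": "disease specific research",
        "qualifier": "cancer",
    },
    {
        "clause": "4.2",
        "text": "Your data will not be used for commercial purposes.",
        "duo_id": "DUO:0000046",   # assumed: non-commercial use only
        "duo_label": "non-commercial use only",
    },
]


def duo_terms(appendix: List[Dict[str, str]]) -> List[str]:
    """Return the sorted, de-duplicated DUO identifiers used in the appendix."""
    return sorted({entry["duo_id"] for entry in appendix if entry.get("duo_id")})


def unmapped_clauses(appendix: List[Dict[str, str]]) -> List[str]:
    """Return clause numbers that have no DUO term and still need review."""
    return [entry["clause"] for entry in appendix if not entry.get("duo_id")]


# Serialize the appendix so it can travel with the dataset's metadata.
print(json.dumps(CONSENT_APPENDIX, indent=2))
print(duo_terms(CONSENT_APPENDIX))
print(unmapped_clauses(CONSENT_APPENDIX))
```

Keeping the original clause text next to the DUO term lets a data access committee check the mapping against what participants actually agreed to, and a check such as `unmapped_clauses` flags wording that still needs human interpretation.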

Aligning consent language with DUO terms from the beginning would reduce the administrative burden of later reviewing consent forms to classify data use accurately, and would speed the availability of data for secondary use. It would also ensure respect for commitments made to participants and patients by empowering researchers globally to use consent forms unambiguously mapped to specific DUO codes.

Thus, members of DUO participate in trialling and implementing international guidance and best practices for genomic and health data access. Applying DUO gives the GA4GH confidence in its efforts to future-proof data access processes and ensure interoperability across jurisdictions. In our digitized 21st-century era, the GA4GH believes such an initiative is critical.

For more information, please consult the GA4GH website

To get involved, please contact: Jonathan Lawson (

Jonathan Lawson, PMP, CSPO is the Data Use Co-Chair for the Data Use and Researcher Identities Work Stream of the Global Alliance for Genomics and Health, and a Software Product Manager for the Broad Institute’s Data Sciences Platform.

Adrian Thorogood, LLM, is the Regulatory and Ethics Work Stream Manager of the Global Alliance for Genomics and Health, and is affiliated with the Centre of Genomics and Policy, McGill University, Canada.