Precision medicine requires data. To improve the outcomes of individuals with cancer, or to understand rare diseases, scientists and clinicians need access to large sets of health research data.
By safely and securely bringing together a national pool of genomics and health data, the Canadian Distributed Infrastructure for Genomics (CanDIG) is helping scientists across the country access consented data that was previously siloed in individual provinces or hospitals, allowing them to address the health challenges faced by Canadians. CanDIG was built through a collaboration of computer scientists, AI specialists, geneticists and bioinformaticians from multiple institutions across Canada, including UHN, McGill, SickKids, OICR and the BC Genome Sciences Centre.
“At institutions like UHN, we’re building increasingly sophisticated data lakes containing health data from many different sources. The next step is to help researchers turn that data into new knowledge by making it findable, available and usable in a uniform, curated and secure way,” said Dr. Michael Brudno, CanDIG Principal Investigator, UHN’s Chief Data Scientist, and Professor of Computer Science at the University of Toronto.
Finding and using that data is now a little easier thanks to CanDIG. The project was highlighted in today’s special issue of Cell Genomics, which discusses genomics and health data sharing efforts globally. The special issue focuses on the Global Alliance for Genomics and Health (GA4GH), an international effort setting standards for genomics and health data. As a GA4GH driver project for Canada, CanDIG helps shape and set these standards. “By participating within the GA4GH community and international projects like the EU/Africa/Canada CINECA project, CanDIG is connecting Canadian institutions to each other and the world,” said Professor Guillaume Bourque, Director of the Canadian Centre for Computational Genomics (C3G) and co-PI of CanDIG, who leads the effort at McGill University in Montreal.
CanDIG was built specifically to work within Canada’s province-based healthcare and privacy legislation. It does so by building a federation of datasets, which simplifies sharing across provincial borders. “As health data types grow richer and volumes increase, data federation is clearly the way forward; Canada is a leader in this approach,” said Bourque.
CanDIG is also a key component of the upcoming Digital Health and Discovery Platform (DHDP), which will support the Marathon of Hope Cancer Centres Network. “Access to whole-genome data has been vital to understanding the spectrum of mutations that accrue in cancer,” said Steve Jones, Head of Bioinformatics and Co-Director at the Michael Smith Genome Sciences Centre and a CanDIG Co-PI, who is leading the effort at the BC Genome Sciences Centre in Vancouver. “CanDIG and the DHDP will help the data collected by the Marathon of Hope Cancer Centres Network be studied by as many approved researchers as possible.”
Making that data available to researchers is key to unlocking its potential for discovery. “The smartest researchers and the most powerful machine learning techniques can’t do anything with data they can’t find, access or use,” said Brudno.
UHN's Chief Data Scientist, Dr. Michael Brudno.