Using whole genome sequencing to identify non-coding elements associated with common, complex phenotypes

Summary

We’re using an unprecedented amount of whole-genome sequence data to understand how non-coding parts of the DNA affect common, complex traits and characteristics in people.

What are we doing?

The past 20 years of genome-wide association studies have told us that the non-coding genome is highly relevant to disease progression and to the regulation of complex traits and characteristics in people. What we don’t know, in the majority of cases, is how, where (in which cells) or when. We aim to release an open-source pipeline for performing genome-wide analyses of the non-coding genome. Based on an analysis of people with type 2 diabetes, we hope to develop methods for identifying, interpreting and prioritising regulatory regions.

How are we doing it?

Our computational analysis will use data from three biobanks: UK Biobank, TOPMed and All of Us. Together, these contain nearly 1 million whole-genome sequences of people from diverse genetic backgrounds. The analysis will be performed on a remote, cloud-based analysis platform designed to maximise participant data security.

What happens next?

Our next step is to identify regions of the genome associated with type 2 diabetes, leveraging existing functional annotations of the non-coding genome, in collaboration with Prof Jorge Ferrer at the Centre for Genomic Regulation.

Collaborators

Prof Michael Weedon

Prof Timothy Frayling

Links and downloads

Read related publications

Whole genome association testing in 333,100 individuals across three biobanks identifies rare non-coding single variant and genomic aggregate associations with height