Sunday, October 20, 2019

Human Reference Genome Will Upgrade

On September 24, 2019, GenomeWeb reported that the National Institutes of Health (NIH) had awarded $29.5 million to two research centers in the USA and one in Europe for the creation of a new human genome reference sequence to "better represent human diversity."

Approximately $12.5 million over five years will be distributed to Washington University, St. Louis, MO; the University of California, Santa Cruz; and the European Bioinformatics Institute, Cambridge, UK, to form the WashU-UCSC-EBI Human Genome Reference Center. In coordination with the National Center for Biotechnology Information (NCBI), the new center will provide a multi-genome reference sequence, or "pan-genome." [You can read the full details here].

However, for my lay readers, I want to take this opportunity to explain in simple terms what a human reference genome is and why it is important.

What is a reference genome for humans?
A human reference genome is a library or database of nucleic acid sequences representing a species' (here, human) set of genes. It is assembled from the DNA of several donor individuals and therefore does not represent the set of genes of any single person. In other words, the human reference genome is stitched together from the genes of several individuals.

Where the donors' genes differ at the same reference location, either in ways not captured by the human reference genes or in regions of high allelic diversity (a measure that determines a population's long-term potential for adaptability and persistence; see Templeton et al), those additional genes are annotated alongside the human reference genome.

What is the human reference genome used for? 
The human reference genome is ultimately a baseline against which individual human genomes are compared, to show genomic differences and similarities and to help answer biological questions. Some applications include:
  •  use by geneticists and biologists to identify gene mutations and misalignments that cause abnormalities and diseases in humans, which can lead to better treatments, medicines and cures, and to create a better genome;
  •  use in palaeogenomic studies of human populations (i.e., mapping against the human reference genome is used to identify endogenous human sequences in ancient samples);
  •  use by personal whole genome sequencing companies (e.g., Nebula Genomics, Full Genomes Corp, YSeq, and Dante Labs) to provide customers with the most accurate DNA results for deep genetic ancestry, uniparental inheritance testing and personal health profiling. The company generates our raw results, usually as BAM and FASTQ files, and compares them to the latest human reference genome (known as a "build").
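For readers who like to see an idea in code, the comparison described above can be sketched in a few lines of Python. This is only a toy illustration: the two short sequences are made up, the function name `find_differences` is hypothetical, and the sample is assumed to be already lined up letter for letter with the reference. Real pipelines align millions of short reads and handle insertions and deletions, which this sketch does not.

```python
def find_differences(reference, sample):
    """List the positions where a sample differs from the reference.

    Assumes both sequences are already aligned and of equal length.
    Returns (position, reference_letter, sample_letter) tuples, with
    positions counted from 1 as is usual in genomics.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_letter, sample_letter)
        for index, (ref_letter, sample_letter) in enumerate(zip(reference, sample))
        if ref_letter != sample_letter
    ]


# Made-up toy sequences, not real human DNA.
reference = "ACGTTAGCCA"
sample = "ACGTCAGCTA"

differences = find_differences(reference, sample)
for position, ref_letter, sample_letter in differences:
    print(f"position {position}: reference {ref_letter}, sample {sample_letter}")
```

With these toy sequences the sample differs from the reference at positions 5 and 9. The result only makes sense relative to the reference it was compared against, which is why testing companies always state which build they used.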

Why do we need an updated human reference genome?
Sequencing a human reference genome is a very complicated process. Needless to say, earlier versions of the human reference genome contained many gaps and problem areas that were difficult to read properly (e.g., incorrect reads, missing model centromere sequences, lack of alternate loci). The latest human reference genome, released in December 2013, sought to fix some of these issues.

More pointedly, the prior versions and the current human reference genome cater to European populations. At present, roughly 300 million letters of DNA are missing from the human reference genome, according to The Atlantic, a finding that came to light after an analysis of 910 people of African descent. This is a travesty because Africa, the cradle of humanity, harbors the most genetic diversity in the world.
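To make the idea of "missing letters" concrete, here is a minimal Python sketch of why a collection of genomes (a "pan-genome") can place DNA that a single reference cannot. Everything in it is an assumption for illustration: the sequences and the donor names are made up, the helper `found_in` is hypothetical, and real pan-genome references are expected to use far more sophisticated structures than a simple text search.

```python
def found_in(read, genomes):
    """Return the names of the genomes that contain the read exactly."""
    return [name for name, sequence in genomes.items() if read in sequence]


# Made-up toy sequences, not real human DNA.
single_reference = {
    "reference": "ACGTTAGCCAGGT",
}
pan_genome = {
    "reference": "ACGTTAGCCAGGT",
    "donor_A": "ACGTTAGGATCCCAGGT",  # carries extra letters absent from the reference
    "donor_B": "ACGTCAGCCAGGT",
}

# A short stretch of DNA read from a hypothetical sample.
read = "GGATCC"

matches_single = found_in(read, single_reference)
matches_pan = found_in(read, pan_genome)

print("single reference:", matches_single)
print("pan-genome:", matches_pan)
```

In this toy case the read has nowhere to land in the single reference, so it would be discarded or misplaced, while the pan-genome finds it in one of the additional donors.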

What is the timeline history of the human reference genome and who maintains it?
The first human reference genome was assembled by the National Center for Biotechnology Information (NCBI) in July 2003, and it was updated in 2004 and 2006.

In 2009 the Genome Reference Consortium, "an international collective of academic and research institutes with expertise in genome mapping, sequencing, and informatics, formed to improve the representation of reference genomes," released its first human assembly, GRCh37, and it has maintained the reference since.

[Chart: Recent human genome assemblies. Source: Wikipedia.]

Today we are using GRCh38 (Genome Reference Consortium Human Build 38), although some companies and researchers still use GRCh37.

I've had two whole genome sequence tests, one each from Full Genomes Corp and Dante Labs, and both of my results were compared against the latest human reference genome build, GRCh38. Personally, I'm hoping an updated human reference genome (or genomes) will help solve the riddles of human genomic diversity, bring improvements in health, and of course provide more answers contained in my own genome, including my understudied Y-chromosome.

Thus, a new human reference genome, perhaps several, is long overdue. This is why the National Institutes of Health's $29.5 million in awards will be important in making it a reality. I look forward to all of the new discoveries that will benefit me, you and humanity.