An effort to expand on the Human Genome Project by capturing the diversity of people around the world has produced the first draft of a new resource called the “pangenome reference”.
What is a pangenome?
It is a set of genomes from many individuals put together to show where the sequences are identical or different. The draft human pangenome consists of 47 genomes, and the plan is to expand this to 350 genomes by 2024.
Why do we need it?
The pangenome will help researchers discover what effects genetic variants have and develop treatments for conditions linked to those variants. At present, some variants are essentially invisible to researchers because of the reliance on a single reference genome.
Hold on, what is a reference genome?
It is a kind of map. When researchers sequence someone’s DNA, they get lots of pieces that they put together based on where they fit on the reference genome. It is a bit like assembling a skeleton by looking in an anatomy textbook to see where each bone fits. For the vast majority of bones, that works fine, but some people have extra bones such as cervical ribs that aren’t in the textbook. “Currently, when we map a sequence from a patient, there’s always a fraction of the sequence, sometimes a significant fraction, that can’t be mapped,” says Evan Eichler at the University of Washington in Seattle.
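To make the mapping idea concrete, here is a minimal sketch in Python. It is a toy, not how real tools work: actual read mappers use indexed, error-tolerant alignment, and a real pangenome is stored as a graph rather than a plain list of sequences. The sequences, reads and function names below are all invented for illustration.

```python
# Toy illustration only: exact substring matching stands in for real alignment.

def map_read(read, reference):
    """Return the first position where the read matches exactly, or None."""
    pos = reference.find(read)
    return pos if pos != -1 else None


def unmapped_fraction(reads, references):
    """Fraction of reads that match none of the given reference sequences."""
    unmapped = [
        r for r in reads
        if all(map_read(r, ref) is None for ref in references)
    ]
    return len(unmapped) / len(reads)


# Hypothetical sequences, invented for this example.
single_reference = "ACGTTGCAAGGCTA"
# A second genome carrying an insertion ("TTTCCC") absent from the first.
other_genome = "ACGTTGCATTTCCCAGGCTA"

# Reads from a person who carries the insertion.
reads = ["ACGTTG", "GCAAGG", "ATTTCC", "CCCAGG"]

single_fraction = unmapped_fraction(reads, [single_reference])
pan_fraction = unmapped_fraction(reads, [single_reference, other_genome])

print("Unmapped against one reference:", single_fraction)
print("Unmapped against both genomes:", pan_fraction)
```

In this toy case, the two reads that overlap the insertion have nowhere to go on the single reference, but they find a place once the second genome is included.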
Whose DNA was the reference genome based on?
The reference genome was supposed to be made from a mix of DNA from 20 anonymous donors, but in the end, 73 per cent of it came from one individual. Later analyses have shown that that person was African American, and also that the next biggest donor, at around 6 per cent, was mainly of east Asian ancestry.
We have already sequenced millions of genomes. Why haven’t we got a pangenome already?
The many genomes we have sequenced are far from complete – in fact, the single reference genome was only 92 per cent complete when the Human Genome Project was declared “complete”. Only short pieces of DNA could be sequenced at the time and because much of the genome is highly repetitive, many of these small pieces couldn’t be reassembled. The pangenome project has used methods that produce much longer pieces, known as “reads”. As a result, the pangenome is based on extremely high-quality sequences that are 99 per cent complete.
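The problem with short pieces in repetitive regions can be sketched with a few lines of Python. This is a simplified illustration under invented assumptions: the genome, the repeated unit and the reads are made up, and exact matching stands in for real assembly and alignment methods.

```python
# Toy illustration only: counts exact placements of a read in a made-up genome.

def count_placements(read, genome):
    """Count every position (overlaps included) where the read matches exactly."""
    return sum(
        1
        for i in range(len(genome) - len(read) + 1)
        if genome.startswith(read, i)
    )


# Hypothetical genome: unique flanking sequence around six copies of "CAG".
genome = "TTGA" + "CAG" * 6 + "ACCT"

short_read = "CAGCAG"                 # lies entirely inside the repeat
long_read = "GA" + "CAG" * 6 + "AC"   # spans the repeat and both flanks

short_hits = count_placements(short_read, genome)
long_hits = count_placements(long_read, genome)

print("Places the short read could go:", short_hits)
print("Places the long read could go:", long_hits)
```

The short read fits equally well at several positions, so it cannot be placed with confidence, whereas the long read reaches unique sequence on either side of the repeat and fits in only one place.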
Whose genomes are included in the pangenome?
We don’t know, says Karen Miga at the University of California, Santa Cruz. The anonymous donors were participants in a previous initiative called the 1000 Genomes Project, chosen on the basis of how well their genomes collectively reflect human diversity. Around half of the donors are African or have African ancestry, but more are needed, says Eichler. “Because Africans have so much diversity, and all humans are descendants of African populations, in Africa we have to do much deeper sampling before we have a true human pangenome reference,” he says.
How much human variation does the pangenome capture?
It includes a lot of common variants that are shared by many people because the mutations occurred in distant ancestors with lots of descendants. To fully capture all human diversity would require a pangenome containing all our 8 billion genomes, but that isn’t the point. “What we want to achieve is that every variation can be analyzed now, and no reads are unmapped,” says Tobias Marschall at Heinrich Heine University Düsseldorf in Germany. “Every piece of the genome now has a place it can go to.”
Will biologists use the pangenome reference instead of the single reference genome?
Some will. But most will be very slow to switch to using the pangenome, says Jesse Gillis at the University of Toronto in Canada, who in 2021 put together an alternative “consensus reference”. Researchers have developed lots of methods and software based on the single reference, and the pangenome is more complex, he says. Benedict Paten at the University of California, Santa Cruz, a member of the pangenome team, acknowledges that people won’t switch if the costs are higher than the benefits. But the pangenome team has developed software tools that are just as fast, he says.