Wednesday, March 2, 2016

Coming Down the Ethnicity Admixture Pike



Source: http://www.nature.com/collections/vbqgtr
Fasten your seat-belts! The year 2016 reveals that two of the major direct-to-consumer DNA companies, AncestryDNA ($99 US) and FamilyTreeDNA's FamilyFinder ($99 US), will be updating their autosomal DNA tests' ethnicity admixture tools and reference population clusters. AncestryDNA pleasantly surprised some of us with a temporary preview of its upgraded "Ethnicity Estimate" (now in BETA stage) shortly after announcing that kits will be available in 29 additional countries, while competitor FamilyTreeDNA promised, at its 11th International Conference of Genetic Genealogy, a new version of its FamilyFinder "myOrigins" for the first quarter of this year. I cover both offerings in this blog.

Let me tell you something! We've been thirsting hard for such updates, like waiting for an admixture Godot stuck in a traffic jam of displeasure, because our current results never quite stack up to our expectations and beliefs. Just four years ago our admixture estimates were infantile at best; our admixture was clumped into three to five broad continental-level categories. In the rearview, AncestryDNA was first to market on October 17, 2013, with its finer-scale "Ethnicity Estimate" [see story here]. Soon after, on November 19, 2013, 23andMe announced an update to its "Ancestry Composition" [see blog here]. And finally, on May 6, 2014, FamilyTreeDNA introduced myOrigins, a make-over of its former admixture offering [see Roberta Estes blog here]. It's worth noting that on August 10, 2015, National Geographic's Genographic Project updated its Geno 2.0 product with an overhaul of reference populations and "regional affinities" [see article here]. Currently 23andMe is caught up in a transition quagmire after winning the FDA's green light to market health testing [see Estes' DNA-Xplained], so it's not clear when an upgrade to its "Ancestry Composition" (arguably the best admixture tool in show) will be released. Newcomer TribeCode hasn't announced a timeline for future changes to its Next Generation Sequencing-based "Ethnicity Composition" (which includes 62 reference population clusters). But before we set out on the long road ahead, we need to make a quick pitstop so I can gas your think-tanks up with some premium food for thought:

Let's face it. We simply have a fatal attraction for our admixture breakdowns, and we pridefully hoard our results to define ourselves and our ancestors ("I've got 0.5% Native American so I'm Indian" or "I'm descended from the Igbo tribe because I got Nigerian in me" or, deadpan, "I grew up German but 52% of my DNA says I'm from Scotland & Ireland so I traded in my lederhosen for a kilt"; see AncestryDNA TV spot). We always seem to forget that gene-flow from our ancestors can be intermittent (as with multi-ethnic populations) or continuous (as with endogamous populations). Lucas Martin, late founder of DNA Tribes, stated in his SNP analysis reports: "...ancestral contributions to your genome ... can include multiple cultures and languages and can cross perceived boundaries between modern countries and ethnic groups." Yet we're hellbent on claiming ALL of the ethnicity predictions these tests produce, even when they differ (GEDmatch's admixture calculators and its Oracles population-fitting program, anyone?). We are in acute denial that our admixture results are a bunch of affinities, estimates and statistical probabilities, all of which are speculative and unstable at best. It is downright unconscionable that our scores are at the mercy of each DNA company's proprietary creation. We don't care that DNA companies are essentially organizing reference populations into clusters and then assigning biogeographical labels to those clusters, never mind the arbitrariness created by the fact that many reference populations are mixed due to introgression from outside populations. We just don't see the speed-limiting bumps in comparing our DNA to only a few of the world's populations, nor in genotyping only 700,000+ ancestry informative markers (aka SNPs) out of a potential 15 million.
We notice not the potholes when it comes to the extent of usefulness for ethnicity admixture estimates and the weight we put on them to ethnically define ourselves and our families. Genetic genealogy experts concur:
  • Dr. Blaine Bettinger (aka The Genetic Genealogist) writes in a DNA test product review: "I wouldn’t rely in any way on the sub-regional categories ... but that’s a symptom of the science, not TribeCode. You shouldn’t be relying on sub-regional categories at any testing company." [see A Review of TribeCode]
  • Roberta Estes opines in her blog, "The usefulness of these tests for accurately providing ethnicity information diminishes as the percentage of that minority admixture declines. Said another way – as your percentage of a particular ethnicity decreases, so does the testing companies’ ability to find it."  [see Ethnicity Testing & Results]
  • CeCe Moore (aka Your Genetic Genealogist) predicted in her 2012 blog, "It is important to remember that as intriguing as these admixture predictions are, none of them are 100% accurate at the granular level. We still have a long way to go before anyone can honestly claim to be able to tell a person exactly where their ancestors once lived based on their autosomal DNA alone." [see Comparing Admixture Results...]
  • Judy Russell (aka The Legal Genealogist) says in her blog, "In other words, these percentages are estimates based on comparisons not to actual historical populations but rather to small groups of people living today, and estimates based purely on the statistical odds that those small groups tell us something meaningful about past populations." [see Admixture Not Soup]
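To put a number on Estes's point about shrinking percentages, here is a rough sketch in Python. It assumes the simple halving rule (each ancestor one more generation back contributes, on average, half as much autosomal DNA). That rule is my assumption for illustration, not something the testing companies publish, and real inheritance wobbles around the average because of recombination. The function names are made up for this example.

```python
import math


def expected_share(generations_back: int) -> float:
    """Average autosomal share from ONE ancestor n generations back.

    1 = parent (50%), 2 = grandparent (25%), and so on.
    Assumption: simple halving per generation; actual shares vary.
    """
    return 0.5 ** generations_back


def generations_for_share(share: float) -> float:
    """Invert the halving rule: how far back a given average share points."""
    return -math.log2(share)


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"{n} generations back: {expected_share(n):.2%}")

    # The "0.5% Native American" boast from earlier in this post:
    print(f"0.5% points roughly {generations_for_share(0.005):.1f} generations back")
```

Under that assumption a 0.5% result sits between seven and eight generations back, which is exactly the territory where a test can miss a real ancestor or report one who was never there.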
The bottom line is we should be relying more on our genetic matches with confirmed genealogies for true details about our genetic ancestry (see my blog Ethnicity Chromosome Mapping for a wonderful way to do this using our ethnicity admixture results). Nevertheless our laissez-faire attitude toward these warnings persists. We simply want to satiate our ids with more confirmation about how we self-identify or define our genetic ancestry. We know what we want, accuracy be damned. So without further ado let's review the pending admixture updates:

AncestryDNA's Ethnicity Estimate Update
In late February 2016, a few lucky AncestryDNA customers logged on to their accounts and were startled to see a new but temporary "Preview" option for an updated Ethnicity Estimate, now in BETA testing. Most of them probably missed the light on February 16, 2014 when the AncestryInsider blog reported:
  • "AncestryDNA Ethnicity Estimate will improve. Some customers are already receiving a preview. Ethnicity estimates will improve. Ethnicity calculations are based on a group of people whose ethnicity is thought to be accurately known. Their trees go back four generations or more and all lines are from a particular place. This group of people— a reference panel — is growing from 3,000 to 9,000 people. As the panel gets bigger and Ancestry’s data gets better, their ethnicity estimates will improve. They may change what they report about your ethnicity. A larger database also allows them to divide ethnicity regions into smaller localities." [read full report here].
One BETA tester was gracious enough to make her family's tests available, and I am pleased to give you a sneak peek at what's to come:
BETA Tester's OLD AncestryDNA Ethnicity Estimate:
BETA Tester's NEW AncestryDNA Ethnicity Estimate:
BETA Tester's Spouse OLD Ethnicity Estimate: 
BETA Tester's Spouse NEW Ethnicity Estimate:

Genetic genealogist Kelly Wheaton, also an AncestryDNA "Ethnicity Estimate" BETA tester, warns in the ISOGG community forum, "... this beta testing may be playing havoc with non-beta participants. I suspect (but don't know) that this may be an unintended consequence of the Beta. I suggest ignoring the current results if they seem off the wall." As a result I will reserve judgment on the above screenshots until the product is officially rolled out. However I will comment on some preliminary changes that will undoubtedly lead to changes in our ethnicity admixture estimates:
  • Cameroon/Congo combined with Southeast Africa Bantu: this is going to be confusing to some testers of African descent. Since the Bantu peoples are thought to have originated in West Africa near the Cameroon and Nigeria border and then expanded east and south about 3000 years ago, we can assume Cameroon/Congo would be very similar to Bantu populations in East Africa (Mozambique, Kenya, Madagascar). [See Li et al] However this might be problematic for people who want to know if their Bantu ancestors come from West Africa (Cameroon) or East Africa (Mozambique), since some admixture tests are capable of separating Southeast African Bantu from West African Bantu. Still, AncestryDNA has arguably the best African breakdown of all tests with 9 clusters (6 from West Africa, one from Southeast Africa, one from South-Central Africa, and one from North Africa).
  • Great Britain combined with Europe West: while combining these two sub-continental categories makes sense because they are virtually indistinguishable in terms of genetic affinity to each other (see Leslie et al; Schiffels et al), scientists have been able to parse British Isles from Ireland from the rest of Western Europe (see Dr. Tim Wilson's post). So DNA testers from this region expecting more specific breakdowns will be sorely disappointed.
  • Native American split into North Amerindian and South Amerindian: this will be good, but only to the extent that the test will be better able to detect Native American DNA in customers. AncestryDNA has notoriously missed Native American admixture in many cases, even when it is detected on other admixture tests. Also it will be interesting to see the new reference samples for these categories, as we still don't know what "tribes" represent the Native American reference population cluster.
  • Europe Central cluster addition: this might prove instrumental for people with genetic ancestry from populations in Austria, Croatia, Czech Republic, Germany, Hungary, Poland, Slovakia, Slovenia, and Switzerland, but it remains to be seen what final biogeographical area this cluster will cover. This cluster may prove valuable for customers whose ancestors are from crossroads areas such as the Istrian peninsula (as seen on PBS Finding Your Roots [view here] with guest Lidia Bastianich, who discovered she had both Italian and Croatian roots due to the Istrian exodus). Apparently there will also be a European East cluster, which is even better.
  • Sardinian cluster addition: this could be worthwhile because Sardinian is a biogeographical isolate, and reference samples from this population tend to be more homogeneous and endogamous due to such events as the founder effect, genetic drift and island isolation. Therefore Sardinian should probably never be combined with Italy/Greece or Iberian Peninsula (renamed Spain/Portugal) clusters anyway. It used to be utterly confusing trying to figure out what Southern European means on a subregional level, or to explain the inflated Italian percentages we often see in our results. So this Sardinian cluster should provide much-needed relief between eastern and western Mediterranean population affinities.
FamilyTreeDNA's myOrigins Update
FamilyTreeDNA held its 11th International Conference of Genetic Genealogy from November 13-15, 2015 in Houston, Texas. Razib Khan, a doctoral candidate studying evolutionary genomics and creator of FamilyTreeDNA's Family Finder myOrigins tool, revealed during his presentation, "Populations in Autosomal DNA," that the next version of myOrigins (2.0) is due out in mid-to-late first quarter of 2016 [see Roberta Estes' full report here].

  • The new update for myOrigins 2.0 will include 24 reorganized reference populations (new categories in CAPS) as follows: 
Ashkenazi ... BALKAN ... East Africa ... Finland ... Germany ... British ... ITALY ... Mbuti Pygmies ... North Africa ... NORTH AMERINDIAN ... NORTH INDIA ... North West Asia ... Northeast Asia ... PAPUAN ... Slavic ... SOUTH AMERINDIAN ...  SOUTH INDIA ...  South West Asian ... SARDINIAN ... Southeast Asia ... SIBERIAN ... Spain ... Scandinavian ... West Africa 
  • In comparison, the myOrigins 1.0 categories were:
Western & Central Europe ... East Central Africa ... Eastern Europe ... Native American ... North Africa ... Northeast Asia ... Central Asia ... Southern Europe ... West Africa ... British Isles ... Finland & Northern Siberia ... Scandinavia ... Asia Minor... Ashkenazi Diaspora ... South-Central Africa ... Southeast Asia ... South Asia ... Eastern Middle East.
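If you want to tally the two lists yourself, here is a minimal Python sketch. The category names are copied from the lists above, and the only trick is that this post marks new 2.0 categories in CAPS. The variable names are mine, and a matching name only means the label is unchanged, not that the underlying reference samples are the same.

```python
# Category names copied from the two lists in this post. Capitalization is
# kept because new myOrigins 2.0 categories are marked in CAPS here.
MYORIGINS_2 = [
    "Ashkenazi", "BALKAN", "East Africa", "Finland", "Germany", "British",
    "ITALY", "Mbuti Pygmies", "North Africa", "NORTH AMERINDIAN",
    "NORTH INDIA", "North West Asia", "Northeast Asia", "PAPUAN", "Slavic",
    "SOUTH AMERINDIAN", "SOUTH INDIA", "South West Asian", "SARDINIAN",
    "Southeast Asia", "SIBERIAN", "Spain", "Scandinavian", "West Africa",
]
MYORIGINS_1 = [
    "Western & Central Europe", "East Central Africa", "Eastern Europe",
    "Native American", "North Africa", "Northeast Asia", "Central Asia",
    "Southern Europe", "West Africa", "British Isles",
    "Finland & Northern Siberia", "Scandinavia", "Asia Minor",
    "Ashkenazi Diaspora", "South-Central Africa", "Southeast Asia",
    "South Asia", "Eastern Middle East",
]

# Categories flagged as new (all-caps names in the 2.0 list).
new_in_v2 = [name for name in MYORIGINS_2 if name.isupper()]

# Labels that appear word-for-word in both versions.
unchanged_names = sorted(set(MYORIGINS_1) & set(MYORIGINS_2))

if __name__ == "__main__":
    print(f"myOrigins 1.0: {len(MYORIGINS_1)} categories")
    print(f"myOrigins 2.0: {len(MYORIGINS_2)} categories")
    print(f"Flagged as new: {len(new_in_v2)} -> {', '.join(new_in_v2)}")
    print(f"Same label in both: {', '.join(unchanged_names)}")
```

Most of the remaining categories are renamed or re-cut rather than brand new, which is worth keeping in mind when reading the commentary that follows.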

This update may prove to be a much-needed refinement of FamilyTreeDNA's Family Finder myOrigins product, which has been criticized for having the lowest resolution of all ethnicity admixture tools when compared to competitors. (FamilyTreeDNA, however, remains the leader in the market for advanced mitochondrial-DNA and Y-DNA testing and the only one to provide genetic relatives for those tests.) On myOrigins 2.0 the Native American cluster will be split into two reference populations, North Amerindian and South Amerindian, and this should improve the test's ability to pick up this admixture. Siberian is now separate from the Finland and Scandinavian clusters (myOrigins 1.0 had Finland and Northern Siberia together). Likewise Khan separated South Asian into North India and South India clusters, perhaps a nod to his own ancestry but just as likely because these populations are genetically distinct from each other [see Cameron; Reich et al]. This is a good advancement for people with South Asian ancestry, another sub-continental region not properly vetted on most admixture tests. Also the Eastern Mediterranean sub-region is separated into Balkan, Sardinian and Italian. As I explained earlier, these populations are often grouped into a broad Southern European cluster, including the Iberian peninsula, producing weird results and frequently an overestimate of Italian admixture percentages. The Papuan cluster addition will add a nice contrast to the Southeast Asia category and possibly bring more clarity for people with genetic ancestry from this sub-region. Yet it will be interesting to see how people with Oceania admixture might fare since there are some distinctions between Melanesian, Polynesian and Papuan populations. Of course the African categories need the most attention as well as additional reference population clusters (and this is true for all admixture tests).
The new myOrigins 2.0 categories will be East Africa [perhaps too broad; see Dobon et al], Mbuti Pygmies, North Africa, and West Africa. These clusters are exceedingly similar to the old version with East Central Africa, North Africa, West Africa, South Central Africa. This might amount to routine academic reference sample clusters being renamed rather than an actual improvement in granularity for African-descended customers. Overall, a definite come-up for FamilyTreeDNA's myOrigins.

CONCLUSION
It remains to be seen if AncestryDNA's and FamilyTreeDNA's admixture estimate updates will be able to decode us better. This ultimately will require a change in the underlying data, genotyping technology and algorithms for maximum viability. The accompanying white papers should give us clearer directions on the long road ahead. All we know is these pending updates will get us closer to our final destination of perfect ethnic identification. And I will let you know when I get there.

8 comments:

  1. TL,
    Excellent recap. I think it will be a few iterations from now, and perhaps when we get to Full Genome sequencing as the norm, that we will finally get to something really meaningfully better than the current rough estimates. And they really should be considered rough. In my case having my German and French ancestry more accurately reflected in ANCESTRY's Central Europe is helpful. Losing my African segments altogether is not. At FTDNA their sample size for Germany had been laughably small so any additional data sets are in order. I also hope that in the My Origins 2.0 the percentages under 1% are included.

  2. Thanks so much Kelly. I really appreciate it. I'm looking ahead with a grain of salt on the side. LOL. I think myOrigins should release values less than 1%. I've one strong segment of Ashkenazi on chromosome #9 and it amounts to 0.6% on both of my tests at 23andme. It does not show on FTDNA or AncestryDNA.

  4. Nice TL, this is a great recap. I am excited about the African populations being broken down differently. I hope Ancestry updates do not disappoint.

    Replies
    1. Thanks Mike King. I appreciate it. I agree, except I don't know about that Cameroon/Congo/Southeastern Bantu category, although in theory it seems to be correct, since all Bantu are supposed to be similar and it seems they are utilizing Bantu for reference populations. But I also thought the Bantu were similar linguistically, and I wonder why there is not greater resolution between Bantu-speaking sub-populations.

  5. TL, thanks so much for this detailed breakdown of what appears to be coming down the pike. It may be worth my doing another DNA test to compare against my current results through AncestryDNA.

  6. Some of the new names for FTDNA's MyOrigins seem potentially questionable. Take, for instance, the new Slavic category, a renaming (?) of the old Eastern European category. Why call it the Slavic component when the component almost surely won't even peak in Slavic people? Also their current map for what the Eastern European component region covers is wrong. They show it barely going into the Baltics and yet some in the Baltics score like 94%-100% on this! OTOH some in Slavic countries often score only like 30-60%. I'd bet the newly named Slavic component will peak in the Baltics, which are not even Slavic, and it will be the same story for the totally misnamed Northern Slavic component on DNA.LAND.

  7. Hello! Has FTDNA thrown in the towel on this? It's somewhat substantially past the first quarter of the year at this point! :-)
