Monday, January 1, 2018

23andMe Chip Versions Comparison (ancestry only)

Illumina Global Screening Array Chip
Back on August 8, 2017, DTC personal genome service 23andMe quietly announced that it was upgrading its genotyping chip to a fifth version, the Illumina Infinium Global Screening Array-24 v1.0 BeadChip (GSA), aka 23andMe version 5 or v5. It promised new customers improved ethnicity reports, especially those with non-European ancestry, and said those with African ancestry would be the first to receive more specific African ancestry updates. I'm excited already.

23andMe's latest chip upgrade comes on the heels of the US Food & Drug Administration relaxing its restrictions on health testing for DTC personal genome companies. However, it's unclear whether customers on 23andMe's prior chip versions will be upgraded, and it may cost you.

23andMe has also been notoriously slow with past major upgrades (the transition to 23andMe's revamped site took more than 2 years), so I jumped at the chance to test a third time (actually 4th) to be on the promising new v5 chip.

Since I've 23andMe results from the two prior chip versions (v3 and v4), I can compare all three to determine if v5 lives up to the hype. You can read about all the bells and whistles of the GSA chip at genetic genealogist Debbie Cruwys Kennett's excellent blog here. My comparative analysis will focus mostly on ancestry features for 23andMe's last three chip versions (v3, v4, v5).  

At the close of this deep dive I will reveal my new 23andMe v5 Ancestry Composition results and tell you whether it's worth testing NOW to be on the v5 chip. And if you're a current customer on an older chip version, I'll tell you whether you should take a chance on waiting for a future fee-based upgrade.

King Genome's Wisdom: 
  • There are about 15 million known SNPs (aka Ancestry Informative Markers) for genetic ancestry, but only 1-to-5 million are utilized by advanced genetic studies, and far fewer (~700,000) by DTC DNA testing companies (ie 23andMe).
  • According to ISOGG Wiki Chip Versions, here are the different microarray chip versions utilized by 23andMe for genotyping since the debut of its DTC personal genome service:
    • v1: November 2007 
    • v2: September 2008, ~555K SNPs (Illumina)
    • v3: November 2010, >900K SNPs (Illumina OmniExpress)
    • v4: November 2013, ~570K SNPs (Illumina OmniExpress)
    • v5: August 2017, ~640K SNPs (Illumina Global Screening Array)
THE STANDARD
With ethnicity admixture estimates, my advice has always been to test at a number of reputable DNA companies (ie 23andMe, AncestryDNA, FamilyTreeDNA) and 3rd-party tools (ie Gedmatch) offering reliable ethnicity admixture estimates, and then look for consistency among the results in order to establish a range score for each ethnicity admixture component. Then you must use your range of ethnicity admixtures synergistically with traditional genealogy methods to unravel your genetic ancestry and pedigree.
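To make the "range score" idea concrete, here is a minimal Python sketch. It borrows the continental totals quoted elsewhere in this post (my Doug McDonald BGA and my three 23andMe chip versions) as stand-ins for results from different companies. The dictionary layout and the function name range_scores are my own illustration, not any vendor's tool.

```python
# Continental totals (percent) as quoted in this post.
# Note: McDonald's figure is total Africa, while the 23andMe figures are
# Sub Saharan African only, so the comparison is approximate.
results = {
    "McDonald BGA": {"African": 82.0, "European": 18.0},
    "23andMe v3": {"African": 79.5, "European": 18.0},
    "23andMe v4": {"African": 79.5, "European": 17.8},
    "23andMe v5": {"African": 79.1, "European": 18.2},
}


def range_scores(results):
    """Return {component: (lowest, highest)} across all tests."""
    ranges = {}
    for estimates in results.values():
        for component, pct in estimates.items():
            low, high = ranges.get(component, (pct, pct))
            ranges[component] = (min(low, pct), max(high, pct))
    return ranges


ranges = range_scores(results)
for component, (low, high) in ranges.items():
    print(f"{component}: {low:.1f}% to {high:.1f}%")
```

A narrow range (as with the European total here) suggests the tests agree. A wide one is a cue to lean harder on traditional genealogy.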

However, for comparison purposes in this analysis, I will use a "Standard" ethnicity admixture estimate to compare my three 23andMe chip version results for consistency and quality control purposes. As such I present my Dr. Doug McDonald Biogeographical Ancestry (BGA) analysis report (below; 23andMe v3 raw data file) as the Standard. On a continental and sub-continental level, Dr. McDonald was excellent in his interpretation, especially since he was able to do controlled runs and identify more trace admixtures (ie Native American). NOTE: Dr. McDonald no longer does these sorts of BGA analyses so I can't provide any contact information for him.

Here is my Doug McDonald BGA chromosome painting (ie my ethnicity admixture components are "painted" on my 23 chromosome pairs), with his interpretation to follow: 
  • Please pay attention to chromosome 10, where there are Native American (green) and European (red) segments adjacent to each other, and chromosome 20, where there is a Native American segment (green) on the top bar, far right.

Here is Dr. Doug McDonald's BGA interpretation of my 23andMe v3 raw data file:
Most likely fit is 23.4% (+- 11.7%) Africa (various subcontinents) and 58.6% (+- 12.2%) Africa (all West African) which is 82.0% total Africa and 18.0% (+- 0.7%) Europe (various subcontinents)

The following are possible population sets and their fractions, most likely at the top

Bantu Ke= 0.370 Mandenka= 0.444 Irish= 0.186 or
Maasai= 0.130 Yoruba= 0.685 Irish= 0.185 or
Maasai= 0.159 Yoruba= 0.662 Russian= 0.179 or
O-Ethiop= 0.110 Yoruba= 0.718 Irish= 0.172 or
Maasai= 0.155 Yoruba= 0.666 Finland= 0.179 or
Bantu Ke= 0.353 Mandenka= 0.460 English= 0.186 or
Bantu Ke= 0.408 Mandenka= 0.409 Finland= 0.183 or
Bantu Ke= 0.371 Mandenka= 0.442 Hungary= 0.187 
or  
allowing more populations for a better fit 
Irish 0.1589 Jewish 0.0340 Bantu Kenya 0.3324 Mandenka 0.1698 Yoruba 0.3049 or
English 0.1618 Jewish 0.0303 Bantu Kenya 0.3387 Mandenka 0.1706 Yoruba 0.2986


but the eastern European is wrong ... it is plain British. The African is indeed a bit “east of Nigerian typical”.
And yes, there really is American at 1.0%, which is, as you see on one plot, rather hard to tell the exact nature of, but is typical of US Afro (Euro)Americans. There is also a separate, and clearly real, East Asian of some sort, also at 1%. These two subtract from the European percent.
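As a quick sanity check on the population fits listed above, each set of fractions should add up to roughly 1 (100% of my genome). The short Python sketch below simply re-keys the numbers from Dr. McDonald's list; the data structure and the rounding tolerance are my own assumptions.

```python
# Population fits copied from Dr. McDonald's interpretation above.
fits = [
    {"Bantu Ke": 0.370, "Mandenka": 0.444, "Irish": 0.186},
    {"Maasai": 0.130, "Yoruba": 0.685, "Irish": 0.185},
    {"Maasai": 0.159, "Yoruba": 0.662, "Russian": 0.179},
    {"O-Ethiop": 0.110, "Yoruba": 0.718, "Irish": 0.172},
    {"Maasai": 0.155, "Yoruba": 0.666, "Finland": 0.179},
    {"Bantu Ke": 0.353, "Mandenka": 0.460, "English": 0.186},
    {"Bantu Ke": 0.408, "Mandenka": 0.409, "Finland": 0.183},
    {"Bantu Ke": 0.371, "Mandenka": 0.442, "Hungary": 0.187},
    {"Irish": 0.1589, "Jewish": 0.0340, "Bantu Kenya": 0.3324,
     "Mandenka": 0.1698, "Yoruba": 0.3049},
    {"English": 0.1618, "Jewish": 0.0303, "Bantu Kenya": 0.3387,
     "Mandenka": 0.1706, "Yoruba": 0.2986},
]

# Assumed tolerance: the fractions are printed to 3 or 4 decimals,
# so a total may be off by a rounding step.
TOLERANCE = 0.002

totals = [sum(fit.values()) for fit in fits]
for fit, total in zip(fits, totals):
    status = "ok" if abs(total - 1.0) < TOLERANCE else "CHECK"
    print(f"{' + '.join(fit)} = {total:.4f} ({status})")
```

Every fit totals 1 within rounding, so the fits are just different ways of splitting the same whole among reference populations.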
 
My alternative AncestryDNA Doug McDonald BGA analysis says:
but in fact England or Ireland are also as likely as the Eastern Europe. But what Ancestry missed is Native American and/or Asian about 1% to 1.3% each. The Mideast is POSSIBLY Sephardic.
Dr. McDonald's interpretation is telling me that my total continental African admixture is 82%, with a portion of it, 23.4% (+-11.7%), seeming to be of a nature east of Nigeria proper in West Africa. It is probably a sign of my Bantu Southeast African ancestry; the Bantu migration started near the Nigeria/Cameroon border and eventually spread to (South)East Africa and Madagascar, and I've proven ancestral links to the latter. That's why there is Bantu Kenya, Maasai and O-Ethiopia in my population fits (combinations of reference populations which best fit my ethnicity admixture profile) in Dr. McDonald's interpretation. I also know from DNA tests that I've multiple roots in West Africa with confirmed African DNA matches to tribes living in modern-day Nigeria (Igbo), Ghana (Ewe), Guinea (Fulani), Cameroon (Duala) and Madagascar (Merina).

Dr. McDonald's BGA analysis has a hard time defining my European admixture at 18%, but according to the population fits above, it is skewing British and points south-by-east on mainland Europe. This is likely from my maternal grandfather's German father, although I've additional European (British, Dutch, Scandinavian) ancestry from several other family lines too. In the population fits McDonald also gave me Finnish, and this could be a sign of my Germanic ancestry, with one of my ancestral lines tracing back to Öland island, Sweden.

The Mideast on chromosome 9 that Dr. McDonald says could be Sephardic seems to be of some sort of Jewish affinity, and the added Eastern European affinities (Hungary, Russia) probably mean it's most likely Ashkenazi Jewish.

Dr. McDonald also confirmed that I've at least 1% to 1.3% each of real Native American and separate East Asian ancestry, the latter of which is related to my Malagasy (Merina) ancestry. My maternal grandfather has a Native American-specific maternal haplogroup (B2), and his matrilineal grandmother (my 2nd-great-grandmother) was thought to be of significant Native American ancestry.

Now that I've established the Standard, I'll compare it to my three 23andMe ancestry results:

V3 CHIP
23andMe introduced v3 (Illumina OmniExpress) in November 2010; the chip contained ~967,000 SNPs. The first time I tested at 23andMe was actually in 2012, and I paid $299 for this test. At the time, unlike competing DNA companies, 23andMe offered autosomal DNA testing and maternal/paternal haplogroup predictions, all for one price. They also offered a comprehensive health report almost comparable to health utility site Promethease.

By late 2012, 23andMe upgraded its algorithm/methodology for unprecedented ethnicity admixture granularity (going from 4 continental regions to most of the 31 sub-continental ones you see today). I figured it was well worth the price and have no regrets  ... except for not waiting just a few more months when 23andMe "permanently" lowered the price to $99.

Here is my very first 23andMe v3 ancestry report before it updated in 2012: 

Here are my updated 2012 results of 23andMe v3 chromosome painting (phased):
TL Dixon 23andMe Ancestry Composition, v3 
As you can see, my updated 23andMe v3 results show a total of 79.5% Sub Saharan African admixture. This is roughly consistent with the ~82% African that Dr. McDonald assigned. Since 23andMe has no Southeast African Bantu category, mine would be assigned to my West African total of 78.7%. Although 23andMe has an East African category, it is more Northeast African in nature. The Central & South African at 0.7% represents both older components in other African populations (from all across the continent below the Sahara) and actual ancestry from so-called Pygmy populations (ie Mbuti); my paternal grandmother has a maternal haplogroup found exclusively among indigenous populations in Cameroon and Gabon; it is separate from Southeast African Bantu. Further, my Central & South African segments are small and appear all over my chromosome painting. Here only 0.1% is assigned to Broadly Sub Saharan African, meaning it's found throughout Sub Saharan African populations. I have 0.4% North African, which 23andMe oddly separates from the rest of the African continent.

My total European is 18%, which is the average amount found in African-Americans according to a 23andMe genetic study. This matches McDonald's prediction of 18% (+-) European, which drops by about 2% (to a 16% European total) when the Native American and Asian are subtracted, noting McDonald put all three in the same broad category. 23andMe has a hard time properly assigning European sub-regional categories because these populations have been in contact and mixing with each other for thousands of years. Although I'm confident of my European total being ~18%, I'm unsure of the specifics. Accordingly, 23andMe v3 assigned me 15.4% Northwestern European, with 7.3% of that being British & Irish, 1.6% French & German, 0.9% Scandinavian and, not surprisingly, 5.5% Broadly Northwestern European. I also have 1% Southern European, but it is not further defined even with a parent testing. I'm not sure of its nature, but maybe it is some sort of Portuguese (via Madagascar). There is also 0.6% Ashkenazi Jewish admixture, although McDonald said it was possibly Sephardic. The total Broadly European is 1%. Based on these 23andMe v3 results, my German great-grandfather looks more British.

Finally I was assigned 0.9% Southeast Asian and 0.8% Native American, which are lower but roughly equivalent to what Dr. McDonald assigned at 1% to 1.3% each. There is less than 0.1% Broadly Native American & East Asian, and overall 0.4% Unassigned. With my v3 results, my Chromosome 10's Native American+European segments are exactly the same as on my McDonald chromosome painting, but on chromosome 20 the Native American segment was not assigned at all.
  • With my other 23andMe v3 ancestry reports: my mtDNA and Y-DNA haplogroups are correct in terms of basic terminal branch, L1b1a and E-U290 respectively. This test also found 69 Neanderthal variants. I also have 1719 DNA Relatives but some of them are not my actual genetic relatives; at the time 23andMe allowed customers to automatically see the ancestry reports of any other customer they were sharing with.
Thus my 23andMe v3 results are OK when compared to Dr. McDonald's BGA analysis, but will v4 be any better?

V4 CHIP
I ended up testing again at 23andMe, this time on the v4 chip, because I had purchased the test for cheap ($49) and, when a relative refused to test, I sent in my own sample. Also, 23andMe offered no upgrades from v3 at the time and, truth be told, not many v2 and v3 customers wanted one; new customers had no choice in the matter. 23andMe introduced the v4 chip in November 2013 but greatly reduced the number of SNPs compared to v3: the v4 chip has ~570K total SNPs vs the v3 chip's >900K SNPs. The chip reduction was due to the FDA's shutdown of their medical testing, during a very tumultuous time for the company. [See 23andMe's blog here and DNA expert Roberta Estes' blog here].

Notably the v4 chip was completely custom designed by 23andMe instead of using the standard Illumina OmniExpress microarray. As such, new SNPs were added, and old health-related SNPs were removed. The v4 chip was also customized to allow better imputation. But how will my 23andMe v4 results compare to my v3 and Doug McDonald's BGA? Take a look:
TL Dixon 23andMe Ancestry Composition, v4 
As you can see from above, my v4 report is very similar to my v3 results. I still have 79.5% Sub Saharan African ancestry. My West African total is 78.6% and Central & South African 0.8%, with the same segment distribution patterns discussed in the v3 section. Again only 0.1% is Broadly Sub Saharan African. However, my North African affinity is higher at 0.7%, and the main difference appears on chromosome 12, inherited from my mother. A portion of this segment is Southern European, North African and British. On v3 it was partially North African with some of the area unassigned. Since the North African disappears at a higher confidence threshold, I'm not sure what sort of admixture this is. However, this sort of consistency, with my total continental African admixture contribution at ~80%, gives me affirmation that it is probably accurate.

It is no surprise then that the European percentage is slightly smaller at 17.8%, down from the 18% assigned by v3 and Dr. McDonald's BGA program. Here the Northwest European is 14.5%; British & Irish is 8.2%; French & German is 1.4%; Scandinavian is 1.3% and Broadly Northwest European is 3.6%. The Southern Euro is 0.1% less at 0.9% but not more specifically assigned. The Ashkenazi is the same at 0.6%. Broadly European 1.8%.

The East Asian & Native American total is 1.8%. Here the Southeast Asian and Native American are 0.9% each, with 0.1% being Broadly East Asian & Native American. Note the sub-totals don't add up exactly because fractions smaller than 0.1% are rounded up or down. It's also odd that 23andMe combines the East Asian and Native American categories; they should be separate. With my 23andMe v4 chromosome painting, my chromosome 10's Native American + European segment is exactly the same as with my v3 and Doug McDonald chromosome paintings. As well, on my chromosome 20, the segment that is Native American with Doug McDonald is still Unassigned with v3 and v4.
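Here is a small Python sketch of how that rounding quirk can arise. The unrounded values below are hypothetical (23andMe doesn't show them); I picked them only so that they round to the figures displayed on my v4 report.

```python
# Hypothetical unrounded percentages (assumed for illustration only);
# the real underlying values are not shown on the report.
unrounded = {
    "Southeast Asian": 0.88,
    "Native American": 0.86,
    "Broadly East Asian & Native American": 0.06,
}

# What the report would display: each line rounded to one decimal place.
displayed = {name: round(pct, 1) for name, pct in unrounded.items()}

# The category total is rounded from the unrounded sum,
# not built from the already-rounded lines.
displayed_total = round(sum(unrounded.values()), 1)
sum_of_displayed = round(sum(displayed.values()), 1)

print(displayed)
print("Displayed total:", displayed_total)
print("Sum of displayed sub-totals:", sum_of_displayed)
```

With these assumed inputs the displayed lines are 0.9, 0.9 and 0.1, which sum to 1.9, while the total rounds to 1.8. Neither number is "wrong"; they are just rounded at different steps.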
  • With my other 23andMe v4 ancestry reports: I have 68 Neanderthal variants, a slight reduction, and the haplogroup predictions are the same as with v3. My 23andMe v4 results list 1182 DNA Relatives, none of whom are friends merely sharing with me. There are some slight variations in segment-matching with my close genetic relatives and some low-ranging ones disappear from this test.
My 23andMe v4 ancestry results are very similar to my v3 and Doug McDonald BGA analysis. However, the new v4 chip offers no improvement and may in fact represent a step back for the company. The new health offerings don't fare any better, but I expect all of this to slowly improve as the US FDA further relaxes its rules. Now for the results you've been waiting to see:

V5 CHIP
On August 8, 2017, 23andMe softly revealed that all new customers would be tested on Illumina's new GSA chip and promised new customers improved ethnicity and health reports, especially for those with mostly non-European ancestry. This is 23andMe's fifth chip version, so the GSA is known as v5 in terms of chip hierarchy. This revolutionary GSA chip features ~640,000 markers and has the ability to include up to 50,000 custom markers; these custom markers allow 23andMe to impute data (inferring missing markers in a DNA sequence) to supplement data not included with its current reference samples.

Around the same time as the v5 release 23andMe had a very rare sale of 2 or more kits for $49 each. I purchased several so taking the 23andMe test again to be on the v5 chip for me was affordable and a natural selection (no pun intended). Here's my new 23andMe v5 Ancestry Composition:
TL Dixon Ancestry Composition (phased), v5
On my new 23andMe v5 Ancestry Composition, my total Sub Saharan African is 79.1%, down 0.4 percentage points from my v3 and v4 Sub Saharan African results. The West African portion is now 78.2%, the Central & South African is 0.7% (same as on v3) and Broadly Sub Saharan African is actually slightly higher than on the two prior chip versions at 0.2%. The North African (not shown above) has been reduced to 0.1% and is no longer located on chromosome 12, nor does it show any contribution from my mother, who was previously predicted to contribute at least 0.5%, mostly on chromosome 12. All three of my African results are fairly close to my Doug McDonald report.

The total European at 18.2% on 23andMe v5 is actually an increase over the amount of European that Doug McDonald assigned (16%+) and slightly higher than v3 (18%) and v4 (17.8%). The Northwest European is 14.3%, with 6.9% being British & Irish, and an increase of French & German to 2%. Notably the Scandinavian is gone and is now probably hidden somewhere in my 5.4% Broadly Northwest European. The Southern European total is 1.9%, but this time more specific subregions were assigned, including 0.8% Italian and 0.5% Iberian. The Broadly Southern European is 0.6%, and I wonder how much of it is Iberian and Italian?

My 23andMe v5 painting of chromosome 12's European + North African segment is now all European, just like on my McDonald chromosome painting. Together the Italian+Iberian+North African does look a little Sephardic, like Dr. McDonald mentioned in his BGA analysis, but could also be due to Portuguese ancestors in Madagascar or lower West Africa (think São Tomé and Príncipe during the Inquisition). Interestingly, the Ashkenazi Jewish admixture is 0.6%, which is identical to my 23andMe v3 and v4 tests; there is now a small area to the right of the Ashkenazi Jewish segment on chromosome 8 that is Unassigned and previously showed as Asian and African. The v5's Broadly European is 1.3%, lower than v4's 1.8% but higher than v3's 1%.

The biggest improvement and SURPRISE on my 23andMe v5 results is the East Asian & Native American total rising to 2%, with 1% being Native American and 1% labelled as Southeast Asian. This agrees well with my Dr. Doug McDonald Native American and East Asian results! The percentage rise for my Native American has to do with chromosome 10 and not chromosome 20, the latter of which is actually now showing as a Southeast Asian segment (remember it was unassigned on v4 but Native American with Doug McDonald). But let's take a closer look at my chromosome 10 (top bar from my mother):
  • First here is my v4 chromosome 10 which is the same as v3 and Doug McDonald:
  • Second here is my new v5 chromosome 10, where the segment is now entirely Native American (yellow and orange-red):
  • Third here is my v5 chromosome 10 with just the Native American segment highlighted: 


It is amazing to see the Native American+European segment on my chromosome 10 now be all Native American. In the past Dr. McDonald told me that on chromosomes 10 and 20 the Native American segment also had markers found in European populations. Because 23andMe's previous chip versions focused on European populations, the algorithm probably had more confidence in assigning European over Native American at these locations. Please note, however, that at 90% confidence this new all-Native American segment on 23andMe v5 disappears in the part of the segment that was formerly European on v3 and v4.

Further we can examine 23andMe's Precision rate ("when the system predicts that a piece of DNA comes from population A, how often is the DNA actually from population A?") and Recall rate ("of the pieces of DNA that actually are from population A, how often does the system correctly predict that they are from population A?") to understand the issue:
  • 23andMe says the French & German category Recall rate is a measly 8% (eight percent). Accordingly this means that 92 percent of real French & German admixture is not assigned to the namesake population category. By contrast the rates (Precision/Recall) for Sub Saharan African (99/99), Native American (99/86) and Ashkenazi Jewish (97/93) are very good. Read more about 23andMe's Precision & Recall here.
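To see what those two rates mean in practice, here is a minimal Python sketch of the standard precision and recall formulas. The segment counts are hypothetical, chosen only to reproduce the percentages quoted above; the function names are my own.

```python
def precision(true_pos, false_pos):
    """Of the segments predicted as population A, the share truly from A."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Of the segments truly from population A, the share predicted as A."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts: out of 100 truly French & German segments,
# only 8 get the French & German label and 92 land elsewhere
# (for example in a "Broadly" category).
fg_recall = recall(true_pos=8, false_neg=92)

# Hypothetical counts matching the 99/99 Sub Saharan African rates.
ssa_precision = precision(true_pos=99, false_pos=1)
ssa_recall = recall(true_pos=99, false_neg=1)

print(f"French & German recall: {fg_recall:.0%}")
print(f"Sub Saharan African precision/recall: "
      f"{ssa_precision:.0%}/{ssa_recall:.0%}")
```

A low recall does not mean the French & German that does get assigned is wrong. It means most real French & German ends up labelled as something broader.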
My cousin and fellow genetic genealogist Rosario Naramore often complained about another potential issue with 23andMe's admixture reports: Native American being conflated with both European and Asian. Based on my comparative analysis here, I'm beginning to realize Mr. Naramore may have a point, which could be a bit concerning for those lusting for accuracy. For example, on my 23andMe v5 chromosome painting the small segment on chromosome 20 that was Native American with Doug McDonald was Unassigned on v3 and v4, and is now Southeast Asian on v5.

Let's take a closer look at my 23andMe v5 results at 90% confidence mode:
Notice how the Ashkenazi Jewish, Southeast Asian, and Native American persist. What's notable here is that the Oceania (Melanesian, Papuan, Pacific Islander) also survives at 90% confidence, at a low amount (<0.1%). The Oceania and South Asian (shown on higher confidence at <0.1%) come from my father and are likely connected to my Malagasy ancestry, particularly the Southeast Asian (Borneo) component. These small, trace admixtures are most likely real components.
  • As a bonus, here is my 23andMe Parental Inheritance report comparing me and my father, who is on v4. When a parent and child test at 23andMe the child's results are "phased" (meaning parent and child DNA results are compared to determine what ethnic component each parent contributed to the child's genome). The child (me) then receives an additional report like this one below:

As you can see from my Parental Inheritance report, I get most of my African from my father and the majority of the European from my mother. With my Native American admixture I get 0.7%(+) from my mother, which is about right and most likely comes from my maternal grandfather, who has a Native American haplogroup. If this 0.7%(+) was one event, then this would mean my 5th- or 6th-great-grandmother on my maternal grandfather's direct matrilineal line probably had significant amounts of Native American ancestry and would have been born in the mid-to-late 1700's. Notably I've only been able to trace this line to my 4th-great-grandmother Jane Wyckoff, born about 1800 in Central New Jersey; she was described as mulatto or black, and free, on pre-Civil War census records.
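The back-of-the-envelope math behind that estimate can be sketched in Python. It assumes the textbook average that each generation halves an ancestor's expected autosomal contribution; real inheritance varies because of recombination, so treat the output as a rough guide only. The function name is my own.

```python
def expected_share(generations_back):
    """Average percent of autosomal DNA expected from one ancestor.

    generations_back: 1 = parent, 2 = grandparent, 3 = great-grandparent,
    so an nth-great-grandparent is n + 2 generations back.
    """
    return 100.0 / 2 ** generations_back


for n_great in (4, 5, 6):
    generations = n_great + 2
    share = expected_share(generations)
    print(f"{n_great}th-great-grandparent ({generations} generations back): "
          f"{share:.2f}% on average")
```

On this simple model a fully Native American 5th-great-grandparent would contribute about 0.78% on average, and a 6th-great-grandparent about 0.39%, which is the ballpark of my 0.7%(+).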

This Parental Inheritance report also reveals that I've a separate Native American event from my father at 0.2%(+), which looks to be from his paternal side, with an ancestor likely born in the early 1700's. Where my father's paternal family lived in present-day Hinds County, Mississippi, many Choctaw stayed behind rather than going to Oklahoma during Indian Removals and some were sharecroppers living next to my father's family! I've also two Southeast Asian (Malagasy) events, but those are harder to time because Malagasy people have been mixed with Asian and African for at least 3000 years.
  • My other 23andMe v5 ancestry reports are the same as the prior chip versions in terms of haplogroups. I've 71 Neanderthal variants, a slight increase from v3 (69) and v4 (68). My 23andMe v5 DNA Relatives list is 1159, which is coincidentally 23 fewer relatives than with my v4 results!
CONCLUSION

Although all three of my 23andMe chip versions' ancestry reports are similar to each other, my new 23andMe v5 test is clearly the most consistent with my Doug McDonald BGA analysis (the Standard) and is more specific than my v3 and v4 Ancestry Composition reports. My smaller ancestry components like Native American, Southeast Asian, Ashkenazi Jewish and Oceania hold up at 90% confidence, so they are likely real affinities. This bodes well for my family genealogical research. My 23andMe v5 results did struggle with my European admixture, but if we consider 23andMe's and Illumina's promises of ancestry improvements with this GSA chip, I still feel like I hit the jackpot. In this regard 23andMe v5 retains its title as the world's best ethnicity admixture test with the most cutting-edge chromosome painting tool on the market.

Yet one has to wonder if the Illumina GSA chip's ability to allow for imputation will cause discrepancies in terms of 23andMe's DNA relative matching, which is concerning because it's important for my genetic connections to be as accurate as possible. The 23 DNA relatives missing from my 23andMe v5 report include some known, but distant, genetic matches. I expected such problems and, like other v5 elements, they will likely improve. Illumina is discontinuing its OmniExpress chip, and DTC DNA companies using microarray chips will probably switch to GSA, so I expect current compatibility and conflation issues to be resolved in the near future as well.

So YES, you should test NOW to be on the new 23andMe v5 GSA chip, whether for ancestry or ancestry+health (which you can upgrade to later). If you're a current 23andMe customer on an older chip version (v3 or v4), don't wait for any future fee-based update options simply because we don't know how long it will take for such an upgrade option to be offered ...  and one was never given for previous chip versions! 

I look forward to seeing your new 23andMe v5 results and what more they reveal about your genetic ancestry. 

#END#



2 comments:

  1. Thanks for this keep teaching us, do u think u will have time to check my son's remember i gave u my PW long time ago to look into it since he had 12% jewish heritage :)

  2. This blog sported another excellent review as with your others. I concur. There were improvements in my AC too, particularly EA & NA up from 1.0 to 1.1 and holding fast at 90% confidence level. Thanks for sharing!
