Wednesday, April 19, 2017

Your AncestryDNA Range Score


Sometimes a grain of salt can turn out to be a granule of sugar. When a lot of you receive your AncestryDNA Ethnicity Estimate results, you instantly take the percentages you see displayed at face value.

Too often this leads to disappointment, especially if the percentage for a certain genetic ethnicity or biogeographical region is lower than expected, absent, or inconsistent with your test results from other sources. There are even situations where one parent is the sole contributor of a certain ethnicity, but somehow the child's results show a higher amount than the contributing parent!

What if I told you that you were looking at your AncestryDNA Ethnicity Estimate all wrong?

I was actually surprised by the number of AncestryDNA testers who never realized the Ethnicity Estimate percentages they see displayed are NOT set in stone, including some regions for which they received 0% (zero percent). How is this possible?

What you're really "looking at" is an AVERAGE SCORE that was calculated from a broader RANGE SCORE, the latter of which may reveal the presence of ethnicity admixture you thought was missing. In this blog I'm going to discuss AncestryDNA's Ethnicity Estimate:

(1) Average Score;
(2) Range Score;
(3) How the Range Score is Calculated; and for you ethnicity admixture geeks,
(4) Brief Comparison Between My Two AncestryDNA Kits.

For this lesson I will present my two AncestryDNA Ethnicity Estimate results: Kit 1 (2012) and Kit 2 (2014). I will focus on West African (Nigerian), Native American and European Jewish (Ashkenazi) admixture, all of which was previously detected in my genomic data by other DNA companies, third-party utilities, and independent Biogeographical Ancestry analyses.

(1) Average Score 
AncestryDNA Ethnicity Estimate's Average Score is simply the percentages you see when you view your AncestryDNA Ethnicity Estimate report, inclusive of "Low Confidence" (previously known as "Trace") regions*. However AncestryDNA warns, "Our confidence that your actual genetic ethnicity is EXACTLY the average is not high." This is because the Average Score you see is calculated from a broader Range Score (more on this in Section 2). 

*NOTE: On March 28, 2017, AncestryDNA introduced the new tool Genetic Communities and updated its interface. As such some screenshots I use herein may look different from the next one but the information is exactly the same. Here are the results from my Kit 1 and Kit 2:

My AncestryDNA Kit 1 Ethnicity Estimate showing AVERAGE SCORE percentages:
My AncestryDNA Kit 2 Ethnicity Estimate showing AVERAGE SCORE percentages: 
(2) Range Score
When AncestryDNA analyzes your genomic data, they create a Range Score for each of your ethnicity "regions" and it is subsequently used to calculate your average percentage for each of those regions. More to the point, AncestryDNA says the Ethnicity Estimate, "shows the average estimate as the given percent for each region. The general spread of the 40 estimates is shown as the probable range. Our analysis suggests that your actual ethnicity for this region lies somewhere in this range."

To broadly simplify how the Range Score works let's use YOU as an example:

Imagine taking the AncestryDNA test 40 different times, and each time receiving slightly different results (as shown with my Kit 1 and Kit 2, plus 38 more). Once all 40 of your results are in, take your 40 scores for each ethnicity category (e.g. Nigerian, Ireland, etc.), add them up, divide by 40 and ... voilà, you have an Average score (what you see displayed in the Ethnicity Estimate report). Now take the additional step of showing for each region:
  • the lowest amount you found on a test (e.g. 3% Scandinavian)
  • the highest amount you found on a test (e.g. 10% Scandinavian)
Now you have a Range Score for that Ethnicity region: 3% to 10% Scandinavian! AncestryDNA employs a similar multiple-analysis methodology when calculating and displaying your Ethnicity Estimate results, but luckily you don't have to provide 40 DNA samples. (More on this process in Section 3.)
  • To find the Range Score for any of your Ethnicity regions, just click on the title of the ethnicity region and a drop-down module will open showing the Range Score, even for the categories equaling 0%.
  • You can look at the Range score for regions equaling 0% by clicking the box next to "Show All Populations" and you'll see a full list of global ethnicity clusters (26 in total).
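
To make the arithmetic above concrete, here is a minimal Python sketch. The 40 Scandinavian percentages are made-up illustration values (not from any real kit), and `summarize` is a hypothetical helper name; it simply mirrors the "add them up, divide by 40, then note the lowest and highest" steps described above.

```python
from statistics import mean

def summarize(estimates):
    """Return (average, lowest, highest) for one region's list of estimates."""
    return mean(estimates), min(estimates), max(estimates)

# Hypothetical: 40 made-up Scandinavian estimates (in percent), one per test run.
scandinavian_runs = [3, 4, 5, 6, 7, 8, 9, 10] * 5

average, low, high = summarize(scandinavian_runs)
print(f"Average score: {average:.1f}%")      # Average score: 6.5%
print(f"Range score: {low}% to {high}%")     # Range score: 3% to 10%
```

Keep in mind that AncestryDNA describes its displayed range as covering "most of the variability" in the 40 estimates, so the range you see in your report may be somewhat narrower than a strict lowest-to-highest spread like this one.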
Now let's take a look at some of my Range scores for the "Regions" I selected for this blog:

NIGERIAN** (WEST AFRICAN)
I only received 2% Nigerian on both of my kits. In fact, with both of my kits the Nigerian Average score is located under the "Low Confidence" region tab, so I have to click that to even see it. I expected the Nigerian estimate to be higher, especially because I have at least two family branches with deep roots in Colonial Virginia and Maryland; both regions saw a large importation of enslaved Igbo. Also, a few of my African genetic matches from DNA tests identify as Igbo. When I check the Range Score for Kit 1:
As you can see, the Range score for Nigerian is 0%-8% (translated as 0% to 8%). This means that on some of the 40 analyses AncestryDNA's algorithm performed, 0% Nigerian was detected, and on at least one run as much as 8% was found. If AncestryDNA had used the run with 8% Nigerian exclusively, then my Ethnicity Estimate would look very different and be much more consistent with what I expected. I suspect my "Nigerian" markers are probably confined to certain areas of my genome, and apparently AncestryDNA's algorithm didn't land there often.

**Note: AncestryDNA is the only DNA company with an ethnicity cluster/reference population named "Nigerian"; this might be classified as "Sub Saharan African," "West African," "Igbo" or "Yoruba" on other admixture calculators.

NATIVE AMERICAN
Like many of you, I have stories about Native American ancestry in my family. According to my other DNA tests and biogeographical analyses, I have between 0.8% and 1.3% "real" Native American admixture, so I expected the same with my AncestryDNA results. Of course I was horrified to learn my Kit 1 showed NO Native American. Instead I was assigned South Asian @ <1%.

However on my Kit 2 (below) my Ethnicity Estimate showed <1% Native American Average score and NO South Asian! How ironic. Checking my Range score we see AncestryDNA detected up to 1% Native American (which would be consistent with my other results):


Now remember I told you earlier that for any Ethnicity region in which you received a 0% (zero percent) Average score, you can check whether any percentages actually show up in the Range score:
  • Just click on the box next to "Show All Populations" located at the bottom of your Ethnicity Estimate report.
Now let's return to my AncestryDNA Kit 1 (below) where Native American admixture is showing as 0% from my Ethnicity Estimate. When I check the range score I get:

As you can see, my Native American Range score is exactly the same for both kits!

EUROPEAN JEWISH (ASHKENAZI)
Perhaps my most surprising admixture component, my European Jewish is a small amount, yet I know it is real from previous tests, biogeographical analyses and lots of European Jewish genetic matches. In fact 23andMe assigned me 0.6% Ashkenazi, and it appears as one single segment on chromosome-pair 9. Neither of my AncestryDNA kits showed any European Jewish (Ashkenazi) estimates, which was also surprising at first. Here is what my Range Score shows for both AncestryDNA kits:

Kit 1:

Kit 2:

As you can see my European Jewish Range Score is actually less than 1% for both kits. This also means it is consistent with the 0.6% Ashkenazi I received from 23andMe as well as the less than 2% Jewish-Ashkenazi assigned by FamilyTreeDNA's myOrigins test.

Finally, it is possible for your Average Score for a particular Ethnicity region to be higher than that of the parent who gave it to you. Imagine, for example, that you and both parents tested, and your mother is supposed to be 25% Chinese (East Asia). When the results come back you have 20% Asia East, your dad has 0% Asia East, but your mother's Average score is 18% Asia East. In this scenario your Range Score may show that you have between 10% and 20% Asia East, while your mother's may show 20% to 30%. So in this instance your mother has at least as much Asia East as you, and based on the ranges she could have much more, up to 30%. In the same scenario it could have been possible for your dad's Range Score to show 0% to 5% Asia East, and therefore he could have been a contributor of Asia East too. Do you understand?
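
One rough way to sanity-check the scenario above: a child inherits about half of their autosomal DNA from each parent, so a very rough expected range for the child is half of the mother's range plus half of the father's range. The sketch below uses the made-up numbers from the example; `expected_child_range` and `overlaps` are hypothetical helper names, and the real inheritance of any single ethnicity region varies with recombination, so treat this as a back-of-the-envelope check only.

```python
def expected_child_range(mother, father):
    """Rough expected (low, high) for a child: half of each parent's range."""
    return ((mother[0] + father[0]) / 2, (mother[1] + father[1]) / 2)

def overlaps(a, b):
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

mother = (20, 30)   # mother's Asia East Range Score from the example
father = (0, 5)     # father's Asia East Range Score from the example
child = (10, 20)    # child's Asia East Range Score from the example

expected = expected_child_range(mother, father)
print(expected)                   # (10.0, 17.5)
print(overlaps(child, expected))  # True
```

Under these assumptions the child's 10% to 20% range overlaps the rough expectation of 10% to 17.5%, so the results are not contradictory even though the child's Average score looked higher than the mother's.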

TL DIXON's TIP: Take all of your reliable/accurate DNA ethnicity admixture estimates (e.g. AncestryDNA, 23andMe, FamilyTreeDNA, National Genographic 2.0, MyHeritage, LivingDNA, GedMatch, DNA.Land, etc.) and create your own range score for each of your ethnicity components. This can be tricky as you try to make the ethnicity categories among test results congruent, because each company uses different ethnic, biogeographical or regional nomenclature and reference population groupings, often to define the same categories. Further, we may get the same admixture from both parents. Going forward, instead of telling people you're "10%" Scandinavian, say that you have "5% to 15%" or "up to 15%" Scandinavian.
[If you need more help understanding your ethnicity admixture results see my blog here].
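
If you keep your results in a spreadsheet or a short script, the tip above can be sketched roughly as follows. All of the percentages are made up for illustration, the company labels in `CATEGORY_MAP` are assumptions about how each company names the category, and `composite_ranges` is a hypothetical helper; the point is only the two steps: first map each company's label to one shared name, then take the lowest and highest value per shared name.

```python
# Hypothetical mapping from each company's label to one shared category name.
CATEGORY_MAP = {
    ("AncestryDNA", "Scandinavia"): "Scandinavian",
    ("23andMe", "Scandinavian"): "Scandinavian",
    ("FamilyTreeDNA", "Scandinavia"): "Scandinavian",
}

# Made-up (company, label, percent) results for illustration only.
results = [
    ("AncestryDNA", "Scandinavia", 10.0),
    ("23andMe", "Scandinavian", 5.0),
    ("FamilyTreeDNA", "Scandinavia", 15.0),
]

def composite_ranges(results, category_map):
    """Group percentages by shared category and return {category: (low, high)}."""
    grouped = {}
    for company, label, percent in results:
        # Fall back to the company's own label when no mapping is defined.
        category = category_map.get((company, label), label)
        grouped.setdefault(category, []).append(percent)
    return {cat: (min(vals), max(vals)) for cat, vals in grouped.items()}

for category, (low, high) in composite_ranges(results, CATEGORY_MAP).items():
    print(f"{category}: {low}% to {high}%")   # Scandinavian: 5.0% to 15.0%
```

The mapping table is where the real judgment lies, since deciding which labels describe the same population is exactly the tricky part mentioned in the tip.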

(3) How the Range Score is Calculated
Of course AncestryDNA could do a better job of informing us about the existence of the game-changing Range score and how it is calculated. However, if you're naturally inquisitive like me then you will search to see if AncestryDNA offers an explanation. But of course they do, especially for laypersons. As such I will post the AncestryDNA Learn More tutorial below, which tells you exactly how the Range Score is calculated (commentary in blue and screenshots by AncestryDNA):

When we calculate your estimate for each ethnicity region, we run forty separate analyses. Each of the forty analyses gives another estimate of your ethnicity, and each one is done with randomly selected portions of your DNA. Why forty? Ethnicity estimation can be variable from comparison to comparison; different combinations of DNA can give us different information, so doing multiple analyses can give us a more accurate estimate, as well as the likely range.

In the example below, we measure an estimate for one person for one ethnicity region. This first chart illustrates that for each of the 40 analyses, a slightly different portion of DNA is analyzed.

AncestryDNA screenshot

This gives us 40 different estimates of a person's ethnicity for each region

Each of the 40 estimates covers a substantial portion of the tested genomic information. Sometimes we find a diverse range of estimates for each region tested.

AncestryDNA screenshot

The Average Estimate

We look at each of the 40 estimates and find the average amount predicted for each region. This average becomes the percent that is displayed in the estimates. Our confidence that your actual genetic ethnicity is EXACTLY the average is not high.


AncestryDNA screenshot

The probable range

There is often a wide range among these 40 estimates. The range shown in the product experience encompasses most of the variability found in the estimates. Our confidence that your actual genetic ethnicity falls within this range is relatively high.
AncestryDNA screenshot
My observation based on a cursory view of AncestryDNA's Ethnicity white paper is it's unclear how many of your total 730,525 SNPs are examined per each of the 40 scans, which could affect how much of your supposed ethnicity region didn't get detected.
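
To see why random portions of DNA can produce a range that includes 0%, here is a toy simulation. Everything in it is an assumption made for illustration and is not AncestryDNA's actual method: the genome is modeled as 100 equal segments, 2 of them are assigned to the region of interest, each of the 40 runs looks at 60 randomly chosen segments, and `run_analyses` is a hypothetical function name.

```python
import random

def run_analyses(segments, n_runs=40, sample_size=60, seed=1):
    """Estimate a region's share n_runs times, each from a random subset of segments."""
    rng = random.Random(seed)  # fixed seed so the toy example is repeatable
    estimates = []
    for _ in range(n_runs):
        subset = rng.sample(segments, sample_size)
        estimates.append(100 * sum(subset) / sample_size)
    return estimates

# Toy genome: 100 segments, 1 = assigned to the region, 0 = not. True share is 2%.
segments = [1] * 2 + [0] * 98

estimates = run_analyses(segments)
average = sum(estimates) / len(estimates)
zero_runs = sum(1 for e in estimates if e == 0)

print(f"Average: {average:.2f}%")
print(f"Range: {min(estimates):.2f}% to {max(estimates):.2f}%")
print(f"Runs that found none: {zero_runs} of {len(estimates)}")
```

In this toy model a run can only report 0%, about 1.67% or about 3.33%, depending on whether it happened to sample none, one or both of the two segments. That is the same intuition as markers being confined to a few areas of the genome that not every scan lands on.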

(4) Brief Comparison Between My Two AncestryDNA Kits
In this section I compare my two AncestryDNA kits but focusing only on my Average scores for the Ethnicity Estimates. It was actually posted in 2014 before I knew how to utilize the Range score and is discussed in reverse chronological order. My purpose with presenting this information is showing how not considering the Range score in your Ethnicity Estimate can lead to misleading conclusions about your results.

So I had an opportunity to take the AncestryDNA test again when a relative refused to submit another saliva sample after the submission failed; the relative is now hoping for success at 23andMe. I must admit to an innocuous curiosity about how my Ethnicity Estimate on the newer 2014 kit (K2) might vary from the first one I took back in 2012 (K1), as well as if predictions to genetic matches would change. (Please note K1 was submitted before AncestryDNA overhauled its ethnicity admixture tool).

I expected there might be some differences, particularly based on AncestryDNA’s proprietary analysis of calculating an estimate for each ethnicity category by running 40 separate scans, with each scan choosing randomly selected portions of our DNA. Overall, both results were very similar and in “normal” range but not completely identical. To this extent some of the differences, such as the introduction of additional ethnicity categories to K2, might lead to an assumption that the two kits represented two persons who coincidentally received similar admixture profiles. Otherwise AncestryDNA has predicted K2 to be a twin or self to K1 with 99% confidence.

The regions to which I show affinity are almost the same on both kits, with the notable exception of Native American and Asia East now added to K2. Interestingly, Middle East (formerly Near East) is the only category for which I have 0% affinity in the “Show All Populations” view of the Ethnicity Estimate for both kits.

Many of my genetic matches were the same for both kits; however, the “confidence level” of predicted kinship changed: in one case a 4th-to-6th cousin with 96% confidence on K1 has been downgraded to 5th-to-8th cousin with “Moderate” confidence on K2. Perhaps the most significant difference between the kits is that the more recent one has a higher rate of no-calls than the earlier one: K2 @ 1.538% (10498 of 682549 SNPs) vs. K1 @ 0.206% (1403 of 682549 SNPs), but I’m not sure if this is even significant to the percentage of total SNPs actually utilized for the analysis. As of 2016, AncestryDNA tests 730,525 SNPs. In my opinion, K2 has a better estimate than K1 despite the higher number of no-calls because it is roughly consistent with my other admixture results. Here's how my two Ethnicity Estimates compare:

AFRICAN  Kit 1: 79% ... Kit 2: 78%
Very similar to my 23andMe’s Ancestry Composition update, I lost 1% of total Sub Saharan African, from 79% to 78% on my ancestry.com’s K2. I’m now beginning to wonder if ethnic components from Asia, Europe and the Americas are being misidentified as Sub Saharan African. With Cameroon, I gained a point to 29% on K2. With Benin/Togo I lost a point to 19% on K2. With Senegal, which I suspect is the origin of my maternal haplogroup L1b1a, I lost a point to 13% on K2. With African Southeastern Bantu, I also lost a point to 7% on K2. However, Ivory Coast/Ghana and the South-Central Hunter-Gatherers were the same on both kits, at 6% and 1% respectively. Under Trace Regions (now Low Confidence Regions), Nigeria gained a point to 3% on K2.

Cameroon  Kit 1: 28% ... Kit 2: 29%
Benin/Togo  Kit 1: 20% ... Kit 2: 19%
Senegal  Kit 1: 14% ... Kit 2: 13%
African Southeastern Bantu  Kit 1: 8% ... Kit 2: 7%
Ivory Coast/Ghana  Kit 1: 6% ... Kit 2: 6%
Nigeria  Kit 1: 2% ... Kit 2: 3%
African South-Central Hunter-Gatherers  Kit 1: 1% ... Kit 2: 1%
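
As a quick arithmetic check, the African subregion Average scores listed above can be tallied in a few lines of Python. The numbers are copied from the list; the variable names are my own.

```python
# Average scores (percent) for the African subregions, as listed above.
kit1 = {"Cameroon": 28, "Benin/Togo": 20, "Senegal": 14,
        "African Southeastern Bantu": 8, "Ivory Coast/Ghana": 6,
        "Nigeria": 2, "African South-Central Hunter-Gatherers": 1}
kit2 = {"Cameroon": 29, "Benin/Togo": 19, "Senegal": 13,
        "African Southeastern Bantu": 7, "Ivory Coast/Ghana": 6,
        "Nigeria": 3, "African South-Central Hunter-Gatherers": 1}

# Point change per subregion, Kit 1 to Kit 2.
for region in kit1:
    change = kit2[region] - kit1[region]
    print(f"{region}: {kit1[region]}% -> {kit2[region]}% ({change:+d})")

# Totals: Kit 1 sums to 79 and Kit 2 to 78, matching the African totals above.
print(f"African total: Kit 1 {sum(kit1.values())}%, Kit 2 {sum(kit2.values())}%")
```

No subregion moves by more than a single point, which is well inside the kind of spread a Range Score would show.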

NATIVE AMERICAN & ASIAN
Perhaps the 1% missing from the African on K2 went here. K2 detected Native American and Asia East @ less than 1% each. I generally range 1.6% to 2.3% Native American and/or East Asian on various tests. On K1, I only had less than 1% South Asian and less than 1% Pacific Islander - Polynesian, the latter of which is remarkably present on both tests. K2 has no South Asian. On my 23andMe test, South Asian was replaced with Southeast Asian, so this population has similarity to both regions. I suppose a few of AncestryDNA's 40 scans landed in spots ripe with Native American-related markers this time around; I know these segments are located on chromosomes 5, 10, 17 and 22.

Native American  Kit 1: 0% ... Kit 2: <1%
Asia East  Kit 1: 0% ... Kit 2: <1%
Asia South  Kit 1: <1% ... Kit 2: 0%

EUROPEAN (total)  Kit 1: 19% ... Kit 2: 19%
The biggest percentage changes appear to be with the European subcategories, although overall European is the same at 19% and, as in the Africa section, the same subregions appear on both kits. I suspect my affinities with European will forever fluctuate because of how highly admixed these populations can be. However, IMO AncestryDNA detects my Scandinavian, Dutch and German better than other tests, though the numbers could be inflated. With British, I lost two percentage points, down to 7% on K2. Europe West is no longer a Trace Region estimate, as it has increased to 5% on K2 from 3% on K1. Scandinavian and Europe East are the same at 5% and less than 1% respectively. Ireland actually increased from less than 1% to 1% on K2. Neither kit picked up my tiny but real European Jewish admixture.

British  Kit 1: 9% ... Kit 2: 7%
Europe West  Kit 1: 3% ... Kit 2: 5%
Scandinavia  Kit 1: 5% ... Kit 2: 5%
Ireland  Kit 1: <1% ... Kit 2: 1%
Europe East  Kit 1: <1% ... Kit 2: <1%

PACIFIC ISLANDER POLYNESIAN — Kit 1: 1% ...  Kit 2: <1%
The Polynesian affinity is showing due to my Malagasy ancestry, and this region is classified as Southeast Asian, Austronesian, South Asian or Indonesian on other DNA tests; Dr. Doug McDonald says my "East Asian" affinity ranges from 1% to 1.3%, just like the Native American. On AncestryDNA I combine it with my Asia East on K2 and South Asian on K1, noting AncestryDNA does not include a Southeast Asian region.

DAVID PIKE'S AUTOSOMAL DNA UTILITY:
I wanted to check my AncestryDNA kits' raw data for Runs of Homozygosity (ROH), which are stretches of a chromosome in which you inherited the same alleles or bases (A-A instead of A-C) from both parents at each location. If someone has long ROHs, then it could mean his or her parents are related to each other. When I ran both kits' raw data through David Pike's Autosomal DNA Utility to search for runs of homozygosity, these were the results:

Kit 2:
Chr  X:  100.000 % (17604 of 17604 SNPs) are homozygous,   318 No-Calls,   1 heterozygous SNPs treated as homozygous
Chr  Y:    885 SNPs,    22 No-Calls,   0 heterozygous SNPs treated as homozygous
Chr XY:    440 SNPs,     9 No-Calls,   0 heterozygous SNPs treated as homozygous

Total autosomal (Chr 1-22):   1.538 % ( 10498 of 682549 SNPs) are NoCalls
Total autosomal (Chr 1-22):  30.839 % (210489 of 682549 SNPs) are Heterozygous (this tally excludes 0 heterozygous SNPs that were treated as homozygous)


Total autosomal (Chr 1-22):  69.161 % (472060 of 682549 SNPs) are Homozygous   (this tally includes 0 heterozygous SNPs that were treated as homozygous)

Kit 1:
Chr  X:  100.000 % (17604 of 17604 SNPs) are homozygous,   252 No-Calls,   7 heterozygous SNPs treated as homozygous
Chr  Y:    885 SNPs,     8 No-Calls,   0 heterozygous SNPs treated as homozygous
Chr XY:    440 SNPs,     2 No-Calls,   0 heterozygous SNPs treated as homozygous

Total autosomal (Chr 1-22):   0.206 % (  1403 of 682549 SNPs) are NoCalls
Total autosomal (Chr 1-22):  31.318 % (213764 of 682549 SNPs) are Heterozygous (this tally excludes 0 heterozygous SNPs that were treated as homozygous)
Total autosomal (Chr 1-22):  68.682 % (468785 of 682549 SNPs) are Homozygous   (this tally includes 0 heterozygous SNPs that were treated as homozygous)

As you can see ROH are pretty much the same for both AncestryDNA kits, and they are short so no indication my parents are closely related to each other. 
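
The percentages in the utility output can be reproduced directly from the SNP counts. This is only a small arithmetic sketch using the counts copied from the output above; `percent` is a hypothetical helper name.

```python
TOTAL_AUTOSOMAL_SNPS = 682549  # autosomal SNP count reported for both kits

# Counts copied from the utility output above.
kits = {
    "Kit 1": {"no_calls": 1403, "heterozygous": 213764, "homozygous": 468785},
    "Kit 2": {"no_calls": 10498, "heterozygous": 210489, "homozygous": 472060},
}

def percent(count, total=TOTAL_AUTOSOMAL_SNPS):
    """Return count as a percentage of total, rounded to three decimals."""
    return round(100 * count / total, 3)

for name, counts in kits.items():
    print(name, {label: percent(n) for label, n in counts.items()})
```

Note that for both kits the heterozygous and homozygous counts add up to the full 682549 SNPs on their own, which suggests the no-calls are being counted inside the homozygous tally rather than separately. That is my reading of the numbers, not something the output states.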

CONCLUSION
Both of my AncestryDNA Ethnicity Estimates are very similar to each other and consistent with my results from other tests. However, I would not have known this had I not checked the Range Score for all of my regions. As I've demonstrated, and as AncestryDNA points out, the Average scores we see are very unreliable and unstable. We should be looking at the more fluid Range Score, which paints a more realistic snapshot of our genetic ancestral contributions. Undoubtedly genetic ethnicity forecasting is still a young and rapidly evolving science, so our results will change with the next ethnicity estimate update. To this extent AncestryDNA is brilliant in its pragmatic approach to calculating our ethnicity estimates. You should be too. In fact I encourage all of you to create a genetic admixture composite or profile showing a Range Score for each of your ethnicity estimates, culled from ALL of your reliable DNA tests and analyses. Enjoy in moderation, as too many salt grains and sugar granules are never good for you.


###End###

15 comments:

  1. Very interesting and enlightening! I must admit that this is all above my head, but I'm learning! Thank you for sharing!

  2. This was quite helpful. Off to calculate, I go...

  3. Thank you so much for your detailed but easy to understand explanations. I will be doing this.

  4. TL: Once again, you've provided a wonderful breakdown of some very complicated stuff. I'll look at the ranges with a much better understanding of what I'm seeing. Thank you!

  5. Great article - Tanx! My gosh... spent 20+ minutes in phone-con with Ancestry.com DNA customer service yesterday ("expert"??) questioning this very subject. Learned none of this... a great read... well written... where the basics behind Ancestry.com DNA Ethnicity Testing was actually discovered.

  6. GREAT blog post, TL! I went back to check all my ranges and lo and behold, what I thought was simply 0% was much more than I had initially assumed. Thank you for shedding light on an important point which enhances the interpretation of our results. Question: How do homozygous SNPs impact one's results? Thanks! Karim

    Replies
    1. Homozygous SNPs are markers at which you inherited the same allele from both parents. So if your family is composed of marriages to cousins, or if both sides of your family are related, you will have a high % of homozygosity (like the English royal family).

  7. Wow! Thanks! This was great. And the second blog of the night I've read from you so far. I kept going back to the other tab checking my ancestry report as I read. Had no idea. Still does not explain how much of my NW European is missing. Even at up to 12%. But I was wondering where all my other regions that other tests show were hiding. Sure enough, they are there at less than 1%. And even one I've never gotten anywhere before, European Jewish is 0-2%. Now, that is intriguing!

  8. Ancestry lists me as 38% Scandinavian (range 18-59%)... FTDNA originally listed me as 32% Scandinavian... but soon after they updated their sampling or process and changed it to 8% Scandinavian. Now, I do not doubt that some of my deeper ancestry likely came from Scandinavia as they settled parts of northern Europe... but so far I have not found a single recent (going back to 1600s) Scandinavian ancestor... But I wonder about the sharp difference between the new 8% from FTDNA and the 18-59% range from Ancestry. To be honest the newer FTDNA ties more reasonably with what I know about my tree... a dull 73% West and Central Europe. I have never been comfortable with the concept of ethnicity in DNA testing... what time frame are we talking about... since peoples move about...

  9. Thank you for sharing this amazing information. Very helpful. Have to wonder...For somebody who has taken multiple tests, each with a different company- what if the ethnicities generally overlap but have different percentages? How might you find the "truest" percentage? Also, suppose the results overlap except one test adds on an extra ethnicity or two. In my case, WeGene added on a few (like Mayan) that other companies didn't pick up. Could this be due to the fact WeGene has a better non-European reference population (and the results could be considered accurate)?

    Replies
    1. Hi commentor,
      You can only find your "true" ethnicities by connecting with your DNA relatives and exploring their genealogical pedigrees. Then you can compare what is known about that relative's biogeography and history to your admixture profiles. You should know that an admixture test is not designed to identify a specific tribe or ethnicity but does show your genomic similarity to specially chosen global populations -- none of which you may be actually related to. Infer if you must but never conclude.

  10. When I look at my ethnicity regions, only those regions with high and low confidence rates expand into a range. The others, such as native american, have a gray for in front of them. I am unable to expand and see an actual range.
    Also, what do the numbers (not percentages) in the circles mean? For example, native american (which is only displayed under "show all 150+ regions") has a number 30 next to it.
    I was also perturbed by the fact that my kit said 37% ireland, scotland, and wales. There's no depth to that result, no specifics such as 20% irish, 17% Scottish. I expected much more clarity for a DNA test that costs roughly $100.

  11. This is so insightful! Thank you. I'm going to do it!
