Thursday, February 21, 2019

Y-DNA: Big Y test resolves STRs and convergence


The Big Y test can resolve a Y-DNA problem when STRs alone cannot tell you what you need to answer your genealogical questions. 

In this blog post we will start with the standard Y-DNA testing advice, then examine how your strategy might have to change if your results are not showing matches to others with your surname. 

This particular case will show how you might see false Y-DNA matches because of a process called convergence.

If you are new to Y-DNA testing, please read this post about STRs, SNPs, and haplogroups. It will open in a new window so that you won't lose your place here.


Standard Y-DNA testing advice


Here's what is generally true about STRs and SNPs:

1. Start with STR testing. If you have too many, or not enough matches, upgrade the number of STRs to narrow down your list of matches, find new matches, and better determine your Y-DNA relationships.

2. Use SNP testing to trace your family further back in time. SNPs are primarily for older genealogical relationships, but can be brought into the genealogical time frame. Because SNP mutations generally occur less frequently than STR mutations, use STRs to refine the relationship.


Our Case Study:
More STRs do not always mean fewer matches

I initially tested a Mullins cousin with a 37-marker STR test from Family Tree DNA. He is a descendant of James Mullins who was first located in Rutherford County, North Carolina. James was listed in earlier census records as James McMullins and later as James Mullins. 

Over the years, I gradually upgraded the Y-DNA tests of Mr. Mullins, the descendant of James. His Y-DNA results take what is generally accepted about SNPs vs. STRs and turn it on its head.



37-marker STR test: Lots of matches


At 37 markers, Mullins had an astonishing 1804 matches:


37-marker STR matches
37-marker STR test: 1804 matches

Notice that there is a wide variety of surnames in the match list. This is obviously not caused by the usual "non-paternity event" or NPE, which means that one of the ancestors was not the biological son of the man who raised him. So many surnames and such a large number of matches are due to convergence.


What is convergence?

Convergence in DNA is when mutations make it appear that two people are more closely related than they really are. 

Let's see an example of this using two ancestral lines that we will call A and B. We will examine only one marker. One of your ancestors, Mr. A, had a value of 19 at DYS570. An unrelated man, Mr. B, who lived at the same time as your ancestor, had a value of 16 at that marker.  

In a more recent generation of the A family, the 19 mutated to 18. In an even more recent generation, the 18 further mutated to a 17.  

In the meantime, in the B family ancestral line, the value at DYS570 only mutated once, from a 16 to a 17. 

Today the descendants in the A and B families both have a 17 at DYS570. At that one marker the two families now appear to be more closely related than they really are. They have different surnames, yet have identical values at that marker. Again, this is due to convergence, not a non-paternity event.
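The A and B example can be written out as a small Python sketch. The values at DYS570 are the ones given above; the function and variable names are only for illustration.

```python
# Values at DYS570, oldest generation first, taken from the example above.
line_a = [19, 18, 17]  # Mr. A's line: 19 -> 18 -> 17 (two mutations)
line_b = [16, 17]      # Mr. B's line: 16 -> 17 (one mutation)


def count_mutations(values):
    """Count the changes between consecutive recorded values in one line."""
    return sum(1 for older, newer in zip(values, values[1:]) if older != newer)


# How far apart the two ancestors were, and how far apart their descendants are.
ancestral_difference = abs(line_a[0] - line_b[0])
modern_difference = abs(line_a[-1] - line_b[-1])
total_mutations = count_mutations(line_a) + count_mutations(line_b)

print(f"Difference between the ancestors: {ancestral_difference}")
print(f"Difference between living descendants: {modern_difference}")
print(f"Mutations that actually occurred: {total_mutations}")
```

The descendants differ by zero at this marker even though three mutations separate the two lines. A match list that only compares today's values cannot see that history.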


Finding matching surnames

As we saw above, Mullins had lots of matches, but we didn't immediately see anybody named Mullins or any variation of that name. We can search by surname to find a specific name. 

On the Y-DNA Matches page, you will see a section to filter your matches. In the Filter Matches section, I entered the first few letters of the surname McMullins, which is a variation of the surname Mullins.


search Y-DNA by surname
Filter by surname


Out of the previous 1804 matches, only one man has this surname. He is a genetic distance of 4 at 37 markers. He has a family tree as indicated by the family tree symbol under his name.

Family Tree symbol


His ancestors are from County Cavan, Ireland.

Now we will filter the matches by just "Mul" to find any variations of Mullins, Mullens, etc.


Filter DNA matches by surname
One Mullins match


There was only one Mullins, and again, he has a family tree. His ancestor is from Rutherford County, North Carolina. This looks promising because our ancestor James Mullins also lived in that county. 

Notice also that next to the family tree symbol you will see what tests this man has taken at Family Tree DNA. This Mullins man has taken the 37-marker Y-DNA and the Family Finder tests.

Because neither Mr. McMullin nor this Mullins match has tested more than 37 markers, I will not see either of these men in a match list at 67 markers. However, I expect that if I order 67 markers, I will see a more manageable list of matches than a list of 1804 men. 

How many fewer matches will I see? Further, will ordering 67 markers show new Mullins matches that do not appear in the 37-marker list? 

I definitely wanted to find out, so I ordered an upgrade to 67 markers. The results shocked me.


67-marker STR test: Even more matches


After ordering 67 markers, the number of matches went up, not down as we would normally expect. I now saw 2631 matches with all kinds of surnames.


67 marker match list
67 markers: 2631 matches



Filtering by surname, I found four new McMullin men. 


McMullin results

Why didn't they show up in the 37-marker results? The answer has to do with the criteria used by Family Tree DNA to determine a match. You can find an explanation of what FTDNA considers to be a relevant match here:  
https://www.familytreedna.com/learn/general/what-is-a-relevant-match/

This tells us that any matches at the 37-marker level must have a genetic distance of four or fewer. At 67 markers, the match must have a genetic distance of seven or fewer. 

So, if a person was a genetic distance of 7 at 37 markers, he would not show up as a match. But if no additional mutations occurred at the 38-67 marker level, he would show up as a match at 67 markers. 
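The threshold rule can be sketched in a few lines of Python. Only the two limits quoted above (4 at 37 markers, 7 at 67 markers) are used, and the function name is hypothetical. This is an illustration of the rule, not FTDNA's actual matching code.

```python
# Maximum genetic distance for a listed match, as quoted in the text above.
MAX_GENETIC_DISTANCE = {37: 4, 67: 7}


def is_listed_match(markers_tested, genetic_distance):
    """Return True if the genetic distance is within the limit for that level."""
    return genetic_distance <= MAX_GENETIC_DISTANCE[markers_tested]


# The hypothetical man from the paragraph above: a distance of 7 at 37 markers
# and no additional mutations on markers 38-67.
gd_at_37 = 7
additional_mutations_38_to_67 = 0
gd_at_67 = gd_at_37 + additional_mutations_38_to_67

print("Listed at 37 markers:", is_listed_match(37, gd_at_37))
print("Listed at 67 markers:", is_listed_match(67, gd_at_67))
```

The same man is excluded at 37 markers and included at 67, which is how an upgrade can make a match list longer rather than shorter.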

It is very useful for people to join surname, haplogroup, and other projects so that we can see the actual mutations and where they occurred.

Using the same 67-marker match list, we will now look for surnames that start with "mul." We again see several new matches that were not on the 37-marker match list.


Matching surnames beginning with M U L
"Mul" results


Now I was so curious to know what would happen at 111 markers that I upgraded again.



111-marker STR test proves that SNP testing is necessary



At 111 markers, the number of matches went down to 195. This is partially due to the fact that far fewer people ordered testing at this level.


List of matches at 111 markers
111 markers: 195 matches


66 of the 195 men had taken the Big Y-500 test [I had to count them], but their haplogroups were very different. Here's just a sample:


STR match but not SNP match
Different haplogroups in men who ordered Big Y


Time to change strategy


Normally, we encourage our matches to upgrade their STRs to help with finding common ancestry. But in haplogroups with high levels of convergence, upgrading STRs may not provide any assistance. 

111 STRs had not helped with finding a common Mullins ancestor, and the only thing that will prove relationships in this case is SNP testing. SNPs do not mutate back and forth the way STRs do, so SNPs are much more stable.  

It was pretty obvious to me from looking at the various haplogroups that Mr. Mullins belonged somewhere within haplogroup R-M222, which is known for large numbers of matches due to convergence. 

I was not going to bother with ordering a single SNP, or even a SNP Pack, to confirm this because what I really wanted to find were modern SNPs that could bring me into the genealogical time period. 

If I ordered the Big Y-500 [recently renamed the Big Y-700] test, how many of these men would be real matches? We're about to find out.



Examining the Big Y test


The initial results of the Big Y-500 test showed a terminal haplogroup of R-FGC57769 with four matches:


Big Y Matching tab


The Unnamed Variants tab showed that Mr. Mullins had 10 variants that had not yet been given SNP names.

Big Y Unnamed Variants
Big Y Unnamed Variants tab


After the initial results are in, Family Tree DNA does a manual review to check for any new SNPs that have not yet been named. This usually occurs within a few weeks. 

After the manual review, there was only one match. These two men formed a new haplogroup, R-BY66397.


Big Y matches
New Big Y Matching results


After FTDNA's manual review, you may want to download your results. You can store them on your computer and transfer them to other databases. 

There is a blue Download Raw Data link. Be sure to request the BAM file.


Big Y Block Tree


You can see more detail about how the haplogroup changed by clicking on the Big Y Block Tree. You can access the Block Tree in the Big Y section of your homepage:


Big Y Block Tree
Click on Block Tree


Once you have clicked on Block Tree, you will be taken to your position in the tree. You can easily move up and down the tree and see details about various levels of the tree. 

Below is the position on the tree for Mr. Mullins.  

In Haplogroup R-FGC57769 there are currently a total of five men: Mr. Mullins (not shown because these are his matches), Mr. O'Brien, Mr. Martin, Mr. Carr, and Mr. Herberg. 

On the left, Martin and Carr are in their own haplogroup named R-FGC57762. They share three named SNPs: FGC57762, FGC57770, and FGC57771. They also have an average of five private variants each. 

In the middle we see the newly-formed haplogroup R-BY66397. This is the haplogroup of Mr. Mullins. We see his one match in this group. The tree shows that below R-BY66397 there are an average of eight private variants between Mr. Mullins and his match Mr. O'Brien. 

Mr. Herberg, on the right, currently has no matches below Haplogroup R-FGC57769. When he does, he and his match will form a new haplogroup.


Big Y Block Tree
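To make the tree description concrete, here is a minimal Python sketch that stores the three haplogroups and five men described above and looks up the most recent haplogroup two men share. The data structure and function names are my own illustration, not anything the Block Tree exposes.

```python
# Parent of each haplogroup described above (None marks the top of this excerpt).
PARENT = {
    "R-FGC57769": None,
    "R-FGC57762": "R-FGC57769",
    "R-BY66397": "R-FGC57769",
}

# Terminal haplogroup of each man, as described above.
TERMINAL = {
    "Mullins": "R-BY66397",
    "O'Brien": "R-BY66397",
    "Martin": "R-FGC57762",
    "Carr": "R-FGC57762",
    "Herberg": "R-FGC57769",
}


def path_to_root(haplogroup):
    """List a haplogroup and all of its ancestors, most recent first."""
    path = []
    while haplogroup is not None:
        path.append(haplogroup)
        haplogroup = PARENT[haplogroup]
    return path


def shared_haplogroup(man_a, man_b):
    """Return the most recent haplogroup that both men belong to."""
    ancestors_of_b = set(path_to_root(TERMINAL[man_b]))
    for haplogroup in path_to_root(TERMINAL[man_a]):
        if haplogroup in ancestors_of_b:
            return haplogroup
    return None


for other in ["O'Brien", "Martin", "Carr", "Herberg"]:
    print(other, shared_haplogroup("Mullins", other))
```

Only Mr. O'Brien shares the newer R-BY66397 with Mr. Mullins. The other three men connect one level up, at R-FGC57769.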


Which position did Mullins and O'Brien share?


Here is the list of unnamed variants after the manual review:


New Unnamed Variants

There are now nine unnamed variants. Variant 7761527, which appeared on the earlier list, is now missing. This means it is the variant shared with Mr. O'Brien. Variant 7761527 was given the SNP name BY66397, and the new haplogroup R-BY66397 was formed. 

We can verify this by going to YBrowse and entering 7761527 in the search box. The results are shown below:


YBrowse
Position 7761527 at YBrowse



This verifies that the previously unnamed variant 7761527 has been named BY66397.
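Comparing the two lists of unnamed variants is a simple set difference. In the Python sketch below, only position 7761527 comes from the results above. The other nine entries are placeholder labels standing in for the real positions, which are not listed in this post.

```python
# Placeholder labels for the nine variants that stayed unnamed (not real positions).
placeholders = [f"placeholder-{n}" for n in range(1, 10)]

# Unnamed variants before and after the manual review.
before_review = set(placeholders) | {"7761527"}
after_review = set(placeholders)

# Whatever disappeared from the list is the variant that was given a SNP name.
newly_named = before_review - after_review

print("Unnamed before review:", len(before_review))
print("Unnamed after review:", len(after_review))
print("No longer unnamed:", sorted(newly_named))
```

If you download both versions of your own list, the same comparison will show which positions were named during the review.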



How closely are Mr. Mullins and Mr. O'Brien related?


Mr. Mullins has nine variants that are not shared with Mr. O'Brien. This indicates that their relationship is not recent. 

While SNP dating is not precise, it appears that the common ancestor of these two men lived at least 1000 years ago.


Filtering the STR lists of matches

Once you have taken a Big Y test, your STR lists of matches will have a new column called Big Y STR Differences. You will also see a new filter option to display only matches who have taken the Big Y test:

STR match list with Big Y
Big Y STR Differences column


Notice above that the Mullins list of 111-marker matches is now at 208 matches because more people have now taken the test. 

When we filter the matches by only those people who have taken the Big Y test, we see the following:

Big Y testees in STR results
Show only men who have taken Big Y

Notice that 73 of his 208 matches have taken the Big Y test. The closest match is a genetic distance of 7 at 111 markers. 

None of these men show up as a match in the Big Y match list of Mr. Mullins. Even though they are showing up as STR matches, they all belong in different haplogroups. None of these men is related to Mr. Mullins within at least a thousand years.

We can filter the list at each level of matching. Here is the filtered list at 67 markers. 376 men at this level have taken the Big Y test. We know that none of them is a match to Mr. Mullins because they do not appear on his Big Y match list.

Close STR matches with different haplogroups

In the above list, we see very interesting results. The first man on the list, McConnell, is only a genetic distance of one at 67 markers. 

There are other men here at a genetic distance of only two or three. This usually indicates that these men are closely related. 

However, in this case all the men have taken the Big Y test. Their haplogroups are not estimated; they are confirmed by SNP testing.  None of them is related to Mr. Mullins within the genealogical time period.

Finally, we will filter the 37-marker match list by men who have taken the Big Y test and whose surname begins with the letters "mul":

STR matches filtered by Big Y and surname
Filter STR results by surname and Big Y 

There are no Mullins matches, only a man named Mullican. As we can tell by his confirmed haplogroup, Mr. Mullican is not related to Mr. Mullins.


What did SNP testing tell us?


We now know that STR testing, even at 111 markers, may not be enough. It is definitely not enough in haplogroups with high levels of convergence. 

It is fascinating that the SNP results of Mr. Mullins do not match a single one of the hundreds of men who appear on his lists of STR matches and who have also taken the Big Y test. 

Mr. O'Brien, who is his only Big Y match, does not appear on a Mullins STR match list at any level. SNPs will be the only way to determine if someone is related by Y-DNA to this Mullins line.
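The comparison between the two kinds of match lists can be sketched with Python sets. This is a tiny sample using three surnames from this post, not the full lists, and the variable names are only illustrative.

```python
# A small sample, using surnames mentioned in this post.
# STR matches of Mr. Mullins who have also taken the Big Y test:
str_matches_with_big_y = {"McConnell", "Mullican"}
# Men on the Big Y match list of Mr. Mullins:
big_y_matches = {"O'Brien"}

# On both lists: STR matches that SNP testing confirms.
confirmed_by_snps = str_matches_with_big_y & big_y_matches
# STR match only: close STR values but a different confirmed haplogroup.
str_only = str_matches_with_big_y - big_y_matches
# Big Y match only: related by SNPs but outside the STR match limits.
big_y_only = big_y_matches - str_matches_with_big_y

print("Confirmed by SNPs:", sorted(confirmed_by_snps))
print("STR match only:", sorted(str_only))
print("Big Y match only:", sorted(big_y_only))
```

For Mr. Mullins the first group is empty. Every STR match in the sample falls into the second group, and his one SNP match falls into the third.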


What do we do now?

At the current time, Mr. Mullins has nine private variants in his Big Y results. Each of these variants occurred somewhere in his Mullins line, but we don't yet know the order in which they occurred or in which ancestor each mutation occurred. 

We can find out some of that by testing more Mullins cousins. So far nobody else shares any of these private variants. We need to find someone who does so that we can find out more about the Mullins ancestry.

Looking through the STR match lists, it is possible that, of the thousands of STR matches, one might actually be relevant. He is the man who appears on the 37-marker match list and whose ancestor is Spencer Mullins. He mismatches by four alleles at 37 markers, which does not appear to be a close match. We cannot tell without examining the exact locations of the mutations. 

However, his ancestor Spencer Mullins appears to be the son of a William Mullins who lived in Rutherford County, North Carolina, at the same time as James Mullins lived there. William was about the same age as James. These two men could be brothers or another close relationship.  

STRs indicate that these men mismatch on four out of 37 markers, and we have seen that even closer STR matches turned out not to be related. So this Mullins man could be just another convergence match. 

The only way to find out is to order the new Big Y-700 test for this man. I need to contact him to see if he agrees. If he is a genuine genealogical match, as I suspect, he will share at least one of the unnamed variants. 

The more unnamed variants the two men share, the more closely they are related. If they share one or more of the currently unnamed variants, the two men will form a new haplogroup under R-BY66397.

Testing this potential Mullins match is only the beginning, but it can be a big step forward in tracing the Mullins ancestry.


What are some of the things you can do with the Big Y 
to find more about your paternal ancestors?

  • Examine your list of matches first. If your results have just arrived, they may change after a manual review. You may want to contact your matches to see if you can determine how you are related.
  • See how many unnamed variants you have. This can help determine how closely related you are to your matches.
  • Examine the new Block Tree to see where you fit in and to find more information about your more distant matches.
  • Be sure you add a family tree to your results. Your family tree should at least contain information about your paternal line.
  • Join surname and haplogroup projects. This allows you to compare your STR results and will help encourage potential matches to upgrade their STR and SNP results.
  • Go back to your list of STR matches and see how many of these people have ordered the Big Y test.
  • Encourage any matches to whom you think you may be related to take the Big Y test. 
  • Search through public family trees to find other possible male relatives for Y-DNA testing.
  • Test closer male family members to determine in which ancestor each mutation occurred and to find your true terminal haplogroup. 
  • Consider transferring your results to other websites to get further evaluation and to find even more matches. This step will be increasingly important now that the price of full genome sequencing has dropped significantly. See websites, such as yfull.com and fullgenomes.com, that accept transfers from multiple companies.


What's next?


We have just examined the Big Y results of a man who has no matches in the genealogical time period and have determined our next step. 

In another post I used Big Y results to prove the identity and European origins of a different ancestor. See The identity of Jacob Bertschinger solved with Y-DNA. Ahhh, Y-DNA testing. I'm loving it!