Monday, October 14, 2019

Advantages of submitting to YFull


For Y-DNA testing, I have seen a lot of questions about how to evaluate Next-Generation Sequencing (NGS) tests such as Family Tree DNA's Big Y test, and similar tests at Full Genomes Corp, YSEQ, and extracted Y-DNA results of full genome tests from companies such as Dante Labs. If your Y-DNA company gave you good results, why would you want to download your results and upload them to a third-party site such as YFull? 

I previously wrote a post about the advantages of YFull, but that post was written when I took my original Big Y test. Please review "What are the benefits of YFull?" The benefits are much more extensive now that YFull has added new tools and especially since I've also ordered the Big Y-700 test. So, what are the advantages of submitting to YFull?



What is YFull?


YFull is not a DNA testing company. YFull is an analysis and comparison service for Y-DNA Next Generation Sequencing and full mitochondrial DNA sequences. These interpretation and comparison services are more comprehensive than those found at any DNA testing company.



 Y-DNA analysis and comparison


In this post we will discuss Y-DNA only. Since most people at the present time have tested with Family Tree DNA, let's focus on a portion of the Big Y-700 results from Family Tree DNA and compare these to the YFull evaluation of the same results.

We will examine the Private Variants from a Big Y-700 kit at Family Tree DNA. Private Variants are those that have supposedly only been seen in your kit. Once they have been seen in more than one kit, you will find them in the Named Variants. However, note that I said "supposedly."  Sometimes positions appear in your Private Variants list that have been seen in other kits, but FTDNA hasn't yet discovered this. This will especially be the case if your results have just arrived and haven't been fully evaluated.

Log into the Family Tree DNA account and go to Big Y Results. There are three tabs for Named Variants, Private Variants, and Matching. Here are the first ten Private Variants for a Big Y-700 kit. There are a total of 19 private variants, but only the first ten are displayed below. They are listed by their hg38 position number:


Big Y-700 Private Variants


Let's look at the first SNP on the list, position 10053444. If you click on the blue link for that position, you will be taken to Family Tree DNA's chromosome browser and see the following:



Big Y chromosome browser


This SNP is considered by FTDNA to be high quality. The Reference Sequence has a G at this position, but this kit had a C. The chromosome browser shows us how many times the position has been read, but we have to count each line. No further information is available. Now let's see what we can discover about this same position at YFull.



Novel SNPs at YFull


In YFull, "Private variants" are called "Novel SNPs." Here again, the SNPs may not truly be novel.  Other companies may have found these variants, but YFull hasn't discovered them yet. To see a list of your novel SNPs, view the menu at the left side of the screen, and click on Novel SNPs. 


Click Novel SNPs

At YFull, SNPs are categorized by quality and divided into separate tabs. The tabs are Best qual [best quality], Acceptable qual, Ambiguous qual, Low qual, One reading, and Indels.

Below is the Best qual tab. You will see the SNPs listed by both their hg19 and hg38 positions. Notice that at FTDNA we only saw the hg38 position number. To see if we can find anything new about the position we examined at FTDNA, find 10053444 on the Best qual screen. 



List of Novel SNPs at YFull


You can see that 10053444 has been named FGC65817 by Full Genomes Corp, but we did not see that at FTDNA. The red check mark on the line means that this SNP is available for verification by Sanger testing at YSEQ. This is another piece of very useful information, as we will see later.

Now click View BAM on the right side of the screen to see several versions of the BAM viewer.  This one is similar to the one we saw at FTDNA:



One version of YFull chromosome browser


We can see even more by clicking the yellow magnifier icon on the left side of the screen:



View position in BAM


We will then see this information:



SNP information


The above screen tells us the kit number and haplogroup [on the first line, but much of it is erased in this image] and the Y-chromosome position numbers in hg19 and hg38. The red arrow next to the hg38 position shows that we are looking at a test that was aligned to hg38. This position was read 22 times in the Big Y test, and all the reads showed a C instead of the G found in the Reference Sequence. We also see that the SNP has been named FGC65817, it is available for testing at YSEQ, and it is listed in YBrowse.
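The read counts on this screen can also be tallied by hand or with a few lines of code. The following is only a minimal sketch in Python: the 22 reads and the G-to-C change come from the screen described above, while the variable names are made up for illustration and have nothing to do with how YFull itself works.

```python
from collections import Counter

# Values taken from the screen described above.
reference_base = "G"
reads_at_position = ["C"] * 22   # 22 reads, every one showing C

# Tally how many reads show each base.
counts = Counter(reads_at_position)
total_reads = sum(counts.values())

# Reads that differ from the Reference Sequence.
# Counter returns 0 for a base that was never seen.
derived_reads = total_reads - counts[reference_base]

print(f"{total_reads} reads, {derived_reads} differ from the reference {reference_base}")
```

The more reads that agree with each other, the more confidence you can have in the SNP.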

The right side of the above image was cropped to make it more readable. However, another useful part of this screen is that you can not only see the position number, but you can see where this position is situated on the Y chromosome. This can help you determine how reliable this SNP may be. As shown below, position 10053444 is found in the Yp11.2 combBED region:



Y-chromosome regions


Notice that the fourth position below has been given the SNP name FT86640. SNPs beginning with FT are ones that were discovered by Family Tree DNA from the Big Y-700 test. Since it was not discovered in the previous Big Y test, let's see if this could be a valid SNP. Click the magnifier.



YFull Novel SNPs



This SNP had consistent results for 39 reads, so that's a good sign.



Search in BAM file


If we click the Ambiguous qual tab, we see the following. Let's examine the first listed position to see why it's on the ambiguous list. Again, click the magnifier on the left.



Evaluate an ambiguous SNP


On the next screen we see that this SNP was only read two times, so it's a less reliable SNP. 



A SNP with two reads


Family Tree DNA does not show this position in its list of Private Variants because it was only read two times. YFull considers the above SNP to be ambiguous for the same reason, but they do list it. If you want to know if any SNP is a valid one, you can verify it by ordering Sanger Sequencing at YSEQ for this position (along with any other doubtful SNPs). YFull does not indicate that the 9686527 SNP is available for testing at YSEQ, so go to yseq.net, and check to see if it's been added to their list of SNPs. If not, submit the position to YSEQ's Wish A SNP:



YSEQ Wish a SNP


On the next page, you will see full instructions for making your SNP available for testing.  Notice that the price is only one dollar.



YSeq Wish A SNP order


If this SNP is in a region that can be reliably tested, you will receive an email from YSEQ when your SNP is available. Because 9686527 was only read twice in the Big Y test, it is a questionable SNP, but once it's available for testing you can submit a DNA sample to YSEQ to verify that you actually have a novel SNP at this position. The ability to verify questionable SNPs is very important when comparing your results to someone else.



How do I download my Big Y results and submit them to YFull?


Log into your FTDNA account, and go to Big Y Results.  Click the blue Download Raw Data link at the upper right of the screen:



Download Raw Data


If you haven't already done so, you will first have to request the BAM file. In a few days your BAM file will be ready. When it is, click Share BAM, then copy the link that appears.



Share BAM file


Now go to YFull.com and click Order Now:



Order YFull interpretation


The cost is $49, but you will not be charged until the results are ready.  If you have previously submitted another kit for the same person (for example, you previously submitted Big Y-500, and now you're submitting Big Y-700), add a comment to your order that "This is the same kit as [YFxxxxx]" and insert your old YFull ID number. You will get a new ID number for your Big Y-700 results, and it will be cross-referenced to your old ID number in the YFull tree. If you have ordered a mtDNA Full Sequence test for the same person, those results can also be uploaded at no additional charge.



Comparing STRs


To be sure you have the most complete results in your YFull account, you will want to upload a STR file as well as your BAM file. This is because the STRs are not as reliable from NGS tests as they are from Sanger testing. Family Tree DNA does Sanger testing to get the first 111 STR markers, but the BAM file you uploaded does not include the Sanger-tested results. Be sure to upload not only your BAM file (as shown above), but also upload a separate STR file. No matter what company you used for your NGS test, you can order a STR test from Family Tree DNA or from YSEQ.

If you ordered a STR test from FTDNA (it was included in any Big Y-700 results), log into your FTDNA account and go to your Y-STR Results page:



FTDNA Y-STR Results


Scroll all the way to the bottom of the page, and click the orange CSV button at the bottom right of the screen:



Download CSV file of STRs


In your YFull account, click the Upload STRs link.



Upload STRs


Depending on whether you received your STRs from FTDNA or YSEQ, on the next page click either Upload STRs - FTDNA or Upload STRs - YSEQ:



Upload FTDNA or YSEQ STRs


Notice above that there are two kits in the account. No CSV file of STRs has yet been uploaded for the first kit on the list. The FTDNA CSV file for the second kit has been loaded. The green check mark means that the CSV file passed the quality check. The red X is an option to delete the file. The Re-upload link is so that you can upload another CSV file if you get any additional STR results for the same kit from FTDNA. You can also upload a STR file from YSEQ for the same account. The STR uploads are free.



YFull Groups


With your STR CSV file uploaded, you will get better results from YFull Groups. "Groups" at YFull are similar to "Projects" at FTDNA. To join a YFull group, click Groups Y, then submit a request. You can submit a request to form a new group by sending an email to YFull. 



YFull Y-DNA Groups



Once you have joined a group, you can see the group results. Below are the first lines of the results from the R-L21 group.  Notice that only twelve markers are displayed. This is the default, but you can display 12, 25, 37, 67, 111 or ALL markers. Family Tree DNA only displays the first 111 markers in projects because the STRs from the Big Y tests are less reliable than the first 111 markers tested by Sanger testing.



YFull R-L21 STR results


Notice above that some of the STRs are missing or questionable in the test results. This means that the NGS test returned reliable results for some positions and not others. However, if these were results from Family Tree DNA all of the first twelve markers should have solid results because FTDNA does Sanger testing on the first 111 markers. In the above screen, the people who have missing or questionable results loaded their BAM file, but they did not load their STR file.

We can display all STRs by clicking the All view button as shown above.

The results below are some of the additional STRs received from the Big Y-700 test that are not compared at Family Tree DNA.



YFull Comparison of all STRs


In the above screen we can see that the NGS tests do not return all STRs for all people. This is one reason why FTDNA does not compare them, but if so, what is the benefit of showing them at YFull? One reason is that if you discover missing STRs, you may be able to order these from YSEQ and have them added to your account. We will examine this procedure later in this post.

Although FTDNA projects are generally much larger than YFull groups, YFull groups have a few major advantages. The first is your ability to contact anyone in the group whose results interest you. [For example, let's say that you see that someone else appears to be an STR match to you, but his test did not show results for many STRs that you have. Has he uploaded his STR file? Would the two of you be willing to do Sanger testing for any STRs that you both agree are particularly important?] Simply click the PM (Private Message) envelope icon next to the kit number in the group results.  A message screen will pop up:



Sending a private message in YFull


Notice that you are sending a message to Kit YF06227 who is recipient 6227. We do not know this person's name or email address. However, once you send the message it will appear in that person's YFull account, and they will have the option to respond. 

A second advantage is that YFull groups have results from not only Family Tree DNA, but also from other companies. This will be increasingly important as the price of Full Genome testing continues to come down. More and more people are extracting Y-DNA results from Full Genome tests and uploading the results to YFull. In addition, YFull group administrators can add results from scientific studies. See the "Add science sample" below.



Add science sample to YFull Group



A third benefit of YFull groups is the ability to search for SNPs to find everyone in the project who might share your SNP of interest. 



Search for SNPs


You can search by SNP name using the Y-Results tab or by position number using the Y-Browser tab.  Although the Group's scientific samples may not show in the STR table (because the STRs were not included in the scientific report), they will show up in the SNP search. 

Searching by "View Y-SNPs", people who have the SNP we're looking for will show up in the results list with a + sign like this:



Positive SNPs



Of course, if any SNP is one of particular interest, we could send a Private Message to another person who has this SNP.

The results will be less likely to show positive results when searching for something like an ambiguous novel SNP. Using the Y-Browser, we searched by position.  The results will show every sample in the Group.  Only the first three results are shown in the image below:



Search by Position number


These results show that the Reference Sequence had an A at position 14239619. The first ID had a N which means this position had no reads in the Big Y test. The second ID had an Error. The error is because the position had only two reads, and they were both T.  The third ID is a scientific sample. The sample had an A in this position. If we hover the mouse over the A, we can see that this position was read five times in the test.

The ability in Groups to search by SNPs is another tool that can verify SNPs and find variants that can identify recent lineages.



What if I have taken more than one test?


One really great thing about YFull is the ability to load several test results for the same kit. Your previous results do not disappear. For example, I ordered a Big Y test for a kit in 2017.  In 2019, I ordered the Big Y-700. The Big Y-700 is an entirely new test, not just an upgrade from the previous Big Y. YFull indicates multiple tests for the same kit on their tree. The new kit is displayed along with the old kit number.



Two YFull kits for the same person


When the Big Y-700 test was finished in 2019, Family Tree DNA removed the original Big Y test results and replaced them with the Big Y-700. With the results from the first test gone, there is no way to compare the two. But at YFull, the original test remains and can be compared to the linked Big Y-700 results. I have also ordered a Full Genome test from Dante Labs. When those results come in, I can compare all three tests. If we click Comparisons in the YFull menu, we can see the following Statistics tab that compares the original Big Y to the Big Y-700 test for this kit. The original Big Y was aligned to hg19 [this was before the Big Y-500 test], and the new results were aligned to hg38:



Big Y vs Big Y-700


The Novel SNPs tab below is especially fascinating.  Position by position we see the hg19 and hg38 position numbers, all names that have been given to each SNP, and the calls from the two tests. We can hover the cursor over any item to see an explanation. For example, hovering the cursor over the green 1 shows that this is a Best quality SNP.



Novel SNP comparison between Big Y test and Big Y-700 test


We want to examine one of these positions that was discovered in the Big Y-700 test but not in the original Big Y. Click on the yellow magnifier.



 New SNP discovered in Big Y-700


We can see that although this position was not discovered in the original Big Y, it was read 28 times in the Big Y-700 and has been given the SNP name FT85878.



More information about newly-discovered SNP


We can also compare the STRs from the two tests. The STR name is on the left, then the STR results from the first Big Y test for this kit, and finally the results for the same kit from the Big Y-700 test (the column on the far right). Notice that on this first screen the STR results are the same for both tests:



Compare Big Y and Big Y-700 STRs


However, as we move further down, we can see that they are not all the same:



Missing STRs in two versions of Big Y tests


In the above screen there were results for some markers in my original Big Y test [the second to last column], but no results for the same markers in the most recent Big Y-700 test. The same is true with the new test showing results for some markers that the older test did not. Without being able to compare the two tests at YFull, I would not know about the missing results at any position. But what can I do about it? I may be able to order a test for a specific STR at YSEQ.
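If you keep your own notes on the two tests, the same comparison can be sketched in a few lines of Python. This is only an illustration: the marker names and values below are placeholders, not real results, and the function name is invented for this example.

```python
# Hypothetical STR results for the same man from two tests.
# None means the test returned no result for that marker.
# Marker names and values are placeholders, not real data.
original_big_y = {"STR_A": 12, "STR_B": 17, "STR_C": None}
big_y_700 = {"STR_A": 12, "STR_B": None, "STR_C": 21}

def missing_in_second(first, second):
    """Markers that have a result in `first` but no result in `second`."""
    return sorted(
        name for name, value in first.items()
        if value is not None and second.get(name) is None
    )

lost_in_new_test = missing_in_second(original_big_y, big_y_700)
gained_in_new_test = missing_in_second(big_y_700, original_big_y)

print("In the old test only:", lost_in_new_test)
print("In the new test only:", gained_in_new_test)
```

Markers that turn up in either list are candidates for a single-STR order.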



YSEQ STRs


YSEQ can test a single STR or a panel of STRs. To order a single STR, click STRs in the menu on the left of the screen, then search to see if the STR is available for testing.



YSEQ STRs


We can order a test for the DYS518 STR that was discovered in the original Big Y test, but lost in the Big Y-700:



Order YSEQ STR 


Once you have received STR results from YSEQ, you can upload them to YFull and add them to the results you already uploaded. First click Upload STRs:



Upload STRs to YFull


Now click UPLOAD STRs-YSEQ:



Upload YSEQ STRs


The ability to combine STRs from FTDNA and from YSEQ is wonderful.



SUMMARY: Why should I upload to YFull? 
Here are 10 reasons


Although we did not cover all of the benefits of YFull including SNP matches, STR matches outside of projects, estimating the dates when SNPs occurred, and many others, here are ten benefits that we did discuss:

1. You will not be charged for your upload until the results are ready, and you can add mtDNA Full Sequence results and multiple STR files for the same person at no additional charge.
2. You can find additional information about your private variants including the names that have been given to this variant, the region of the Y-chromosome where it appears, whether it is available for Sanger testing at YSEQ, and more.
3. You can compare your results to people who tested at other companies.
4. You can compare your own results from different NGS or Full Genomes tests you've taken.
5. YFull Groups can display and compare all STRs (not just the first 111).
6. You can find SNP matches in YFull Groups, not for just your terminal SNP, but for any named or unnamed variant.
7. You can contact other people in your YFull Groups.
8. If you discover questionable SNPs or STRs in your NGS test, you can verify them at YSEQ and add them to your test results.  
9. If some SNPs or STRs did not appear in your test results at all, you can order new SNPs or STRs from FTDNA or YSEQ and add them to your results.
10. Your results will not be wiped out, no matter how many versions of the same test you take.

You get a new, more comprehensive interpretation of your data. The benefits increase as YFull adds more features and more people submit results. Please seriously consider adding your results to YFull.








Sunday, May 19, 2019

What unites people?


What unites people? Armies, gold, flags? STORIES. There's nothing in the world more powerful than a good story. Nothing can stop it. No enemy can defeat it. . . . [You] are our memory, the keeper of all our stories: the wars, weddings, births, massacres, famines, our triumphs, our defeats, our past. Who better to lead us into the future?

Tyrion Lannister

To family historians: Never forget how important your work is.

Thursday, February 21, 2019

Y-DNA: Big Y test resolves STRs and convergence


The Big Y test can resolve a Y-DNA problem when STRs alone cannot tell you what you need to answer your genealogical questions. In this blog post we will start with the standard Y-DNA testing advice, then examine how your strategy might have to change if your results are not showing matches to others with your surname. This particular case will show how you might see false Y-DNA matches because of a process called convergence.

If you are new to Y-DNA testing, please read this post about STRs, SNPs, and haplogroups. It will open in a new window so that you won't lose your place here.


Standard Y-DNA testing advice


Here's what is generally true about STRs and SNPs:

1. Start with STR testing. If you have too many or not enough matches, upgrade the number of STRs to narrow down your list of matches, find new matches, and better determine your Y-DNA relationships.

2. Use SNP testing to trace your family further in time. SNPs are primarily for older genealogical relationships, but can be brought into the genealogical time frame.  Because SNPs generally occur less frequently than STRs, use STRs to refine the relationship.


Our Case Study:
More STRs do not always mean fewer matches

I initially tested a Mullins cousin with a 37-marker STR test from Family Tree DNA. He is a descendant of James Mullins who was first located in Rutherford County, North Carolina. James was listed in earlier census records as James McMullins and later as James Mullins. 

Over the years, I gradually upgraded the Y-DNA tests of Mr. Mullins, the descendant of James. His Y-DNA results take what is generally accepted about SNPs vs. STRs and turn it on its head.



37-marker STR test: Lots of matches


At 37 markers, Mullins had an astonishing 1804 matches:


37-marker STR matches
37-marker STR test: 1804 matches

Notice that there is a wide variety of surnames in the match list. This is obviously not caused by the usual "non-paternity event" or NPE, which means that one of the ancestors was not the natural-born son of the man who raised him. This many surnames and the large number of matches are due to convergence.


What is convergence?

Convergence in DNA is when mutations make it appear that two people are more closely related than they really are. Let's see an example of this using two ancestral lines that we will call A and B. We will examine only one marker. One of your ancestors, Mr. A, had a value of 19 at DYS570. An unrelated man, Mr. B, who lived at the same time as your ancestor, had a value of 16 at that marker. In a more recent generation of the A family, the 19 mutated to 18. In an even more recent generation, the 18 further mutated to a 17. In the meantime, in the B family ancestral line, the value at DYS570 only mutated once, from a 16 to a 17. Today the descendants in the A and B family both have a 17 at DYS570. At that one marker the two families now appear to be more closely related than they really are. They have different surnames, yet have identical values at that marker. Again, this is due to convergence, not a non-paternity event.
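For readers who like to see the numbers, the DYS570 example can be traced step by step in a few lines of Python. This is only an illustration of the example above: the values and the order of the mutations come from that paragraph, and the names line_a and line_b are made up for the sketch.

```python
# Illustrative sketch of the DYS570 convergence example.
# Each list records the marker value after each mutation,
# starting with the ancestor's value (all taken from the example above).
line_a = [19, 18, 17]   # Mr. A's line: 19 -> 18 -> 17 (two mutations)
line_b = [16, 17]       # Mr. B's line: 16 -> 17 (one mutation)

def mutation_steps(values):
    """Total number of single-step changes along one ancestral line."""
    return sum(abs(later - earlier) for earlier, later in zip(values, values[1:]))

ancestral_difference = abs(line_a[0] - line_b[0])    # between the two ancestors
modern_difference = abs(line_a[-1] - line_b[-1])     # between living descendants
total_mutations = mutation_steps(line_a) + mutation_steps(line_b)

print("Difference between the ancestors:", ancestral_difference)
print("Difference between the descendants:", modern_difference)
print("Mutations that actually occurred:", total_mutations)
```

The descendants show no difference at this marker even though three mutations separate the two lines. That hidden history is what convergence means.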


Finding matching surnames

As we saw above, Mullins had lots of matches, but we didn't immediately see anybody named Mullins or any variation of that name. We can search by surname to find a specific name. In the Y-DNA Matches, you will see a section to filter your matches. In the Filter Matches section, I entered the first few letters of the surname McMullins, which is a variation of the surname Mullins.


search Y-DNA by surname
Filter by surname


Out of the previous 1804 matches, only one man has this surname. He is a genetic distance of 4 at 37 markers. He has a family tree as indicated by the family tree symbol under his name.

Family Tree symbol


His ancestors are from County Cavan, Ireland.

Now we will filter the matches by just "Mul" to find any variations of Mullins, Mullens, etc.


Filter DNA matches by surname
One Mullins match


There was only one Mullins, and again, he has a family tree. His ancestor is from Rutherford County, North Carolina. This looks promising because our ancestor James Mullins also lived in that county. Notice also that next to the family tree symbol you will see what tests this man has taken at Family Tree DNA. This Mullins man has taken the 37-marker Y-DNA and the Family Finder tests.

Because neither Mr. McMullin nor Mr. Mullins has tested more than 37 markers, I will not see either of these men in a match list at 67 markers. However, I expect that if I order 67 markers, I will see a more manageable list of matches than a list of 1804 men. How many fewer matches will I see? Further, will ordering 67 markers show new Mullins matches that do not appear in the 37-marker list? I definitely wanted to find out, so I ordered an upgrade to 67 markers. The results shocked me.


67-marker STR test: Even more matches


After ordering 67 markers, the number of matches went up, not down as we would normally expect. I now saw 2631 matches with all kinds of surnames.


67 marker match list
67 markers: 2631 matches



Filtering by surname, I find four new McMullin men. 


McMullin results

Why didn't they show up in the 37-marker results? The answer has to do with the criteria used by Family Tree DNA to determine a match. You can find an explanation of what FTDNA considers to be a relevant match here:  
https://www.familytreedna.com/learn/general/what-is-a-relevant-match/

This tells us that any matches at the 37-marker level must have a genetic distance of four or fewer. At 67 markers, the match must have a genetic distance of seven or fewer. So, if a person was a genetic distance of 7 at 37 markers, he would not show up as a match. But if no additional mutations occurred at the 38-67 marker level, he would show up as a match at 67 markers. It is very useful for people to join surname, haplogroup, and other projects so that we can see the actual mutations and where they occurred.
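The threshold rule described above can be written out as a small Python sketch. It uses only the two thresholds quoted in this post (4 at 37 markers and 7 at 67 markers). It also treats genetic distance as a simple running total, which is a simplification, and the function and variable names are invented for the example.

```python
# Maximum genetic distance that still counts as a match,
# using only the two thresholds quoted above.
MAX_GENETIC_DISTANCE = {37: 4, 67: 7}

def is_listed_match(markers_tested, genetic_distance):
    """Return True if the genetic distance is within the threshold for that level."""
    return genetic_distance <= MAX_GENETIC_DISTANCE[markers_tested]

distance_at_37 = 7             # too distant to be listed at 37 markers
extra_mutations_38_to_67 = 0   # no further differences on markers 38-67
distance_at_67 = distance_at_37 + extra_mutations_38_to_67

shown_at_37 = is_listed_match(37, distance_at_37)
shown_at_67 = is_listed_match(67, distance_at_67)

print("Listed at 37 markers:", shown_at_37)
print("Listed at 67 markers:", shown_at_67)
```

The same man is hidden at 37 markers and listed at 67 markers, which is how a match list can grow after an upgrade.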

Using the same 67-marker match list, we will now look for surnames that start with "mul." We again see several new matches that were not on the 37-marker match list.


"Mul" results


Now I was so curious to know what would happen at 111 markers that I upgraded again.



111-marker STR test proves that SNP testing is necessary



At 111 markers, the number of matches went down to 195. This is partially due to the fact that far fewer people ordered testing at this level.


111 markers: 195 matches


66 of the 195 men had taken the Big Y-500 test [I had to count them], but their haplogroups were very different. Here's just a sample:


STR match but not SNP match
Different haplogroups in men who ordered Big Y


Time to change strategy


Normally, we encourage our matches to upgrade their STRs to help with finding common ancestry. But in haplogroups with high levels of convergence, upgrading STRs may not provide any assistance. 111 STRs had not helped with finding a common Mullins ancestor, and the only thing that will prove relationships in this case is SNP testing. SNPs do not mutate back and forth the way STRs do, so SNPs are much more stable.  

It was pretty obvious to me by looking at the various haplogroups that Mr. Mullins belonged somewhere within haplogroup R-M222 which is known for large numbers of matches due to convergence. I was not going to bother with ordering a single SNP, or even a SNP Pack, to confirm this because what I really wanted to find were modern SNPs that could bring me into the genealogical time period. 

If I ordered the Big Y-500 [recently renamed the Big Y-700] test, how many of these men would be real matches? We're about to find out.



Examining the Big Y test


The initial results of the Big Y-500 test showed a terminal haplogroup of R-FGC57769 with four matches:


Big Y Matching tab


The Unnamed Variants tab showed that Mr. Mullins had 10 variants that had not yet been given SNP names.

Big Y Unnamed Variants
Big Y Unnamed Variants tab


After the initial results are in, Family Tree DNA does a manual review to check for any new SNPs that have not yet been named. This usually occurs within a few weeks. After the manual review, there was only one match. These two men formed a new haplogroup, R-BY66397.


Big Y matches
New Big Y Matching results


After FTDNA's manual review, you may want to download your results. You can store them on your computer and transfer them to other databases. There is a blue Download Raw Data link. Be sure to request the BAM file.


Big Y Block Tree


You can see more detail about how the haplogroup changed by clicking on the Big Y Block Tree. You can access the Block Tree in the Big Y section of your homepage:


Big Y Block Tree
Click on Block Tree


Once you have clicked on Block Tree, you will be taken to your position in the tree. You can easily move up and down the tree and see details about various levels of the tree. Below we are seeing the position on the tree for Mr. Mullins.  

In Haplogroup R-FGC57769 there are currently a total of five men: Mr. Mullins (not shown because these are his matches), Mr. O'Brien, Mr. Martin, Mr. Carr, and Mr. Herberg. 

On the left, Martin and Carr are in their own haplogroup named R-FGC57762. They share three named SNPs: FGC57762, FGC57770, and FGC57771. They also have an average of five private variants each. 

In the middle we see the newly-formed haplogroup R-BY66397. This is the haplogroup of Mr. Mullins. We see his one match in this group. The tree shows that below R-BY66397 there are an average of eight private variants between Mr. Mullins and his match Mr. O'Brien. 

Mr. Herberg, on the right, currently has no matches below Haplogroup R-FGC57769. When he does, he and his match will form a new haplogroup.


Big Y Block Tree


Which position did Mullins and O'Brien share?


Here is the list of unnamed variants after the manual review:


New Unnamed Variants

There are now nine unnamed variants. Variant 7761527, which appeared in the earlier list, is no longer there. This means it is the variant shared with Mr. O'Brien. Variant 7761527 was given the SNP name BY66397, and the new haplogroup R-BY66397 was formed. We can verify this by going to YBrowse and entering 7761527 in the search box. The results are shown below:


YBrowse
Position 7761527 at YBrowse



This verifies that the previously unnamed variant 7761527 has been named BY66397.
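Spotting the variant that dropped off the list is a simple before-and-after comparison. Here is a rough Python sketch of that idea. Only the position 7761527 and the counts of ten and nine variants come from this post. The other nine entries are stand-in labels, because the remaining positions are not listed here.

```python
# The one position named in this post.
shared_position = 7761527

# Nine stand-in labels for the variants that stayed unnamed.
# These are placeholders, not real positions.
other_variants = [f"other_variant_{i}" for i in range(1, 10)]

before_review = set(other_variants) | {shared_position}   # 10 unnamed variants
after_review = set(other_variants)                        # 9 unnamed variants

# Whatever was on the first list but not the second has been named.
no_longer_unnamed = before_review - after_review

print(len(before_review), "variants before,", len(after_review), "after")
print("Dropped from the unnamed list:", no_longer_unnamed)
```

With real data you would paste in the two lists of positions from your own Big Y results pages.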



How closely are Mr. Mullins and Mr. O'Brien related?


Mr. Mullins has nine variants that are not shared with Mr. O'Brien. This indicates that their relationship is not recent. While SNP dating is not precise, it appears that the common ancestor of these two men lived at least 1000 years ago.


Filtering the STR lists of matches

Once you have taken a Big Y test, your STR lists of matches will have a new column called Big Y STR Differences. You will also see a new filter option to display only matches who have taken the Big Y test:

STR match list with Big Y
Big Y STR Differences column


Notice above that the Mullins list of 111-marker matches is now at 208 matches because more people have now taken the test. When we filter the matches by only those people who have taken the Big Y test, we see the following:

Big Y testees in STR results
Show only men who have taken Big Y

Notice that 73 of his 208 matches have taken the Big Y test. The closest match is a genetic distance of 7 at 111 markers. None of these men show up as a match in the Big Y match list of Mr. Mullins. Even though they are showing up as STR matches, they all belong in different haplogroups. None of these men is related to Mr. Mullins within at least a thousand years.

We can filter the list at each level of matching. Here is the filtered list at 67-markers. 376 men at this level have taken the Big Y test. We know that none of them is a match to Mr. Mullins because they do not appear on his Big Y match list.

Close STR matches with different haplogroups

In the above list, we see very interesting results. The first man on the list, McConnell, is only a genetic distance of one at 67 markers. There are other men here at a genetic distance of only two or three. This usually indicates that these men are closely related. However, in this case all the men have taken the Big Y test. Their haplogroups are not estimated; they are confirmed by SNP testing.  None of them is related to Mr. Mullins within the genealogical time period.

Finally, we will filter the 37-marker match list by men who have taken the Big Y test and whose surname begins with the letters "mul":

STR matches filtered by Big Y and surname
Filter STR results by surname and Big Y 

There are no Mullins matches, only a man named Mullican. As we can tell by his confirmed haplogroup, Mr. Mullican is not related to Mr. Mullins.
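If you keep your match lists in a spreadsheet, you can apply the same two filters yourself. The sketch below is only an illustration under my own assumptions: the record layout and field names are hypothetical (they are not an FTDNA export format), the haplogroup labels are placeholders, and the sample rows are illustrative rather than copied from the actual match list.

```python
from dataclasses import dataclass


@dataclass
class StrMatch:
    surname: str
    genetic_distance: int
    took_big_y: bool
    haplogroup: str  # placeholder labels below, not real haplogroups


def filter_matches(matches, big_y_only=True, surname_prefix=""):
    """Keep matches that pass the Big Y filter and whose surname starts with the prefix."""
    prefix = surname_prefix.lower()
    return [
        m for m in matches
        if (m.took_big_y or not big_y_only)
        and m.surname.lower().startswith(prefix)
    ]


# Illustrative rows only; distances and haplogroups are stand-ins.
matches = [
    StrMatch("Mullican", 4, True, "R-PLACEHOLDER1"),
    StrMatch("McConnell", 1, True, "R-PLACEHOLDER2"),
    StrMatch("Mullins", 4, False, ""),  # has not taken the Big Y
]

result = filter_matches(matches, big_y_only=True, surname_prefix="mul")
print([m.surname for m in result])  # ['Mullican']
```

Turning off the Big Y filter (big_y_only=False) brings the untested Mullins row back into view, which is exactly the kind of person you may want to invite to test.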


What did SNP testing tell us?


We now know that STR testing, even at 111 markers, may not be enough. It is definitely not enough in haplogroups with high levels of convergence. It is fascinating that the SNP results of Mr. Mullins do not match a single one of the hundreds of men who appear on his lists of STR matches and who have also taken the Big Y test. Mr. O'Brien, who is his only Big Y match, does not appear on a Mullins STR match list at any level. SNPs will be the only way to determine if someone is related by Y-DNA to this Mullins line.


What do we do now?

At the current time, Mr. Mullins has nine private variants in his Big Y results. Each of these variants occurred somewhere in his Mullins line, but we don't yet know the order in which they occurred or in which ancestor each mutation occurred. We can find out some of that by testing more Mullins cousins. So far nobody else shares any of these private variants. We need to find someone who does so that we can find out more about the Mullins ancestry.

Looking through the STR match lists, it is possible that, of the thousands of STR matches, one might actually be relevant. He is the man who appears on the 37-marker match list and whose ancestor is Spencer Mullins. He mismatches by four alleles at 37 markers, which does not appear to be a close match. We cannot tell without examining the exact locations of the mutations. However, his ancestor Spencer Mullins appears to be the son of a William Mullins who lived in Rutherford County, North Carolina, at the same time as James Mullins lived there. William was about the same age as James. These two men could be brothers or have another close relationship.

STRs indicate that these men mismatch on four out of 37 markers, and we have seen that even closer STR matches turned out not to be related. So this Mullins man could be just another convergence match. The only way to find out is to order the new Big Y-700 test for this man. I need to contact him to see if he agrees. If he is a genuine genealogical match, as I suspect, he will share at least one of the unnamed variants. The more unnamed variants the two men share, the more closely they are related. If they share one or more of the currently unnamed variants, the two men will form a new haplogroup under R-BY66397.

Testing this potential Mullins match is only the beginning, but it can be a big step forward in tracing the Mullins ancestry.


What are some of the things you can do with the Big Y to find more about your paternal ancestors?

  • Examine your list of matches first. If your results have just arrived, they may change after a manual review. You may want to contact your matches to see if you can determine how you are related.
  • See how many unnamed variants you have. This can help determine how closely related you are to your matches.
  • Examine the new Block Tree to see where you fit in and to find more information about your more distant matches.
  • Be sure you add a family tree to your results. Your family tree should at least contain information about your paternal line.
  • Join surname and haplogroup projects. This allows you to compare your STR results and will help encourage potential matches to upgrade their STR and SNP results.
  • Go back to your list of STR matches and see how many of these people have ordered the Big Y test.
  • Encourage any matches to whom you think you may be related to take the Big Y test. 
  • Search through public family trees to find other possible male relatives for Y-DNA testing.
  • Test closer male family members to determine in which ancestor each mutation occurred and to find your true terminal haplogroup. 
  • Consider transferring your results to other websites to get further evaluation and to find even more matches. This step will be increasingly important now that the price of full genome sequencing has dropped significantly. See websites, such as yfull.com and fullgenomes.com, that accept transfers from multiple companies.


What's next?


We have just examined the Big Y results of a man who has no matches in the genealogical time period and have determined our next step. We will next see a man who has one relevant Big Y match. We will find men who have a different surname who can help extend the family line. Ahhh, Y-DNA testing. I'm loving it!