Friday, December 15, 2017

Great updates to YFull


What is YFull?

YFull.com is an interpretation service for Y-chromosome Next-Generation Sequencing tests, which include the Big Y test at Family Tree DNA and tests like the YElite 2.1 from Full Genomes Corp. YFull's interpretation includes identifying and rating new SNPs, showing hundreds of STRs that can be extracted from your file, placing you on the YFull tree, and much more. To see why you might want to use this service, see What are the benefits of YFull?

If you took the Big Y test from Family Tree DNA, for example, you will be able to submit your BAM file to YFull. Your BAM file is not automatically generated by FTDNA; you must request it first by going to your Big Y Results section and clicking Download Raw Data. Unfortunately, BAM files are not currently available because Family Tree DNA has been updating all Big Y results from the older hg19 (Build 37) to the more recent hg38 (Build 38) version of the Human Genome Reference Sequence. The BAM files should be available in early 2018.

In the meantime, YFull has been further improving the functionality of its service. These updates will greatly enhance our ability to understand our Next-Generation Sequencing results. Some of the new enhancements are shown in my post Big Changes to YFull. YFull released even more updates a few days ago, so let's examine them.



Great updates to YFull

On its Facebook page, YFull posted the following images of the latest updates.





Now let's see these updates in practice.


Your YFull results screen

When your YFull results are returned, this will be your home screen. There are three main sections: Haplogroups and SNPs, STR results, and Novel SNP results. On the left of the screen are links to even more tools.



We will use the Novel SNPs section to examine the most recent YFull changes.


Novel SNPs

YFull's reporting of SNPs is quite a bit more extensive than Family Tree DNA's. In the Build 37 version of the Big Y, FTDNA identified a total of 26 novel SNPs in my brother's Big Y results. He had one match, a man named Cairns. Of the 26 new SNPs, 23 were shared with Cairns. These 23 SNPs were then named and placed on the haplotree, so they are no longer considered novel. The other three new SNPs were unnamed variants seen only in my brother's results.

When I submitted the BAM file to YFull for evaluation, YFull found a total of 48 SNPs, which were categorized as best quality, acceptable quality, ambiguous quality, or low quality. Cairns had not yet submitted his results to YFull, and my brother has no other matches (yet!) in the YFull database. Therefore, all 48 newly identified SNPs appear in my brother's novel SNPs section.


Low quality SNPs

When we go to the novel SNPs and check the "Low qual" tab, we can see that YFull has identified two low-quality SNPs for this test. They are shown in the next image.



If we want to find more information about the second SNP on the list, we can click the yellow magnifier link.

Here is what the screen looked like before the latest update. We can see the hg19 and hg38 position numbers, how many times the SNP was read in the Big Y test, and what the reads showed.



In the above example, we can't tell if we are looking at information that came from the hg19 (Build 37) version of the test or the hg38 (Build 38) version, but we can see why this SNP was rated as low quality. The SNP was read only five times. Four times the test showed a T in this position, and one time it showed a G.

This SNP does not show up in the Big Y Results section of my brother's Family Tree DNA account because the results of five reads are not definitive enough to identify this as a valid SNP. However, YFull reports this SNP in the "Low qual" category because there is a possibility that this is a genuine SNP.
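To make the read counts concrete, here is a minimal Python sketch that tallies the base calls at one position and sorts them into rough buckets. The thresholds (MIN_DEPTH and MIN_AGREEMENT) and the bucket labels are hypothetical values chosen for illustration; YFull's actual rating rules are not described in this post.

```python
from collections import Counter

# Hypothetical thresholds for illustration only; these are NOT YFull's real criteria.
MIN_DEPTH = 10        # fewer reads than this is treated as too little evidence
MIN_AGREEMENT = 0.9   # share of reads that must agree on the same base

def summarize_reads(calls):
    """Return (majority base, depth, agreement) for the calls at one position.

    Anything other than A, C, G or T (e.g. a blank) is ignored as a no-call.
    """
    counts = Counter(c for c in calls if c in "ACGT")
    depth = sum(counts.values())
    if depth == 0:
        return None, 0, 0.0
    base, n = counts.most_common(1)[0]
    return base, depth, n / depth

def rough_bucket(calls):
    """Sort a position into a hypothetical bucket based on depth and agreement."""
    _, depth, agreement = summarize_reads(calls)
    if depth < MIN_DEPTH:
        return "too few reads"
    if agreement < MIN_AGREEMENT:
        return "mixed calls"
    return "consistent"

# The low-quality example from the post: five reads, four T and one G.
print(summarize_reads("TTTTG"))  # ('T', 5, 0.8)
print(rough_bucket("TTTTG"))     # too few reads
```

With these made-up thresholds, a position read 103 times with every read agreeing would land in the "consistent" bucket, while the five-read position above fails on depth alone.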

After the new update, here is how the same SNP position is shown:



The new screen has a red arrow pointing to the hg19 position number. This tells us that we are looking at information that was submitted under the former hg19 (Build 37) version. The black cursor is pointing to this SNP's region in the Y chromosome. The bar beneath it shows the exact position.

This position has not been identified in any database as a known SNP.


Best quality SNPs

Below are the "best quality" SNPs from the YFull report:



As you can see in the above screen, many of these "best quality" SNPs have now been named. Unfortunately, they have even been named more than once. The SNP names are shown next to the position numbers. YFull will identify a SNP as a known SNP in its database when the SNP has been shared between two or more testers at YFull. These "best quality" SNPs have not yet been seen in any other sample in the YFull database.
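The rule described above (a SNP counts as known once two or more YFull testers share it) can be sketched as a simple grouping step. In this hypothetical Python example, each kit is a set of (position, ancestral, derived) variants; the kit IDs and positions are made up for illustration.

```python
from collections import defaultdict

def split_known_novel(kits):
    """Split variants into known (shared by 2+ kits) and novel (seen in one kit).

    kits maps a kit ID to a set of (position, ancestral, derived) tuples.
    """
    carriers = defaultdict(set)
    for kit_id, variants in kits.items():
        for variant in variants:
            carriers[variant].add(kit_id)
    known = {v for v, ids in carriers.items() if len(ids) >= 2}
    novel = {v for v, ids in carriers.items() if len(ids) == 1}
    return known, novel

# Hypothetical kits and positions, for illustration only.
kits = {
    "kit_A": {(1000, "A", "G"), (2000, "C", "T")},
    "kit_B": {(1000, "A", "G")},
}
known, novel = split_known_novel(kits)
print(known)  # {(1000, 'A', 'G')}
print(novel)  # {(2000, 'C', 'T')}
```

Keying on the alleles as well as the position matters, because two different mutations can occur at the same position.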

Here is one of the SNPs that is rated as "best quality."



It is located in the combBED (combined BED) area of the Y-chromosome where more stable SNPs are likely to be located. This position has been read consistently 103 times. It has been given two names: FGC65824 and BY20951. This SNP can be found in the ISOGG YBrowse database.
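Checking whether a position lies inside a region such as the combBED area is an interval lookup. Below is a minimal Python sketch using binary search over sorted, non-overlapping intervals. The intervals are invented placeholders, not real combBED coordinates. They follow the BED convention of 0-based, half-open ranges, so the sketch assumes report positions are 1-based and subtracts 1 before the lookup.

```python
import bisect

# Placeholder intervals (start, end): 0-based, half-open, sorted and non-overlapping.
# These are NOT real combBED coordinates.
REGIONS = [(1_000, 5_000), (10_000, 20_000), (30_000, 31_000)]
STARTS = [start for start, _ in REGIONS]

def in_regions(pos_1based):
    """Return True if a 1-based position falls inside one of REGIONS."""
    pos = pos_1based - 1  # convert to the 0-based coordinates used by BED
    i = bisect.bisect_right(STARTS, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < REGIONS[i][1]

print(in_regions(1_001))   # True: 0-based 1000 is the first base of the first interval
print(in_regions(5_001))   # False: 0-based 5000 is the exclusive end
print(in_regions(15_000))  # True
```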


Solving a mystery with the new YFull updates

In my blog post The Big Y could be the best DNA test ever! I showed the process of evaluating SNP calls using Family Tree DNA's Big Y Chromosome Browser.

I could not find an answer to the mystery of SNP FGC46559 using this browser. I have copied the section about this SNP from my previous blog post to show why it was so confusing. The following describes how I searched for this SNP in the FTDNA system.

Using Family Tree DNA's Big Y Chromosome Browser

SNP FGC46559 appears on the Cairns and Thompson list of Non-Matching Variants. I searched for this SNP in my brother's account:



As you can see above, Thompson is not derived for SNP FGC46559. The reference and genotype are both listed as A, which means that Thompson has the ancestral value and does not have a SNP here. Since it is a non-matching variant, Cairns must have this SNP.

This one is puzzling. When we click the SNP name and go to the Chromosome Browser, we see the following:



There are many more calls for this SNP that are not shown. You have to scroll up and down to see them all. This position was read a total of 76 times. The Reference value is A and is highlighted in red directly below the black arrow. All of the Genotype calls beneath it (in pink) are G except for that one blank space on the seventh line from the top. Yet the Genotype and Reference are both stated to be A.
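The mismatch described above can be flagged mechanically: take the majority base among the reads, ignoring blanks, and compare it with the reported genotype. This Python sketch is hypothetical; the read list is an illustrative stand-in for the browser view, assuming the blank is one of the 76 reads and the other 75 are G.

```python
from collections import Counter

def consensus_call(calls):
    """Majority base among A/C/G/T calls; blanks and other symbols count as no-calls."""
    counts = Counter(c for c in calls if c in "ACGT")
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def genotype_conflict(calls, reported_genotype):
    """Return the read consensus if it disagrees with the reported genotype, else None."""
    call = consensus_call(calls)
    return call if call is not None and call != reported_genotype else None

# Illustrative stand-in for the browser view: 75 G reads plus one blank.
reads = ["G"] * 75 + [" "]
print(consensus_call(reads))          # G
print(genotype_conflict(reads, "A"))  # G: the reads disagree with the reported A
```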

It doesn't appear to make any sense that the Big Y results show that my brother has an A at Position 19714591 when he clearly has a G.

What can we find about this SNP with the latest YFull update?

At YFull, Position 19714591 appears in my brother's list of "acceptable quality" SNPs. We will click on the yellow magnifier to see what YFull reveals about this SNP.



This is one of the few SNPs in my brother's YFull account that has the blue YF icon. The blue YF icon indicates that this position is a known SNP in the YFull database.



At YFull, we are looking at the hg19 version of the Big Y results. Here the number of reads is 94, whereas the hg38 Big Y results showed 76. This difference may be due to mapping to the new hg38 reference sequence.

Notice that two different mutations have occurred at this position. SNP FGC46559 is a mutation from A to C. This is rated as a high quality, five-star SNP in the YFull database. This SNP also appears in the YBrowse database.

My brother's results, however, indicated that at Position 19714591 he had a mutation from the ancestral value A to his derived value of G. My brother's mutation from A to G has been named FGC65832.

Finally, we can see why my brother's results are so puzzling in the Big Y Chromosome Browser. When we went to Position 19714591 in the Big Y browser, we were shown SNP FGC46559, which we now know is a mutation from A to C. Since my brother has the mutation A to G, he does not have SNP FGC46559. Instead, he has SNP FGC65832, which occurred at the same location. The Big Y browser does not yet show that two different SNPs have occurred at that position.
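The lesson here is that a SNP is identified by its position together with its ancestral and derived values, not by position alone. Here is a minimal Python sketch of such a lookup, using the two SNPs and alleles described above. The dictionary layout and function names are hypothetical, and the coordinate is used exactly as given in the post, with no reference build attached.

```python
# Known SNPs keyed by (position, ancestral, derived), as described in the post.
KNOWN_SNPS = {
    (19714591, "A", "C"): "FGC46559",
    (19714591, "A", "G"): "FGC65832",
}

def name_for(position, ancestral, derived):
    """Return the SNP name for an exact mutation, or None if it is not catalogued."""
    return KNOWN_SNPS.get((position, ancestral, derived))

def names_at(position):
    """List every catalogued SNP at a position (more than one means position alone is ambiguous)."""
    return sorted(name for (pos, _, _), name in KNOWN_SNPS.items() if pos == position)

print(names_at(19714591))            # ['FGC46559', 'FGC65832']
print(name_for(19714591, "A", "G"))  # FGC65832 (the brother's mutation)
print(name_for(19714591, "A", "C"))  # FGC46559
```

A lookup by position alone would have to pick one of the two names, which is consistent with the Big Y browser showing FGC46559 for a tester who actually carries A to G.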

If we search for Position 19714591 in YBrowse, we will see the following screen showing the two SNPs FGC65832 and FGC46559.



But this is much easier to interpret in the new YFull browser because it clearly shows that FGC46559 is the mutation A to C and FGC65832 is the mutation A to G.

For this reason, I was particularly excited by the new YFull enhancements.


Verified by Sanger Sequencing

Position 12144810 has an orange check mark not shown in the other images.



When we hover the mouse over this orange check mark, we see that it indicates that the SNP has been verified by Sanger sequencing at the company YSeq. The term "verified" can be somewhat confusing. In this case, "verified" does not mean that it has been verified that my brother has this SNP. All the term means here is that the SNP is available for Sanger sequencing at YSeq and that one person has been tested for this SNP. However, he did not have this SNP; the result was negative.

In the example below, you can see that there is a green check mark next to one of the mutations.



The mutation G to A has been identified as SNP M241. It has been truly verified as a genuine SNP because 1008 people have tested for this SNP, and in 233 of these tests the results were positive. Because it has been proven that some people actually have a SNP at this location, the green check mark indicates that it is "verified."

A SNP is truly "verified" only when we can see a green check mark.
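Based on the explanation above, the two check marks can be summarized as a small rule: an orange mark means the SNP has been tested at YSeq with no positive results so far, and a green mark means at least one positive result. This Python sketch encodes that reading of the icons; it is an interpretation of this post, not YFull's own logic, and returning None for an untested SNP is an assumption.

```python
def check_mark(tested, positive):
    """Interpretation of the icons: None if untested, green if any positives, else orange."""
    if tested == 0:
        return None
    return "green" if positive > 0 else "orange"

def positive_share(tested, positive):
    """Fraction of Sanger tests that came back positive."""
    return positive / tested if tested else 0.0

# Figures from the post.
print(check_mark(1, 0))                     # orange (position 12144810)
print(check_mark(1008, 233))                # green (M241)
print(round(positive_share(1008, 233), 3))  # 0.231
```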


Why is Sanger Sequencing important?

Next-Generation Sequencing (the kind of sequencing used for the Big Y test) scans the Y chromosome and can find known SNPs as well as variants that have never before been discovered. But as we have seen, some positions are read many more times than others, and not all of the reads produce the same results.

With Sanger Sequencing, we can zero in on a particular position to see whether or not we actually have a variant at that location. We can verify the validity of many of our newly-discovered SNPs using Sanger Sequencing.

Furthermore, we can test any number of SNPs using this technology. For example, if someone wants to find out whether he shares SNP FGC65832 with my brother, he can test that one SNP. If he is wondering about several SNPs, he can test just those.


How do I make my newly-discovered SNPs available for Sanger Sequencing?

You can submit any named SNPs or unnamed variants to YSeq using the Wish A SNP option. Go to yseq.net and create an account, or log into your existing account.



Once you have an account, click Shopping Cart.




Now click "Wish A SNP" and follow the instructions on the next screen.



I will write more about the process of submitting SNPs to YSeq in an upcoming blog post.


I already have YFull results. What do I do next?


Examine your results with the new updates to YFull. If, for example, you do not see this icon for YBrowse, begin submitting your SNPs to YSeq.



If we don't see the icon for YFull, we need to encourage more people to take Next-Generation Sequencing tests and to submit their results to YFull when the BAM files become available.



Thank you, YFull, for the great updates!

