European Genome-Phenome Archive

File Quality

File Information: EGAF00007988834

File Data

Base Coverage Distribution

This chart shows the base coverage distribution along the reference file. The x-axis shows the coverage value, and the y-axis shows the number of reference positions (bases) covered that many times.

The data are plotted on a log scale to compress the wide range of values. A high peak at the start (low coverage) followed by a descending curve is expected.

[Chart: base coverage distribution. X-axis: coverage value (0 to >1000); y-axis: # Bases (log scale).]
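As a minimal sketch of how such a distribution could be tabulated, the snippet below counts how many positions have each coverage value. The function name `coverage_histogram`, the `cap` parameter, and the sample depths are hypothetical; pooling everything above 1000 into one bin is an assumption based on the ">1000" label on the chart's x-axis.

```python
from collections import Counter


def coverage_histogram(depths, cap=1000):
    """Count how many positions have each coverage value.

    Values above `cap` are pooled into a single overflow bin keyed
    `cap + 1`, mirroring the ">1000" bin on the chart's x-axis.
    """
    hist = Counter()
    for depth in depths:
        hist[depth if depth <= cap else cap + 1] += 1
    return hist


# Hypothetical per-position depths for a tiny reference.
depths = [0, 0, 30, 31, 30, 29, 1500, 30]
hist = coverage_histogram(depths)
for value, n_bases in sorted(hist.items()):
    print(value, n_bases)
```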

Base Quality

The base quality distribution shows the Phred quality scores, which describe the probability that a nucleotide has been incorrectly assigned, i.e. a sequencing error. Specifically, Q = -10·log10(P), where Q is the Phred score and P is the probability that the nucleotide is wrong. The larger the score, the more confident we are in the base call. Different sequencing technologies produce different distributions, but the distribution is expected to be skewed towards larger (more confident) scores, typically around 40.

[Chart: base quality distribution. X-axis: Phred quality score (0 to 40); y-axis: # Bases (0G to 200G).]
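The standard Phred convention is Q = -10·log10(P), so a score of 40 corresponds to an error probability of 1 in 10 000. The sketch below converts in both directions; the function names are illustrative and not part of any EGA tooling.

```python
import math


def phred_to_error_prob(q):
    """Error probability P for a Phred score Q, using Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score Q for an error probability P (P must be > 0)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```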

Mapped Reads

Number of reads in the sample successfully mapped (singletons and both mates) to the reference genome. Genetic variation, in particular structural variants, ensures that every sequenced sample is genetically different from the reference genome it was aligned to. Small differences against the reference are tolerated, but with more significant variation the read can fail to be placed. Therefore, the mapped reads rate is not expected to reach 100%, but it should be high (usually >90%). The rate is the proportion of mapped reads among all reads: mapped / (mapped + unmapped).

Mapped reads: 1 535 256 779 (99.8 %). Unmapped: 0.2 %.

Both Mates Mapped

When working with paired-end sequencing, each DNA fragment is sequenced from both ends, creating two mates for each pair. This chart shows the fraction of reads in pairs where both mates successfully map to the reference genome.

Note that pairs whose mates do not map at the expected distance are also included here; the Proper Pairs chart counts only those that do.

Both mates mapped: 1 532 979 980 (99.7 %). Remaining reads: 0.3 %.

Singletons

When working with paired-end sequencing, each DNA fragment is sequenced from both ends, creating two mates for each pair. If one mate in the pair successfully maps to the reference genome but the other is unmapped, the mapped mate is a singleton. A singleton can occur, for example, if the sample has a large insertion compared with the reference genome: one mate falls in the sequence flanking the insertion and is mapped, while the other falls in the inserted sequence and so cannot map to the reference genome. There are unlikely to be many such structural variants in the sample, or many sequencing errors severe enough to prevent a read from mapping. Consequently, the singleton rate is expected to be very low (<1%).

Singletons: 2 276 799 (0.1 %). Remaining reads: 99.9 %.
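The definitions above imply that every mapped read is either part of a pair with both mates mapped or a singleton. As a quick arithmetic check, the sketch below uses the three counts read from the chart labels on this page (mapped reads, both mates mapped, singletons); the variable names are illustrative.

```python
# Counts as read from the chart labels on this page.
mapped_reads = 1_535_256_779
both_mates_mapped = 1_532_979_980
singletons = 2_276_799

# Mapped reads should split exactly into the two categories.
assert both_mates_mapped + singletons == mapped_reads

# Share of singletons among mapped reads (not among all reads).
singleton_share = singletons / mapped_reads
print(f"{singleton_share:.4%}")
```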

Forward Strand

Fraction of reads mapped to the forward DNA strand. The general expectation is that the DNA library preparation step generates DNA from the forward and reverse strands in equal amounts, so after mapping the reads to the reference genome, approximately 50% of them map to the forward strand. Deviations from 50% may be due to problems with the library preparation step.

Forward strand: 768 923 268 (50 %). Remaining reads: 50 %.
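The categories described so far (mapped, both mates mapped, singleton, forward strand) can be derived from the FLAG field of a SAM/BAM alignment record. The sketch below assumes the file is in SAM/BAM format and uses the FLAG bit values from the SAM specification; the `classify` function and its category names are hypothetical, and the exact rules EGA applies are not stated on this page.

```python
# FLAG bits defined by the SAM specification.
PAIRED = 0x1         # read is part of a pair
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate of this read is unmapped
REVERSE = 0x10       # read maps to the reverse strand


def classify(flag):
    """Return the set of report categories for a read with this FLAG."""
    if flag & UNMAPPED:
        return {"unmapped"}
    cats = {"mapped"}
    cats.add("reverse" if flag & REVERSE else "forward")
    if flag & PAIRED:
        cats.add("singleton" if flag & MATE_UNMAPPED else "both_mates_mapped")
    return cats


print(classify(99))   # paired, mapped, mate mapped, forward strand
print(classify(73))   # paired, mapped, mate unmapped
print(classify(77))   # paired, read itself unmapped
```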

Proper Pairs

A fragment consisting of two mates is called a proper pair if both mates map to the reference genome at the expected distance from each other. For example, if the DNA library consists of fragments ~500 base pairs in length, and 100 base pair reads are sequenced from either end, the two reads are expected to map to the reference genome separated by ~300 base pairs. If the sequenced sample contains a large structural variant, e.g. a large insertion or deletion, the mates map with an unexpected separation; this is a signal for the variant, and the reads are not considered a proper pair. Depending on the sequencing technology, there is also an expected orientation for each read in the fragment.

The rate of proper pairs is expected to be well over 90%, even if the mapping rate itself is low, for example as a result of bacterial contamination.

Proper pairs: 1 502 208 502 (97.7 %). Remaining reads: 2.3 %.
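The ~300 base pair figure in the example follows from subtracting both read lengths from the fragment length. The sketch below reproduces that arithmetic and adds a simple distance check; the function names and the `tolerance` value are hypothetical, and real aligners also use the observed insert-size distribution and read orientation, which this sketch ignores.

```python
def expected_gap(fragment_len, read_len):
    """Expected unsequenced gap between two mates of a fragment."""
    return fragment_len - 2 * read_len


def looks_like_proper_pair(observed_gap, fragment_len, read_len, tolerance=150):
    """True if the observed gap is within `tolerance` of the expected gap.

    `tolerance` is an illustrative value, not one taken from any aligner.
    """
    return abs(observed_gap - expected_gap(fragment_len, read_len)) <= tolerance


print(expected_gap(500, 100))
print(looks_like_proper_pair(320, 500, 100))
print(looks_like_proper_pair(5000, 500, 100))
```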

Duplicates

PCR duplicates are two (or more) reads that originate from the same DNA fragment. When sequencing data is analyzed, it is assumed that each observation (i.e. each read) is independent, an assumption that fails in the presence of duplicate reads. Typically, algorithms look for reads that map to the same genomic coordinate and whose mates also map to identical genomic coordinates. Note that as the sequencing depth increases, more reads are sampled from the DNA library, and it becomes increasingly likely that duplicate reads are sampled. The true duplicate rate is therefore not independent of the depth, and depth should be taken into account when interpreting the duplicate rate. Additionally, as the sequencing depth increases, it becomes increasingly likely that independent reads map to the same location and are marked as duplicates even though they are not. As the sequencing depth approaches and surpasses the read length, the duplicate rate therefore becomes less indicative of problems.

Duplicates: 105 977 427 (6.9 %). Remaining reads: 93.1 %.
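The coordinate-based approach described above can be sketched as follows: a pair is flagged as a duplicate if an earlier pair had the same read and mate coordinates. The function `mark_duplicates` and the tuple layout are hypothetical; production tools also take strand into account and choose which copy to keep based on quality, which this sketch does not attempt.

```python
def mark_duplicates(pairs):
    """Flag pairs whose coordinates repeat an earlier pair.

    `pairs` is a list of (chrom, pos, mate_chrom, mate_pos) tuples.
    Returns a list of booleans, True where the pair is a duplicate.
    """
    seen = set()
    flags = []
    for key in pairs:
        flags.append(key in seen)
        seen.add(key)
    return flags


# Hypothetical pairs: the second repeats the first exactly.
pairs = [
    ("1", 100, "1", 400),
    ("1", 100, "1", 400),
    ("1", 100, "1", 450),
]
flags = mark_duplicates(pairs)
print(flags)
print(sum(flags) / len(flags))  # duplicate rate for this toy input
```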

Mapping Quality Distribution

The mapping quality distribution shows the Phred quality scores describing the probability that a read does not map to the location it has been assigned to. Specifically, Q = -10·log10(P), where Q is the Phred score and P is the probability that the read is in the wrong location, so the larger the score, the higher the quality of the mapping. Some scores have a specific meaning, e.g. a score of 0 means that the read could map equally well to multiple places in the reference genome. The majority of reads should be well mapped, so this distribution is expected to be heavily skewed towards a high value (typically around 60). It is not unusual to see some scores around zero, because reads originating from repetitive elements in the genome will plausibly map to multiple locations.

[Chart: mapping quality distribution. X-axis: Phred quality score (0 to 60); y-axis: # Reads (0.1G to 1.4G).]

Mapped vs Unmapped

Stacked column chart of mapped and unmapped reads for each chromosome in the reference file. It is the same representation as in the Mapped Reads chart, but broken down by chromosome. Even if the sequenced sample is female, it is possible to get reads on the Y chromosome, because the X and Y chromosomes share regions called pseudoautosomal regions (PAR1, PAR2).

An unmapped read is assigned to a chromosome when one mate of a pair is aligned and the other is not: the unmapped read is given the same chromosome and POS as its mate. It can also happen when alignment is performed with bwa, which concatenates all the reference sequences together: if a read hangs off the end of one reference sequence onto another, it is given the right chromosome and position, but it is also classified as unmapped.

[Chart: stacked columns of mapped and unmapped reads per chromosome. X-axis labels: 1 to 21, X, Y, M; y-axis: 0% to 100%. Mapped fractions range from 99.54% to 99.87% (unmapped 0.13% to 0.46%).]
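A per-chromosome breakdown like this one can be sketched by tallying reads under the chromosome recorded for them, which for an unmapped read with a mapped mate is the mate's chromosome. The function `mapped_fraction_by_chrom` and the input layout are hypothetical, and the sample reads are made up for illustration.

```python
from collections import defaultdict


def mapped_fraction_by_chrom(reads):
    """Fraction of mapped reads per chromosome.

    `reads` is an iterable of (chrom, is_unmapped) tuples, where `chrom`
    is the chromosome recorded for the read (for an unmapped read,
    the chromosome of its mapped mate).
    """
    counts = defaultdict(lambda: [0, 0])  # chrom -> [mapped, unmapped]
    for chrom, is_unmapped in reads:
        counts[chrom][1 if is_unmapped else 0] += 1
    return {c: m / (m + u) for c, (m, u) in counts.items()}


# Hypothetical reads: three mapped and one unmapped on chromosome 1.
reads = [("1", False), ("1", False), ("1", False), ("1", True), ("X", False)]
fractions = mapped_fraction_by_chrom(reads)
print(fractions)
```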