European Genome-Phenome Archive

File Quality

File Information

EGAF00008060676

File Data

Base Coverage Distribution

This chart represents the base coverage distribution along the reference file. The x-axis shows the coverage value, i.e. the number of times a position in the reference file is covered. The y-axis shows the number of bases that have that coverage.

Counts are shown on a log scale to compress their wide range. A high peak at the start (low coverage) followed by a descending curve is expected.
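
As a minimal sketch of how such a distribution can be tabulated, the function below counts how many positions have each coverage value and pools everything above 1000 into one overflow bin, mirroring the ">1000" label on the chart. The function name, the `cap` parameter and the toy depths are illustrative assumptions, not part of the EGA pipeline.

```python
from collections import Counter

def coverage_histogram(depths, cap=1000):
    """Count how many reference positions have each coverage value.

    Depths above `cap` are pooled into a single overflow bin keyed cap + 1.
    """
    hist = Counter()
    for depth in depths:
        hist[min(depth, cap + 1)] += 1
    return dict(sorted(hist.items()))

# Toy per-position depths (hypothetical, not taken from this file).
toy_depths = [0, 0, 1, 30, 30, 31, 1500]
print(coverage_histogram(toy_depths))
```

Plotting the resulting counts on a logarithmic y-axis gives the kind of chart described above.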

[Chart: base coverage distribution. X-axis: coverage value (0 to >1000). Y-axis: # bases, log scale (2k to 20M).]

Base Quality

The base quality distribution shows the Phred quality scores, which describe the probability that a nucleotide has been incorrectly assigned, i.e. an error in the sequencing. Specifically, Q = -10 log10(P), where Q is the Phred score and P is the probability that the nucleotide is wrong. The larger the score, the more confident we are in the base call. Different sequencing technologies produce different distributions, but the distribution is expected to be skewed towards larger (more confident) scores, typically around 40.
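
The relationship between score and error probability can be sketched as follows, using the standard Phred definition Q = -10 log10(P). The function names are illustrative.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score corresponding to error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

# Each 10-point step in Q is a tenfold drop in error probability.
for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```

Under this definition, a score of 40 corresponds to roughly one expected error in 10 000 base calls.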

[Chart: base quality distribution. X-axis: Phred quality score (0 to 40). Y-axis: # bases (0G to 180G).]

Mapped Reads

Number of reads in the sample successfully mapped (singletons and both mates) to the reference genome. Genetic variation, in particular structural variants, ensures that every sequenced sample is genetically different from the reference genome it was aligned to. Small differences against the reference are accepted, but with more significant variation a read can fail to be placed. Therefore, the mapped reads rate is not expected to reach 100%, but it should be high (usually >90%). The rate is the proportion of mapped reads among all reads: mapped / (mapped + unmapped).
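
The calculation can be written out as below. This is a minimal sketch with hypothetical counts; the parentheses matter, since the denominator is the total number of reads.

```python
def mapped_rate(mapped, unmapped):
    """Fraction of reads that mapped: mapped / (mapped + unmapped)."""
    total = mapped + unmapped
    if total == 0:
        raise ValueError("no reads to compute a rate from")
    return mapped / total

# Hypothetical counts, chosen only to illustrate the formula.
print(f"{mapped_rate(998, 2):.1%}")
```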

Mapped: 99.8 % (1 417 270 566 reads); unmapped: 0.2 %

Both Mates Mapped

When working with paired-end sequencing, each DNA fragment is sequenced from both ends, creating two mates for each pair. This chart shows the fraction of reads in pairs where both of the mates successfully map to the reference genome.

Note that pairs whose mates do not map at the expected distance are also counted here, unlike in the Proper Pairs chart.

Both mates mapped: 99.7 % (1 415 461 016 reads); remainder: 0.3 %

Singletons

When working with paired-end sequencing, each DNA fragment is sequenced from both ends, creating two mates for each pair. If one mate in the pair successfully maps to the reference genome but the other is unmapped, the mapped mate is a singleton. A singleton can occur, for example, if the sample has a large insertion compared with the reference genome: one mate falls in the sequence flanking the insertion and is mapped, but the other falls in the inserted sequence and so cannot map to the reference genome. There are unlikely to be many such structural variants in the sample, or many sequencing errors that would prevent a read from mapping. Consequently, the singleton rate is expected to be very low (<1%).
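
A minimal sketch of this classification follows, assuming the alignments are stored in SAM/BAM format, where the FLAG field carries a "read paired" bit (0x1), a "read unmapped" bit (0x4) and a "mate unmapped" bit (0x8). Whether this report is computed this way is an assumption, and `pair_status` is an illustrative name.

```python
# Bits of the SAM FLAG field used here.
PAIRED = 0x1         # read comes from paired-end sequencing
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the other mate is unmapped

def pair_status(flag):
    """Classify one read by the mapping state of itself and its mate."""
    if not flag & PAIRED:
        return "unpaired"
    read_mapped = not flag & UNMAPPED
    mate_mapped = not flag & MATE_UNMAPPED
    if read_mapped and mate_mapped:
        return "both_mapped"
    if read_mapped:
        return "singleton"
    return "unmapped"

# Counts reported on this page: mapped reads split into the two groups.
both_mates, singletons, mapped = 1_415_461_016, 1_809_550, 1_417_270_566
assert both_mates + singletons == mapped
```

The final check uses the counts reported on this page: the reads with both mates mapped plus the singletons add up exactly to the mapped reads.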

Singletons: 0.1 % (1 809 550 reads); remainder: 99.9 %

Forward Strand

Fraction of reads mapped to the forward DNA strand. The general expectation is that the DNA library preparation step generates DNA from the forward and reverse strands in equal amounts, so after mapping the reads to the reference genome, approximately 50% of them map to the forward strand. Deviations from 50% may be due to problems with the library preparation step.
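
A minimal sketch of the calculation, assuming SAM/BAM FLAG values in which bit 0x10 marks a read aligned to the reverse strand and bit 0x4 marks an unmapped read. Restricting the denominator to mapped reads is an assumption of this sketch; the report may use a different denominator.

```python
UNMAPPED = 0x4  # read is unmapped
REVERSE = 0x10  # read is aligned to the reverse strand

def forward_fraction(flags):
    """Fraction of mapped reads aligned to the forward strand."""
    mapped = [f for f in flags if not f & UNMAPPED]
    if not mapped:
        raise ValueError("no mapped reads")
    forward = sum(1 for f in mapped if not f & REVERSE)
    return forward / len(mapped)

# Toy flags (hypothetical): two forward, two reverse, one unmapped.
print(forward_fraction([99, 147, 0, 16, 4]))
```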

Forward strand: 50 % (709 747 043 reads); remainder: 50 %

Proper Pairs

A fragment consisting of two mates is called a proper pair if both mates map to the reference genome at the expected distance from each other. For example, if the DNA library consists of fragments ~500 base pairs in length, and 100 base pair reads are sequenced from either end, the two reads are expected to map to the reference genome separated by ~300 base pairs. If the sequenced sample contains a large structural variant, e.g. a large insertion, the reads around it map with an unexpected separation; this is a signal for the variant, and such reads are not considered proper pairs. Depending on the sequencing technology, there is also an expected orientation for each read in the fragment.

The rate of proper pairs is expected to be well over 90%, even if the mapping rate itself is low, for example as a result of bacterial contamination.
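
The distance arithmetic in the example above can be sketched as follows. The `tolerance` value is a hypothetical threshold chosen for illustration; aligners typically derive the accepted range from the observed fragment size distribution rather than from a fixed number.

```python
def expected_gap(fragment_length, read_length):
    """Unsequenced bases between the two mates of one fragment."""
    return fragment_length - 2 * read_length

def separation_as_expected(observed_gap, fragment_length, read_length,
                           tolerance=150):
    """True if the observed gap is within `tolerance` of the expected gap.

    `tolerance` is an illustrative assumption, not a value from the report.
    """
    gap = expected_gap(fragment_length, read_length)
    return abs(observed_gap - gap) <= tolerance

print(expected_gap(500, 100))                  # 500 - 2 * 100
print(separation_as_expected(320, 500, 100))   # close to the expected gap
print(separation_as_expected(5000, 500, 100))  # far from the expected gap
```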

Proper pairs: 98.1 % (1 391 851 578 reads); remainder: 1.9 %

Duplicates

PCR duplicates are two (or more) reads that originate from the same DNA fragment. When sequencing data is analyzed, it is assumed that each observation (i.e. each read) is independent, an assumption that fails in the presence of duplicate reads. Typically, algorithms look for reads that map to the same genomic coordinate and whose mates also map to identical genomic coordinates. Note that as the sequencing depth increases, more reads are sampled from the DNA library, and it becomes increasingly likely that duplicate reads will be sampled. As a result, the duplicate rate is not independent of the depth, and the depth should be taken into account when interpreting the duplicate rate. Additionally, as the sequencing depth increases, it is also increasingly likely that distinct reads will map to the same location and be marked as duplicates even though they are not. As such, as the sequencing depth approaches and surpasses the read length, the duplicate rate becomes less indicative of problems.
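
The coordinate-matching idea described above can be sketched as follows. This is a simplified illustration, not the algorithm of any particular tool: the input tuple layout and the function name are assumptions, and production duplicate markers generally also consider details such as read orientation and which copy to keep.

```python
def find_duplicates(pairs):
    """Return names of read pairs whose coordinates were already seen.

    `pairs` is an iterable of (name, chrom, read_pos, mate_pos) tuples.
    The first pair seen at a given coordinate key is kept; later ones
    with the same key are reported as duplicates.
    """
    seen = set()
    duplicates = []
    for name, chrom, read_pos, mate_pos in pairs:
        key = (chrom, read_pos, mate_pos)
        if key in seen:
            duplicates.append(name)
        else:
            seen.add(key)
    return duplicates

# Toy pairs (hypothetical): r2 repeats r1's coordinates, r3 and r4 do not.
toy_pairs = [
    ("r1", "1", 1000, 1300),
    ("r2", "1", 1000, 1300),
    ("r3", "1", 1000, 1350),
    ("r4", "2", 1000, 1300),
]
print(find_duplicates(toy_pairs))
```

The sketch also shows why very deep sequencing inflates the rate: two independent fragments that happen to share both coordinates are indistinguishable from a true PCR duplicate under this rule.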

Duplicates: 6.4 % (90 369 541 reads); remainder: 93.6 %

Mapping Quality Distribution

The mapping quality distribution shows the Phred quality scores describing the probability that a read does not map to the location it has been assigned to. Specifically, Q = -10 log10(P), where Q is the Phred score and P is the probability that the read is in the wrong location, so the larger the score, the higher the quality of the mapping. Some scores have a specific meaning, e.g. a score of 0 means that the read could map equally well to multiple places in the reference genome. The majority of reads should be well mapped, so this distribution is expected to be heavily skewed towards high values (typically around 60). It is not unusual to see some scores around zero: reads originating from repetitive elements in the genome will plausibly map to multiple locations.
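
As a minimal sketch of how such a distribution can be summarised, the function below computes the fraction of reads at or above a mapping quality threshold from a histogram of read counts per score. The toy histogram and the function names are illustrative assumptions; the conversion uses the standard Phred definition Q = -10 log10(P).

```python
def mapq_to_error_prob(q):
    """Probability that a read with mapping quality q is misplaced."""
    return 10 ** (-q / 10)

def fraction_at_least(hist, threshold):
    """Fraction of reads whose mapping quality is >= threshold.

    `hist` maps a mapping quality score to the number of reads with it.
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    kept = sum(count for q, count in hist.items() if q >= threshold)
    return kept / total

# Toy histogram (hypothetical counts, not taken from this file).
toy_hist = {0: 5, 20: 5, 60: 90}
print(fraction_at_least(toy_hist, 20))
print(mapq_to_error_prob(60))
```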

[Chart: mapping quality distribution. X-axis: Phred quality score (0 to 60). Y-axis: # reads (0.1G to 1.2G).]

Mapped vs Unmapped

Stacked column chart of mapped and unmapped reads for each chromosome in the reference file. It shows the same information as the Mapped Reads chart, but per chromosome. Although the sequenced sample may be female, it is possible to get reads on the Y chromosome, as the X and Y chromosomes share common regions called pseudoautosomal regions (PAR1, PAR2).

Unmapped reads are assigned to a chromosome when one mate of a pair is aligned and the other is not; the unmapped read is given the same chromosome and POS as its mate. This can also happen when the alignment is performed with bwa, which concatenates all the reference sequences together: if a read hangs off the end of one reference sequence onto another, it is given the right chromosome and position but is also classified as unmapped.

Mapped and unmapped reads per chromosome:

Chromosome  Mapped   Unmapped
1           99.88%   0.12%
2           99.87%   0.13%
3           99.89%   0.11%
4           99.89%   0.11%
5           99.89%   0.11%
6           99.88%   0.12%
7           99.88%   0.12%
8           99.88%   0.12%
9           99.87%   0.13%
10          99.87%   0.13%
11          99.87%   0.13%
12          99.88%   0.12%
13          99.89%   0.11%
14          99.88%   0.12%
15          99.87%   0.13%
16          99.86%   0.14%
17          99.85%   0.15%
18          99.89%   0.11%
19          99.82%   0.18%
20          99.86%   0.14%
21          99.84%   0.16%
X           99.87%   0.13%
Y           99.85%   0.15%
M           99.32%   0.68%