European Genome-Phenome Archive

File Quality

File Information

EGAF00005283791

File Data

Base Coverage Distribution

This chart shows the base coverage distribution along the reference file. The x-axis shows the coverage value, i.e. the number of times a position in the reference file is covered. The y-axis shows the number of bases that have that coverage.

The number of bases is plotted on a log scale to compress its wide range. A high peak at the beginning (low coverage) followed by a descending curve is expected.
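As a rough illustration of how such a distribution could be built, the sketch below counts positions per coverage value and takes the log of the counts. The function names, the `cap` parameter, and the toy depths are assumptions made for this example; they are not taken from the archive's own pipeline.

```python
from collections import Counter
import math


def coverage_histogram(depths, cap=1000):
    """Count how many positions have each coverage value.

    Values above `cap` share a single overflow bin (key cap + 1),
    mirroring the ">1000" bucket on the chart's x-axis.
    """
    hist = Counter()
    for depth in depths:
        hist[min(depth, cap + 1)] += 1
    return hist


def log10_counts(hist):
    """Return log10 of each non-zero bin count, as used for a log-scale y-axis."""
    return {cov: math.log10(n) for cov, n in hist.items() if n > 0}


# Hypothetical per-position depths, for illustration only.
depths = [0, 0, 0, 1, 1, 2, 5, 1200]
hist = coverage_histogram(depths)
log_hist = log10_counts(hist)
```

In practice the per-position depths would come from an alignment tool rather than a hand-written list.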

[Chart: base coverage distribution. X-axis: Coverage value (ticks from 100 to >1000). Y-axis: # Bases (log scale, 1k to 100M).]

Base Quality

The base quality distribution shows the Phred quality scores, which describe the probability that a nucleotide has been incorrectly assigned, e.g. through an error in the sequencing. Specifically, Q = -10 log10(P), where Q is the Phred score and P is the probability that the nucleotide is wrong. The larger the score, the more confident we are in the base call. Different sequencing technologies produce different distributions, but we expect the distribution to be skewed towards larger (more confident) scores, typically around 40.
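The relationship between a Phred score and an error probability can be sketched as below, using the standard Phred definition Q = -10 log10(P). The function names are illustrative and not part of any particular library.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred score Q to the error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P (0 < P <= 1) to a Phred score Q = -10 log10(P)."""
    return -10 * math.log10(p)
```

Under this definition a score of 40 corresponds to an error probability of 1 in 10 000, and a score of 30 to 1 in 1 000.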

[Chart: base quality distribution. X-axis: Phred quality score (0 to 40). Y-axis: # Bases (0G to 20G).]

Mapped Reads

Number of reads in the sample successfully mapped (singletons and both mates) to the reference genome. Genetic variation, in particular structural variants, ensures that every sequenced sample is genetically different from the reference genome it was aligned to. Small differences against the reference are tolerated but, with more significant variation, a read can fail to be placed. The mapped reads rate is therefore not expected to reach 100%, but it should be high (usually >90%). The rate is calculated as the proportion of mapped reads among all reads: mapped / (mapped + unmapped).
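The calculation described above can be written out as a short sketch. The function name and the example counts are hypothetical; the page itself does not report the unmapped read count.

```python
def mapped_rate(mapped, unmapped):
    """Return mapped / (mapped + unmapped) as a fraction between 0 and 1."""
    total = mapped + unmapped
    if total == 0:
        raise ValueError("no reads to compute a rate from")
    return mapped / total


# Hypothetical counts, for illustration only.
rate = mapped_rate(mapped=950, unmapped=50)
```

Writing the denominator with explicit parentheses avoids the ambiguity of "mapped/mapped+unmapped".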

Mapped: 100 % (141 808 780 reads). Unmapped: 0 %.

Both Mates Mapped

When working with paired-end sequencing, each DNA fragment is sequenced from both ends, creating two mates for each pair. This chart shows the fraction of reads in pairs where both of the mates successfully map to the reference genome.

Note that this count also includes pairs whose mates do not map at the expected distance; such pairs are distinguished in the Proper Pairs chart.

Both mates mapped: 100 % (141 806 296 reads). Other: 0 %.

Singletons

When working with paired-end sequencing, each DNA fragment is sequenced from both ends, creating two mates for each pair. If one mate in the pair successfully maps to the reference genome but the other is unmapped, the mapped mate is a singleton. One way a singleton could occur is if the sample has a large insertion compared with the reference genome: one mate falls in the sequence flanking the insertion and is mapped, but the other falls in the inserted sequence and so cannot map to the reference genome. There are unlikely to be many such structural variants in the sample, or many sequencing errors that would prevent a read from mapping. Consequently, the singleton rate is expected to be very low (<1%).
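In SAM/BAM files the distinction between a fully mapped pair and a singleton is encoded in the FLAG field. The sketch below assumes paired-end records and uses the standard SAM flag bits 0x4 (read unmapped) and 0x8 (mate unmapped); the function name and category labels are made up for this example.

```python
READ_UNMAPPED = 0x4   # SAM flag bit: this read is unmapped
MATE_UNMAPPED = 0x8   # SAM flag bit: the mate of this read is unmapped


def mapping_category(flag):
    """Classify a paired-end read by its SAM flag.

    Returns "both_mates_mapped", "singleton" (this read mapped, mate not),
    or "unmapped" (this read itself is unmapped).
    """
    if flag & READ_UNMAPPED:
        return "unmapped"
    if flag & MATE_UNMAPPED:
        return "singleton"
    return "both_mates_mapped"
```

For example, flag 99 describes a mapped read with a mapped mate, flag 73 a mapped read whose mate is unmapped, and flag 133 the unmapped mate of such a pair.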

Singletons: 0 % (2 484 reads). Other: 100 %.

Forward Strand

Fraction of reads mapped to the forward DNA strand. The general expectation is that the DNA library preparation step generates DNA from the forward and reverse strands in equal amounts, so after mapping the reads to the reference genome approximately 50% of them will map to the forward strand. Deviations from 50% may be due to problems with the library preparation step.
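A minimal sketch of this fraction, assuming the input is a list of SAM FLAG values and that only mapped reads are counted. It relies on the standard SAM flag bits 0x4 (read unmapped) and 0x10 (read on the reverse strand); the function name is hypothetical.

```python
READ_UNMAPPED = 0x4    # SAM flag bit: this read is unmapped
REVERSE_STRAND = 0x10  # SAM flag bit: this read maps to the reverse strand


def forward_fraction(flags):
    """Return the fraction of mapped reads that lie on the forward strand."""
    mapped = [f for f in flags if not f & READ_UNMAPPED]
    if not mapped:
        raise ValueError("no mapped reads")
    forward = sum(1 for f in mapped if not f & REVERSE_STRAND)
    return forward / len(mapped)


# Two forward reads (99, 163), two reverse reads (147, 83), one unmapped read (77).
fraction = forward_fraction([99, 147, 83, 163, 77])
```

A value far from 0.5 on real data would be the kind of deviation the paragraph above describes.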

Forward strand: 50 % (70 905 656 reads). Reverse strand: 50 %.

Proper Pairs

A fragment consisting of two mates is called a proper pair if both mates map to the reference genome at the expected distance from each other. For example, if the DNA library consists of fragments ~500 base pairs in length, and 100 base pair reads are sequenced from either end, the two reads are expected to map to the reference genome separated by ~300 base pairs. If the sequenced sample contains a large structural variant, e.g. a large insertion, the mates will map with an unexpected separation. This is a signal for the variant, and such reads are not considered proper pairs. Depending on the sequencing technology, there is also an expected orientation for each read in the fragment.

The rate of proper pairs is expected to be well over 90%, even if the mapping rate itself is low as a result of, for example, bacterial contamination.
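The arithmetic behind the ~300 base pair figure in the fragment example is simply the fragment length minus the two read lengths. The sketch below shows it with a hypothetical helper name; real aligners typically estimate the expected distance from the observed insert size distribution rather than from a fixed formula.

```python
def expected_inner_distance(fragment_len, read_len):
    """Return the expected gap between two mates of a fragment.

    A negative result means the two reads overlap.
    """
    return fragment_len - 2 * read_len


# The example from the text: ~500 bp fragments, 100 bp reads from either end.
gap = expected_inner_distance(fragment_len=500, read_len=100)
```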

Proper pairs: 99 % (140 441 132 reads). Other: 1 %.

Duplicates

PCR duplicates are two (or more) reads that originate from the same DNA fragment. When sequencing data is analyzed, it is assumed that each observation (i.e. each read) is independent, an assumption that fails in the presence of duplicate reads. Typically, duplicate-detection algorithms look for reads that map to the same genomic coordinate and whose mates also map to identical genomic coordinates. As the sequencing depth increases, more reads are sampled from the DNA library, so it becomes increasingly likely that duplicate reads will be sampled. The true duplicate rate is therefore not independent of the depth, and the two should be considered together. Additionally, as the sequencing depth increases, it also becomes increasingly likely that distinct reads will map to the same location and be marked as duplicates even when they are not. Consequently, as the sequencing depth approaches and surpasses the read length, the duplicate rate becomes less indicative of problems.
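The coordinate-based approach described above can be sketched as follows. This is a deliberately simplified illustration: each pair is reduced to a hypothetical tuple (chromosome, position, strand, mate position), and every pair after the first with the same tuple is flagged. Production tools apply further criteria, such as choosing which copy to keep.

```python
def mark_duplicates(pairs):
    """Flag pairs whose coordinates were already seen.

    `pairs` is a list of hashable keys such as
    (chrom, pos, strand, mate_pos). Returns a parallel list of booleans,
    True for every occurrence after the first.
    """
    seen = set()
    flags = []
    for key in pairs:
        flags.append(key in seen)
        seen.add(key)
    return flags


def duplicate_rate(pairs):
    """Return the fraction of pairs flagged as duplicates."""
    flags = mark_duplicates(pairs)
    if not flags:
        raise ValueError("no pairs")
    return sum(flags) / len(flags)


# Hypothetical pairs, for illustration only.
pairs = [
    ("1", 100, "+", 300),
    ("1", 100, "+", 300),  # same coordinates as the first: flagged
    ("1", 100, "+", 350),  # mate maps elsewhere: not flagged
    ("2", 100, "+", 300),
]
dup_flags = mark_duplicates(pairs)
rate = duplicate_rate(pairs)
```

The sketch also shows why deep sequencing inflates the rate: two independent fragments that happen to share coordinates are indistinguishable from a true duplicate under this rule.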

Duplicates: 29.3 % (41 540 171 reads). Non-duplicates: 70.7 %.

Mapping Quality Distribution

The mapping quality distribution shows the Phred quality scores describing the probability that a read does not map to the location it has been assigned to. Specifically, Q = -10 log10(P), where Q is the Phred score and P is the probability that the read is in the wrong location, so the larger the score, the higher the quality of the mapping. Some scores have a specific meaning, e.g. a score of 0 means that the read could map equally well to multiple places in the reference genome. The majority of reads should be well mapped, so we expect this distribution to be heavily skewed towards a high value (typically around 60). It is not unusual to see some scores around zero, because reads originating from repetitive elements in the genome will plausibly map to multiple locations.

[Chart: mapping quality distribution. X-axis: Phred quality score (0 to 60). Y-axis: # Reads (10M to 130M).]

Mapped vs Unmapped

Stacked column chart of mapped and unmapped reads for each chromosome in the reference file. It shows the same information as the Mapped Reads chart, but per chromosome. Even if the sequenced sample is female, reads can still map to the Y chromosome, because the X and Y chromosomes share common regions called pseudoautosomal regions (PAR1, PAR2).

Unmapped reads are assigned to a chromosome when one mate of a pair is aligned and the other is not: the unmapped read is given the same chromosome and POS as its mate. This can also happen when alignment is performed with bwa, which concatenates all the reference sequences together; if a read hangs off the end of one reference sequence onto the next, it is given the right chromosome and position but is also classified as unmapped.
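A per-chromosome tally of this kind could be sketched as below, assuming each record is reduced to a (reference name, SAM flag) tuple. An unmapped read placed at its mate's position keeps a reference name but has flag bit 0x4 set; reads with no reference name ("*") cannot be assigned to a chromosome and are skipped. The function name and the sample records are hypothetical.

```python
from collections import Counter

READ_UNMAPPED = 0x4  # SAM flag bit: this read is unmapped


def per_chromosome_counts(records):
    """Count mapped and unmapped reads per chromosome.

    `records` is an iterable of (rname, flag) tuples. Records whose
    rname is "*" carry no chromosome and are ignored.
    """
    counts = {}
    for rname, flag in records:
        if rname == "*":
            continue
        bucket = counts.setdefault(rname, Counter())
        bucket["unmapped" if flag & READ_UNMAPPED else "mapped"] += 1
    return counts


# Hypothetical records, for illustration only.
records = [
    ("1", 99), ("1", 147),  # a fully mapped pair on chromosome 1
    ("1", 73), ("1", 133),  # a singleton and its unmapped mate, both placed on chromosome 1
    ("X", 99),
    ("*", 77),              # unmapped read with no chromosome
]
counts = per_chromosome_counts(records)
```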

[Chart: stacked columns of mapped vs unmapped reads per chromosome (labels shown: 1 to 21, X, Y, M). Y-axis: 0% to 100%. Every column shows 100% mapped and 0% unmapped.]