European Genome-Phenome Archive

File Quality

File Information: EGAF00001404480

File Data

Site Frequency Distribution

Also called the allele frequency spectrum, this is the distribution of allele frequencies across a set of loci (typically SNPs) in a population or sample. It is a histogram whose shape depends on the number of samples used to compute it, so it is only meaningful when a large number of samples is used. Variant callers sometimes use databases of known human variants as a control to estimate the expected frequencies. A good plot should resemble an exponential distribution, like the one shown below:

[Reference image: SFS]
[Plot: SFS. Axis labels: percentage of alleles (SNPs only) in the called population that exhibit the alternate allele; log10(frequency)]
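As a rough illustration of how such a spectrum can be tabulated, the sketch below bins sites by alternate allele count and reports the percentage of sites in each bin. The function name `site_frequency_spectrum` and the toy counts are hypothetical and are not taken from the EGA pipeline.

```python
from collections import Counter


def site_frequency_spectrum(alt_counts, n_chromosomes):
    """Return {alternate allele count: percentage of sites}.

    Only segregating sites are kept, i.e. sites whose alternate allele
    count is strictly between 0 and the number of chromosomes sampled.
    """
    counts = Counter(c for c in alt_counts if 0 < c < n_chromosomes)
    total = sum(counts.values())
    return {k: 100.0 * counts[k] / total for k in sorted(counts)}


# Hypothetical toy data: alternate allele counts at 10 SNP sites,
# observed in 5 diploid samples (10 chromosomes).
alt_counts = [1, 1, 1, 1, 2, 2, 3, 1, 5, 2]
sfs = site_frequency_spectrum(alt_counts, n_chromosomes=10)

for count, pct in sfs.items():
    print(f"alt count {count}: {pct:.1f}% of sites")
```

In this toy sample, rare alleles dominate and the percentages fall off as the allele count grows, which is the decaying shape described above.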

Variant types

This plot shows which variant types are present in the VCF file. SNPs are expected to be the most common type. The proportions of the other types (Ins for insertions, Del for deletions) vary with the kind of experiment: whole-genome sequencing data usually contain more of them than exome data, because large changes in coding regions are more likely to be deleterious. "Other" variants are those that combine several changes, for example a SNP together with a deletion (e.g. Ref: ACGT → CCG).

This plot is purely informative: it shows at a glance whether the VCF has been filtered and which variant types are absent.

[Plot: # Variants by type. SNPs: 39,509,356; Ins: 896,086; Del: 1,643,450]
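The categories above can be approximated from the REF and ALT columns alone. The following is a minimal sketch, assuming left-anchored VCF alleles where a simple insertion keeps REF as a prefix of ALT and a simple deletion keeps ALT as a prefix of REF. The function `classify_variant` is hypothetical and is not the classifier used to produce this report.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT pair as SNP, Ins, Del or Other.

    Assumption: alleles are left-anchored as in typical VCF output, so a
    pure insertion has REF as a prefix of ALT and a pure deletion has ALT
    as a prefix of REF. Anything else is reported as "Other".
    """
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "Ins"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "Del"
    return "Other"


print(classify_variant("A", "G"))       # single-base substitution
print(classify_variant("A", "ACG"))     # two bases inserted after A
print(classify_variant("ACG", "A"))     # two bases deleted after A
print(classify_variant("ACGT", "CCG"))  # SNP combined with a deletion
```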

Ts/Tv: 2.2

A transversion is a point mutation that changes a purine (adenine or guanine) into a pyrimidine (cytosine or thymine), or vice versa. Purines have a double-ring structure, while pyrimidines have a single ring.

A transition is a point mutation between bases of the same kind (purine to purine, or pyrimidine to pyrimidine). Transitions are less likely to result in amino acid substitutions and may therefore persist as “silent substitutions” in the form of single nucleotide polymorphisms (SNPs).

Transitions occur more often than transversions, even though twice as many transversions are possible, because swapping a base for another with the same ring structure is more likely than exchanging a double ring for a single ring. Transversions also tend to have a larger effect than transitions: the third position of a codon tolerates transitions better than transversions, i.e. a transition there often yields the same amino acid.

As a result, a Ts/Tv ratio close to 2 is expected for human DNA. A lower ratio is a probable sign of problems with the data.

[Plot: # Variants by substitution class. Ts: 27,171,498; Tv: 12,337,887]
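The definitions above translate directly into a small check. The sketch below classifies a hypothetical list of SNP calls and then recomputes the ratio from the Ts and Tv totals reported for this file. The helper `is_transition` and the toy calls are illustrative assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True if ref -> alt stays within purines or within pyrimidines."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


# Hypothetical toy SNP calls as (REF, ALT) pairs.
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ts = sum(is_transition(r, a) for r, a in calls)
tv = len(calls) - ts
ratio = ts / tv
print(f"toy Ts/Tv = {ratio:.2f}")

# Totals reported in the plot for this file.
reported_ratio = 27_171_498 / 12_337_887
print(f"reported Ts/Tv = {reported_ratio:.2f}")
```

Rounded to one decimal, the reported totals give the 2.2 shown in the section heading.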

Base Changes

As explained in the Ts/Tv section, transitions are more frequent than transversions. An adenine (A) is therefore more likely to mutate into a guanine (G), or vice versa, than into a cytosine (C) or a thymine (T), and the same holds for changes between C and T. The plot should be similar to the following one:

[Reference image: expected base change distribution]

A deviation from this model may indicate a problem with the variant calling, in which case the Ts/Tv ratio will probably be lower than 2.

[Plot: base changes, # variants per bar, grouped by base (axis from 0 to 8,000,000)]
A: C 1,385,873; G 5,481,710; T 1,314,375
C: A 1,802,237; G 1,675,928; T 8,093,799
G: A 8,125,828; C 1,667,957; T 1,793,591
T: A 1,313,608; C 5,470,161; G 1,384,318
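The twelve bars of this plot can be folded back into the Ts and Tv totals. The sketch below assumes that each group letter is the reference base and each bar letter the alternate base. That reading is an assumption, but the totals do not depend on it because every transition pair appears in both directions.

```python
# Counts read from the plot, keyed as (group base, bar base).
# Assumption: group = reference base, bar = alternate base.
base_changes = {
    ("A", "C"): 1_385_873, ("A", "G"): 5_481_710, ("A", "T"): 1_314_375,
    ("C", "A"): 1_802_237, ("C", "G"): 1_675_928, ("C", "T"): 8_093_799,
    ("G", "A"): 8_125_828, ("G", "C"): 1_667_957, ("G", "T"): 1_793_591,
    ("T", "A"): 1_313_608, ("T", "C"): 5_470_161, ("T", "G"): 1_384_318,
}

# The four purine<->purine and pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

ts_total = sum(n for pair, n in base_changes.items() if pair in TRANSITIONS)
tv_total = sum(n for pair, n in base_changes.items() if pair not in TRANSITIONS)

print(f"Ts = {ts_total:,}  Tv = {tv_total:,}  Ts/Tv = {ts_total / tv_total:.2f}")
```

The sums agree with the Ts and Tv counts shown in the Ts/Tv plot (27,171,498 and 12,337,887).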

Indel Distribution

This plot shows the number of insertions and deletions of each length observed in the VCF. Negative values represent deletions and positive values represent insertions. The x-axis shows the number of bases inserted or deleted, and the y-axis shows the number of variants with that many bases inserted or deleted. Usually more deletions than insertions are observed, and indels are generally very short (1-5 bases).

[Plot: Indel Length Distribution. x-axis: indel length, from -60 to 50; y-axis: # Found, up to 700k]
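The sign convention described above corresponds to the length difference between ALT and REF. Here is a minimal sketch with hypothetical records. The helper `indel_length` is illustrative and not part of any EGA tool.

```python
from collections import Counter


def indel_length(ref, alt):
    """Signed indel length: positive for insertions, negative for deletions."""
    return len(alt) - len(ref)


# Hypothetical (REF, ALT) pairs; the last one is a SNP and is skipped.
records = [("A", "AT"), ("ACG", "A"), ("G", "GTT"), ("TA", "T"), ("C", "T")]

histogram = Counter(
    indel_length(ref, alt) for ref, alt in records if len(ref) != len(alt)
)

for length in sorted(histogram):
    print(f"{length:+d}: {histogram[length]} variant(s)")
```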

Quality Distribution

This plot represents the Phred-scaled quality score for the assertion made in ALT, taken from the QUAL field (6th column) of the VCF file. The higher the score, the more confident the call. Note that this value can be platform-biased, and that different variant calling tools may report different values for the same call. The main purpose here is to verify that the quality scores are not crowded into the low end (< 100). Deviations may indicate problems with the analysis or the experiment.

[Plot: Variant Quality Distribution. x-axis: quality, from 0 to 220; y-axis: # Variants, axis ticks from 1 to 10k]
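For reference, a Phred-scaled score Q corresponds to an error probability of 10^(-Q/10), so QUAL = 100 means a probability of 1e-10 that the ALT assertion is wrong. The sketch below shows the conversion in both directions and a simple check for crowding at the low end. The QUAL values are hypothetical, and the cutoff of 100 is taken from the text above.

```python
import math


def phred_to_error_prob(qual):
    """Probability that the ALT assertion is wrong for a Phred-scaled QUAL."""
    return 10 ** (-qual / 10)


def error_prob_to_phred(prob):
    """Phred-scaled score for a given error probability."""
    return -10 * math.log10(prob)


# Hypothetical QUAL values from a handful of VCF records.
quals = [35.0, 120.5, 180.0, 99.9, 210.0, 150.2]
low_fraction = sum(q < 100 for q in quals) / len(quals)

print(f"QUAL 20 -> error probability {phred_to_error_prob(20):.2g}")
print(f"error probability 0.001 -> QUAL {error_prob_to_phred(0.001):.0f}")
print(f"fraction of calls below QUAL 100: {low_fraction:.2f}")
```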