Mutation T@ster

real life data

To evaluate the performance of MutationTaster2, especially its false positive rate (FPR), we submitted all exonic variants found in a 1000 Genomes Project sample to MutationTaster, PolyPhen-2, SIFT, and PROVEAN. In the first step, the variants were extracted from the BAM alignment file of sample HG00377 using samtools/bcftools:

samtools mpileup -D -gf /Ensembl69/Homo_sapiens.GRCh37.69.dna.all.fa HG00377.mapped.ILLUMINA.bwa.FIN.exome.20121211.bam | bcftools view -c -g -v - > Exome_HG00377.vcf

A list of all variants within exons (+/- 10 flanking bases) was then obtained with a Perl script. This list was submitted to MutationTaster2's Query Engine and to the web services of PolyPhen-2 and SIFT/PROVEAN.
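The exon filter itself is not shown here. The following Python sketch only illustrates the idea of keeping variants that lie within an exon or its 10 flanking bases. It is not the Perl script that was used; the function names, the tuple layout and the 1-based inclusive coordinates are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical layouts, chosen for this sketch only:
# an exon is (chromosome, start, end) with 1-based inclusive coordinates,
# a variant is (chromosome, position).
Exon = Tuple[str, int, int]
Variant = Tuple[str, int]

FLANK = 10  # flanking bases on either side of each exon


def in_exon_with_flank(variant: Variant, exons: List[Exon], flank: int = FLANK) -> bool:
    """Return True if the variant lies in any exon extended by `flank` bases."""
    chrom, pos = variant
    return any(
        exon_chrom == chrom and start - flank <= pos <= end + flank
        for exon_chrom, start, end in exons
    )


def filter_variants(variants: List[Variant], exons: List[Exon], flank: int = FLANK) -> List[Variant]:
    """Keep only variants inside an exon or its flanking region."""
    return [v for v in variants if in_exon_with_flank(v, exons, flank)]
```

A linear scan over all exons is sufficient for a sketch; for a whole exome, an interval tree or sorted lookup per chromosome would be the more practical choice.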
The results obtained from the different tools were written to a database table; where a variant had more than one prediction (due to multiple transcripts), the most deleterious score was used. From this database table, we extracted all predictions for variants with a coverage of 10 or higher. We divided the data into all variants and homozygous variants only (the sample has some variants leading to frameshifts and even harbours known disease mutations). Each table contains two parts: the upper part lists all predictions, the lower part only those variants for which all tools gave a prediction.
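The selection step described above (one prediction per variant, minimum coverage of 10) could look roughly like the Python sketch below. The record layout and the `severity` field are hypothetical. The sketch assumes that each tool's score has already been mapped to a scale where a higher value means more deleterious, which is not true of the raw scores of every tool.

```python
from typing import Dict, Iterable, NamedTuple


class Prediction(NamedTuple):
    # Hypothetical record layout for this sketch.
    variant: str      # variant identifier, e.g. "1:12345:A>G"
    transcript: str   # transcript the prediction refers to
    severity: float   # assumed scale: higher = more deleterious
    coverage: int     # read coverage at the variant position


def most_deleterious(predictions: Iterable[Prediction], min_coverage: int = 10) -> Dict[str, Prediction]:
    """Keep, per variant, the most deleterious prediction among sufficiently covered ones."""
    best: Dict[str, Prediction] = {}
    for p in predictions:
        if p.coverage < min_coverage:
            continue  # discard variants below the coverage threshold
        current = best.get(p.variant)
        if current is None or p.severity > current.severity:
            best[p.variant] = p
    return best
```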
All predictions that assumed pathogenicity (including "possibly damaging" and "probably damaging") were counted as false positives (although this is certainly too strict, as the disease mutations show). Because PROVEAN regards all synonymous exchanges as benign without any test, these cases were not counted as true negative predictions.
MutationTaster2 not only gives more predictions than the other tools (because it can also handle synonymous substitutions and the flanking bases outside the exons) but in most cases also produces fewer false positives than its competitors. Please note that in this real-life example, MutationTaster's automatic classification routines were used.

Results for all variants

case	MT	PPH	SIFT	PROVEAN
all predictions
FP	1825	2379	1692	2246
TN	19156	3065	7415	6733
FPR	8.7%	43.7%	18.6%	25.0%
variants analysed by all tools
FP	777	2120	1488	1845
TN	4020	2677	3309	2952
FPR	16.2%	44.2%	31.0%	38.5%

Results for homozygous variants

case	MT	PPH	SIFT	PROVEAN
all predictions
FP	103	464	338	426
TN	7714	943	3006	2875
FPR	1.3%	33.0%	10.1%	12.9%
variants analysed by all tools
FP	7	400	285	305
TN	1219	826	941	921
FPR	0.6%	32.6%	23.2%	24.9%

FP: false positives (i.e. pathogenic predictions), TN: true negatives (benign predictions), FPR: false positive rate ( FP/(FP+TN) )
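As a check on the tables, the FPR values can be recomputed from the FP and TN counts as FP/(FP+TN). The short Python sketch below does this for the "all predictions" rows of the table for all variants; the function and variable names are chosen for illustration only.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Return FP / (FP + TN) as a fraction between 0 and 1."""
    total = fp + tn
    if total == 0:
        raise ValueError("FPR is undefined when FP + TN is zero")
    return fp / total


# (FP, TN) counts copied from the "all predictions" rows, all variants.
all_variants = {
    "MT": (1825, 19156),
    "PPH": (2379, 3065),
    "SIFT": (1692, 7415),
    "PROVEAN": (2246, 6733),
}

for tool, (fp, tn) in all_variants.items():
    print(f"{tool}: {100 * false_positive_rate(fp, tn):.1f}%")
# MT: 8.7%
# PPH: 43.7%
# SIFT: 18.6%
# PROVEAN: 25.0%
```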

data

MutationTaster's results can be interactively inspected: vcf_7890_1559988265/progress.html.
Note that MutationTaster gives predictions for all transcripts, which inflates the number of results. Please also note that many of the disease predictions are frameshifts, which are neglected by the other tools; some are even known disease mutations.

our sample exome
PolyPhen-2 results (Settings: HumDiv model, all transcripts)
SIFT/PROVEAN results