Mutation T@ster

statistics


models and training

models

MutationTaster2 still uses three different models: one for alterations that do not cause any amino acid exchange (without_aae), one for simple amino acid substitutions (simple_aae), and one for alterations that cause more complex changes in the amino acid sequence of the resulting peptide, such as a frameshift or a shifted start ATG (complex_aae).
See the models we used (numbers of training cases and frequencies).

training sets

The following table shows the composition of the data sets used to train the classifier. We used all available alterations that fulfilled our criteria: status 'disease mutation' in HGMD Pro, or polymorphism confirmed by at least 4 carriers of this genotype in the 1000 Genomes Project (TGP). Variants that appeared in both groups were discarded. For the web version, the classifier was trained with all alterations suitable for the given model; alterations of the less frequent type were fed into the training several times to reach equal frequencies of disease mutations and polymorphisms. See below for the composition of the cross-validation data sets.

model	n (polymorphisms)	n (disease mutations)	comments
without_aae	6807269	122238	each disease mutation was used 55 times in the training of the web version (56 times in the cross-validation)
simple_aae	20967	151542	each polymorphism was used 7 times in the training of the web version and in the cross-validation
complex_aae	2340	123213	each polymorphism was used 52 times in the training of the web version (57 times in the cross-validation)
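
The replication factors quoted in the comments column can be reproduced from the counts in the table. The following is a minimal sketch under two assumptions that are ours and not stated on this page: the less frequent class is repeated floor(majority / minority) times, and the cross-validation test set is split equally between the two classes (2000 + 2000 cases, or 200 + 200 for complex_aae). The function name `replication_factor` is purely illustrative.

```python
# Counts taken from the training-set table; the held-out numbers assume an
# equal split of the cross-validation test set between both classes.
COUNTS = {
    # model: (n polymorphisms, n disease mutations, held-out cases per class)
    "without_aae": (6807269, 122238, 2000),
    "simple_aae": (20967, 151542, 2000),
    "complex_aae": (2340, 123213, 200),
}


def replication_factor(n_polymorphisms, n_disease, held_out_per_class=0):
    """Integer number of times the rarer class is repeated (assumed: floor of the ratio)."""
    a = n_polymorphisms - held_out_per_class
    b = n_disease - held_out_per_class
    return max(a, b) // min(a, b)


for model, (n_poly, n_dm, held_out) in COUNTS.items():
    web = replication_factor(n_poly, n_dm)
    cv = replication_factor(n_poly, n_dm, held_out)
    print(f"{model}: web version x{web}, cross-validation x{cv}")
```

Under these assumptions the sketch yields 55/56, 7/7 and 52/57 for the three models, which agrees with the comments in the table.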

cross-validation

We cross-validated MutationTaster2 five times for each of the models. For these cross-validations, all but 4000 alterations suitable for the model were used to train the classifier. The disease potential of the remaining 4000 alterations (2000 disease mutations and 2000 polymorphisms) was then predicted by the classifier. As the number of known polymorphisms leading to complex changes (complex_aae model) is still very low, we could only use 400 alterations falling into this class as the test set. This explains the relatively high standard deviation.
The additional features used for the automatic classification of variants (such as presence in the TGP data in reasonable numbers) were of course ignored, so the predictive performance for real data will be even better than the numbers shown here.

results of the cross-validation

model	n	accuracy	accuracy (disease mutations)	accuracy (polymorphisms)	sensitivity	specificity	NPV	PPV
simple_aae	4000	0.886	0.895	0.877	0.879	0.893	0.877	0.895
±		0.004	0.005	0.008	0.007	0.004	0.008	0.005
without_aae	4000	0.922	0.888	0.957	0.954	0.895	0.957	0.888
±		0.004	0.006	0.004	0.004	0.005	0.003	0.006
complex_aae	400	0.907	0.944	0.869	0.879	0.939	0.869	0.944
±		0.017	0.004	0.032	0.026	0.005	0.032	0.004

Below the values for each model, the standard deviation (obtained from 5 runs) is shown. See the results of each single run.

cross-comparison

We compared MutationTaster2 with PolyPhen-2 (HumVar and HumDiv model), SIFT, PROVEAN, and MutationTaster1. As SIFT and PolyPhen-2 can only handle single amino acid substitutions, we restricted the test set to suitable mutations. For this test, the web version of MutationTaster (which was trained with all available polymorphisms and disease mutations) was used. The additional features used for the automatic classification of variants (such as presence in the TGP data in reasonable numbers) were of course ignored.
We generated two test sets, each containing 3600 alterations leading to single amino acid substitutions with a known pathogenicity status:

1	1800 known disease mutations from HGMD Pro (disease state = DM [disease mutation])
	1800 harmless polymorphisms from the 1000 Genomes Project (each of the 3 possible genotypes found in at least 50 samples)^*
2	1800 known disease mutations from ClinVar (disease state = pathogenic) [not with MutationTaster1]
	1800 harmless polymorphisms from the 1000 Genomes Project (each of the 3 possible genotypes found in at least 50 samples) [not with MutationTaster1]^*

^* We used the same polymorphisms in both sets.
Please note that MutationTaster1 was added later at the request of our reviewers and is so far included only in the HGMD-based comparison. MutationTaster1 uses Ensembl59, and we could not always find the same transcripts that were used in the initial comparison. We hence had to reduce the number of test cases in the HGMD-based comparison.

We submitted these alterations to the web interfaces of PolyPhen-2 (HumVar and HumDiv), SIFT, PROVEAN, MutationTaster2, and MutationTaster1 by entering the DNA change via the chromosomal position. Since these variants (on DNA level) might cause different substitutions in different transcripts, we extracted the one result corresponding to the amino acid exchange from the test set. If there were several results for the amino acid exchange in question, we used the first result. Since some variants could not be analysed by all programs (or at least did not return the required amino acid substitution), we randomly selected 1100 (HGMD) or 1300 (ClinVar) disease mutations and the same number of polymorphisms out of the 2381 (HGMD) or 2814 (ClinVar) variants for which all programs gave predictions. We then compared the predictions for these 2200 (HGMD) or 2600 (ClinVar) test cases.
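
The selection step described above (keep only variants that every program could predict, then draw equal numbers of disease mutations and polymorphisms at random) might be sketched as follows. This is an illustration only, not the script used for the comparison; the function `pick_test_cases`, the tool names, the variant identifiers and the fixed random seed are all hypothetical.

```python
import random


def pick_test_cases(predictions, status, n_per_class, seed=0):
    """Draw n_per_class variants per class from those predicted by all tools.

    predictions: {tool name: {variant id: prediction}}
    status:      {variant id: "disease" or "polymorphism"}
    """
    # Variants for which every tool returned a prediction.
    shared = set.intersection(*(set(by_variant) for by_variant in predictions.values()))
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    picked = {}
    for label in ("disease", "polymorphism"):
        candidates = sorted(v for v in shared if status[v] == label)
        picked[label] = rng.sample(candidates, n_per_class)
    return picked


# Toy input (hypothetical identifiers and predictions).
predictions = {
    "tool_a": {"v1": "D", "v2": "N", "v3": "D", "v4": "N", "v5": "D"},
    "tool_b": {"v1": "D", "v2": "N", "v3": "N", "v4": "N"},
    "tool_c": {"v1": "N", "v2": "N", "v3": "D", "v4": "D", "v6": "D"},
}
status = {
    "v1": "disease", "v2": "polymorphism", "v3": "disease",
    "v4": "polymorphism", "v5": "disease", "v6": "polymorphism",
}
picked = pick_test_cases(predictions, status, n_per_class=2)
print(picked)
```

In the toy input, v5 and v6 are dropped because not all tools returned a result for them, mirroring the reduction from 3600 submitted alterations to 2381 (HGMD) or 2814 (ClinVar) usable variants.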

results of the cross-comparison

programme	total	TP	TN	FP	FN	NPV	PPV	sensitivity	specificity	accuracy
1000 genomes and HGMD Pro
PPH2-var	2200	868	976	124	232	80.8%	87.5%	78.9%	88.7%	83.8%
PPH2-div	2200	944	903	197	156	85.3%	82.7%	85.8%	82.1%	84.0%
PROVEAN	2200	856	966	134	244	79.8%	86.5%	77.8%	87.8%	82.8%
SIFT	2200	910	944	156	190	83.2%	85.4%	82.7%	85.8%	84.3%
MT1	2200	931	961	139	169	85.0%	87.0%	84.6%	87.4%	86.0%
MutationTaster2	2200	976	961	139	124	88.6%	87.5%	88.7%	87.4%	88.0%
1000 genomes and ClinVar
PPH2-var	2600	1108	1159	141	192	85.8%	88.7%	85.2%	89.2%	87.2%
PPH2-div	2600	1175	1076	224	125	89.6%	84.0%	90.4%	82.8%	86.6%
PROVEAN	2600	1096	1146	154	204	84.9%	87.7%	84.3%	88.2%	86.2%
SIFT	2600	1136	1123	177	164	87.3%	86.5%	87.4%	86.4%	86.9%
MutationTaster2	2600	1213	1132	168	87	92.9%	87.8%	93.3%	87.1%	90.2%

TP: true positive; TN: true negative; FP: false positive; FN: false negative; NPV = negative predictive value = TN / (TN + FN); PPV = positive predictive value = TP / (TP + FP); sensitivity = TP / (TP + FN); specificity = TN / (TN + FP); accuracy = (TP + TN) / (TP + TN + FP + FN)
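
As a worked check of these definitions, the short sketch below recomputes the five measures for the MutationTaster2 row of the HGMD-based comparison (TP 976, TN 961, FP 139, FN 124). The function name `confusion_metrics` is illustrative.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return the measures defined in the table legend as fractions."""
    return {
        "NPV": tn / (tn + fn),
        "PPV": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# MutationTaster2, 1000 Genomes and HGMD Pro test set (values from the table).
metrics = confusion_metrics(tp=976, tn=961, fp=139, fn=124)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Rounded to one decimal place, this gives 88.6%, 87.5%, 88.7%, 87.4% and 88.0%, matching the table.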

MutationTaster2 displays automatic predictions for known harmless polymorphisms from the 1000 Genomes Project and known disease mutations from NCBI ClinVar. For the comparison of MutationTaster2 with the other tools, we did not consider the automatically displayed prediction (which is per se correct - or at least the same as our prior) but the actual prediction made by the classifier, which is reflected by the probability value. (An automatic prediction with a probability below 0.5 means that MutationTaster2 would predict the other case if it did not consider the known outcome of this variant.) The performance of MutationTaster on real-life data, where known polymorphisms / disease mutations are recognised, is hence even better than the accuracy shown here. Please see our example using a real exome to evaluate the use of MutationTaster!

These results suggest a bias towards mutations with a more obvious effect on the protein in ClinVar (as of February 2013) because all programs perform better on the ClinVar data set. See the ClinVar set and the single results.
For copyright reasons, we are unable to reproduce the list of disease mutations obtained from HGMD Professional as text, but we can offer the predictions for the disease mutations and polymorphisms as images. We also provide the results as a comprehensive text file, but here the identities of the HGMD disease mutations are replaced by sequential numbers.
We also provide detailed statistics about the consensus among the different tools for the HGMD data set.

ROC plot

This plot shows the receiver operating characteristics for the comparison using the HGMD/TGP dataset. Please note that these plots are intended to set a threshold to discriminate signals from noise - or, in the case of score-based predictors, to find an optimal cut-off value between disease mutations and polymorphisms.
MutationTaster2 does not return a 'score' but only a boolean prediction (disease causing or not) plus a confidence score for this prediction. This kind of plot is hence not very useful to determine the performance of MutationTaster2 (it would indeed be very useful to determine cut-off values for continuous values, e.g. for predictions only based on PhyloP or PhastCons).
We know, however, that in many articles ROC curves are used to compare tools such as MutationTaster2 and hence include these curves to show that MutationTaster2 has, like its predecessor, a higher area under curve (AUC).

This plot was generated using R. We loaded the prediction (or confidence) scores of the different programs and the disease status for each alteration into the ROCR package. In the case of PROVEAN and SIFT, where a decreasing score indicates higher disease potential, we multiplied the scores by -1.
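
The effect of the sign flip can be illustrated without R. The sketch below is not the ROCR analysis used for the plot; it computes the AUC in plain Python through the equivalent pairwise ranking formulation (the fraction of disease/polymorphism pairs in which the disease mutation receives the higher score, ties counting one half). The scores and labels are made-up toy values.

```python
def auc(scores, labels):
    """Area under the ROC curve; labels: 1 = disease mutation, 0 = polymorphism.

    Assumes that a higher score indicates higher disease potential.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores on a SIFT-like scale, where LOWER means more damaging.
labels = [1, 1, 1, 0, 0, 0]
raw_scores = [0.00, 0.02, 0.30, 0.10, 0.60, 1.00]

# Multiplying by -1 restores the "higher = more damaging" orientation.
flipped_scores = [-s for s in raw_scores]

print("AUC without sign flip:", auc(raw_scores, labels))
print("AUC with sign flip:   ", auc(flipped_scores, labels))
```

Without the flip, a predictor of this kind would appear to perform worse than chance, since the AUC of the raw scores is one minus the AUC of the flipped scores.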

application to a real exome

To evaluate the performance of MutationTaster2, especially the false positive rate (FPR), we sent all exonic variants found in a 1000 Genomes Project sample to MutationTaster2, PolyPhen-2, SIFT, and PROVEAN. In the first step, the variants were extracted from the BAM alignment file of sample HG00377 using samtools/bcftools:

samtools mpileup -D -gf /Ensembl69/Homo_sapiens.GRCh37.69.dna.all.fa HG00377.mapped.ILLUMINA.bwa.FIN.exome.20121211.bam | bcftools view -c -g -v - > Exome_HG00377.vcf

A list of all variants within exons (+/- 10 flanking bases) was then obtained with a Perl script. This list was then sent to MutationTaster2's Query Engine and to the web services of PolyPhen-2 and SIFT/PROVEAN.
The results obtained from the different tools were written to a database table; in case of more than one prediction for a variant (due to multiple transcripts), the most deleterious score was used. From this database table, we extracted all predictions for homozygous variants with a coverage of 10 or higher. Each table contains two parts: on top we list all predictions, below only those cases that were predicted by all tools.
Because SIFT and PolyPhen can only predict the outcome of single amino acid substitutions, we created another table that contains only these cases. For this, we extracted the predicted amino acid substitutions from MT2 and SIFT and included only those cases that were predicted to cause such an exchange by both tools. All predictions that assumed pathogenicity (including "possibly damaging" and "probably damaging") were counted as false positives.
MutationTaster2 not only gives more predictions than the other tools (because it can also handle synonymous substitutions and the flanking bases outside the exons) but also produces fewer false positives than our competitors. Please note that in this real-life example, MutationTaster's automatic classification routines were used.
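
The aggregation and filtering steps described above can be sketched as follows. This is a simplified illustration rather than the database queries actually used: predictions are reduced to a pathogenic/benign flag (so "most deleterious" becomes "pathogenic in at least one transcript"), and the record layout, identifiers and function name are hypothetical.

```python
# Hypothetical records for one tool:
# (variant id, transcript, predicted pathogenic?, genotype, coverage)
records = [
    ("1:12345A>G", "T1", False, "hom", 25),
    ("1:12345A>G", "T2", True, "hom", 25),   # second transcript, more deleterious
    ("2:67890C>T", "T1", False, "hom", 8),   # coverage below 10, excluded
    ("3:11111G>A", "T1", False, "het", 40),  # heterozygous, excluded
    ("4:22222T>C", "T1", False, "hom", 12),
]


def count_fp_tn(records, min_coverage=10):
    """Count FP and TN per variant, keeping the most deleterious prediction."""
    worst = {}
    for variant, _transcript, pathogenic, genotype, coverage in records:
        if genotype != "hom" or coverage < min_coverage:
            continue
        # A variant counts as pathogenic if any of its transcripts does.
        worst[variant] = worst.get(variant, False) or pathogenic
    fp = sum(worst.values())  # pathogenic predictions are counted as false positives
    tn = len(worst) - fp      # benign predictions are counted as true negatives
    return fp, tn


fp, tn = count_fp_tn(records)
print(f"FP = {fp}, TN = {tn}")
```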

Non-synonymous, synonymous, and non-coding variants

case	MutationTaster2	PPH	SIFT	PROVEAN
all predictions
FP	103	464	353	462
TN	7714	943	6623	6436
FPR	1.3%	33.0%	5.1%	6.7%
variants analysed by all tools
FP	7	401	286	308
TN	1224	830	945	923
FPR	0.6%	32.6%	23.2%	25.0%

only variants leading to single amino acid substitutions

case	MutationTaster2	PPH	SIFT	PROVEAN
all predictions
FP	6	376	295	331
TN	2771	776	2482	2446
FPR	0.2%	32.6%	10.6%	11.9%
variants analysed by all tools
FP	6	376	274	290
TN	1146	776	878	862
FPR	0.5%	32.6%	23.8%	25.2%

FP: false positives (i.e. pathogenic predictions), TN: true negatives (benign predictions), FPR: false positive rate (FP / (FP + TN))
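
The tabulated rates are reproduced by the standard definition FPR = FP / (FP + TN). As a quick check, the sketch below recomputes the "all predictions" rows of the first table; the function name is illustrative.

```python
def false_positive_rate(fp, tn):
    """Fraction of all predictions that were false positives: FP / (FP + TN)."""
    return fp / (fp + tn)


# (FP, TN) for all predictions: non-synonymous, synonymous, and non-coding variants.
all_predictions = {
    "MutationTaster2": (103, 7714),
    "PPH": (464, 943),
    "SIFT": (353, 6623),
    "PROVEAN": (462, 6436),
}

rates = {tool: false_positive_rate(fp, tn) for tool, (fp, tn) in all_predictions.items()}
for tool, rate in rates.items():
    print(f"{tool}: {rate:.1%}")
```

This gives 1.3%, 33.0%, 5.1% and 6.7%, in agreement with the FPR row of the table.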

data

MutationTaster's results can be interactively inspected: vcf_7890_1559988265/progress.html.
Note that MutationTaster2 gives predictions for all transcripts, inflating the number of results. Please also note that many of the disease predictions are frameshifts which are neglected by the other tools; some are even known disease mutations.

Downloads:
statistics for homozygous / all variants with a coverage of at least 10
the sample exome (HG00377)
PolyPhen-2 results (Settings: HumDiv model, all transcripts)
SIFT/PROVEAN results