GeneYou can identify your gene of interest by entering one of the following:
TranscriptYou can also directly enter the Ensembl transcript id (starting with ENST, e.g. ENST00000308868) of your gene of interest.
Position / snippet refers toChoose coding sequence if you are working with coding sequence positions / sequence for localising the alteration of interest. Coding sequence (CDS) position 1 refers to the A of the start ATG (and is sometimes also called ORF, for open reading frame).
Alteration, all types by sequenceChoose all types by sequence if you have a sequence snippet around an alteration that you want to analyse. You can paste this sequence snippet into this field, putting square brackets [ ] around the altered base and the new base (e.g. ACGGTT[A/G]CTCTAAGGA for a base exchange from A to G). Comprehensive examples of the format are provided directly on the input mask. Additionally, you have to (1) indicate the HGNC symbol of the gene in question and (2) the transcript ID (or select one after entering a gene). All entries have to refer to the 5'-3' direction of the transcript sequence.
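The bracket notation described above can be illustrated with a short Python sketch. It is not MutationTaster's own parser: the function name `parse_snippet`, the regular expression, and the choice to accept an empty side of the slash for insertions and deletions are assumptions made for illustration only.

```python
import re

# Hypothetical pattern: flanking bases, then [OLDBASES/NEWBASES], then flanking bases.
# Allowing an empty side of the slash (insertion or deletion) is an assumption.
SNIPPET_RE = re.compile(r"^([ACGT]*)\[([ACGT]*)/([ACGT]*)\]([ACGT]*)$")


def parse_snippet(snippet):
    """Split a snippet such as ACGGTT[A/G]CTCTAAGGA into its parts."""
    match = SNIPPET_RE.match(snippet.strip().upper())
    if match is None:
        raise ValueError("snippet not properly formatted")
    left, old, new, right = match.groups()
    if not old and not new:
        raise ValueError("snippet contains no alteration")
    return {
        "left": left,
        "old": old,
        "new": new,
        "right": right,
        "wild_type": left + old + right,  # sequence before the alteration
        "mutated": left + new + right,    # sequence after the alteration
    }


parsed = parse_snippet("ACGGTT[A/G]CTCTAAGGA")
print(parsed["wild_type"], parsed["mutated"])
```

The wild-type string reconstructed this way is what would have to be located in the transcript, which is why all entries must refer to the 5'-3' direction of the transcript sequence.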
Alteration, single base exchange by positionChoose single base exchange by position if you are working with a single base exchange. This means that only a single base is altered. If you have named the mutation according to the HGVS variation nomenclature, the name should indicate whether you have to work in coding sequence (CDS) or gDNA mode.
Alteration, insertion or deletion by positionChoose insertion / deletion by position if you are working with an insertion, a deletion or a combination thereof. You do not need to further specify which kind of alteration you are exactly dealing with, since this is automatically determined by the software (and displayed in the output).
optionsCheck 'show nucleotide alignment' if you want to see multi-species alignment of nucleotide sequence around the submitted alteration in the results. By default, nucleotide alignment is not run, since the BLAST call slows down MutationTaster and the results are not used by the Bayes Classifier anyway.
Name of alterationYou can enter a self-chosen name for the alteration in question. This will be displayed in the output in order to facilitate the identification of printed outputs for different mutations in the same gene.
MutationTaster employs a Bayes classifier to eventually predict the disease potential of an alteration. The Bayes classifier is fed with
the outcome of all tests and the features of the alterations and calculates probabilities for the alteration to be either
a disease mutation or a harmless polymorphism. For this prediction, the frequencies of all single features for known
disease mutations/polymorphisms were studied in a large training set composed of >390,000 known disease mutations from HGMD Professional
and >6,800,000 harmless SNPs and Indel polymorphisms from the 1000 Genomes Project (TGP).
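As a rough illustration of how a naive Bayes classifier turns per-feature frequencies into class probabilities, consider the following Python sketch. The feature names, the frequency values and the priors are invented for the example and are not taken from MutationTaster's training data; the function `naive_bayes_posterior` is likewise hypothetical.

```python
import math


def naive_bayes_posterior(features, likelihoods, priors):
    """Return P(class | features), assuming the features are independent.

    likelihoods[cls][feature][value] is the frequency of `value` for that
    feature among the training cases of class `cls`.
    """
    log_scores = {}
    for cls, prior in priors.items():
        log_p = math.log(prior)
        for name, value in features.items():
            log_p += math.log(likelihoods[cls][name][value])
        log_scores[cls] = log_p
    # Normalise in log space so that the probabilities sum to 1.
    max_log = max(log_scores.values())
    unnorm = {cls: math.exp(s - max_log) for cls, s in log_scores.items()}
    total = sum(unnorm.values())
    return {cls: v / total for cls, v in unnorm.items()}


# Toy frequencies (made up for illustration only).
likelihoods = {
    "disease_causing": {
        "conserved": {"yes": 0.9, "no": 0.1},
        "protein_feature_lost": {"yes": 0.6, "no": 0.4},
    },
    "polymorphism": {
        "conserved": {"yes": 0.3, "no": 0.7},
        "protein_feature_lost": {"yes": 0.1, "no": 0.9},
    },
}
priors = {"disease_causing": 0.5, "polymorphism": 0.5}

posterior = naive_bayes_posterior(
    {"conserved": "yes", "protein_feature_lost": "yes"}, likelihoods, priors
)
print(posterior)
```

The class with the larger posterior is the prediction, and that posterior corresponds to the probability value reported in the output.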
ModelsWe provide three different models aimed at different types of alterations: 'silent' (synonymous or intronic) alterations (without_aae model), alterations leading to the substitution/insertion/deletion of a single amino acid (simple_aae model), and more complex changes of the amino acid sequence (e.g. mutations introducing a premature stop codon; complex_aae model). All models were trained with all available and suitable common polymorphisms and disease mutations. MutationTaster automatically determines the correct model for each alteration.
Output: probability value
The probability value is the probability of the prediction, i.e. a value close to 1 indicates high confidence in the prediction. Please note that the p value
used here is NOT the probability of error as used in t-test statistics.
Our results show that wrong predictions are usually not reflected by low probability values but are rather caused by polymorphisms or disease causing alterations that show characteristics of the other case, e.g. SNPs that are highly conserved and destroy protein features or disease mutations that appear to have no effect on the protein/gene at all.
If an alteration is a 'true' SNP (as confirmed by the existence of each of the three genotypes AA, AB, BB in the HapMap data or by presence in TGP in homozygous state in > 4 cases), it is automatically predicted to be a polymorphism. Alterations causing a premature termination codon and ultimately leading to nonsense-mediated mRNA decay (NMD) are automatically assigned the 'disease causing' status. In both cases, the Bayes classifier is run nevertheless and the probability for the prediction that was automatically made is shown. Probability values below 0.5 therefore indicate that our classifier comes to a different conclusion. A few SNPs listed in HapMap introduce premature stop codons and will cause NMD; these are likely to be mistaken for disease mutations.
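The 'true' SNP rule and the reading of the probability value can be written down as a small Python sketch. The function names and the input representation (a set of observed HapMap genotype labels and a count of homozygous TGP carriers) are assumptions for illustration; how MutationTaster stores these data internally is not described here.

```python
def is_true_snp(hapmap_genotypes, tgp_homozygous_count):
    """Apply the 'true' SNP rule described above.

    hapmap_genotypes: set of genotype labels observed in HapMap, e.g. {"AA", "AB"}
    tgp_homozygous_count: number of homozygous carriers in TGP
    """
    all_three_seen = {"AA", "AB", "BB"} <= set(hapmap_genotypes)
    return all_three_seen or tgp_homozygous_count > 4


def classifier_agrees(probability_of_automatic_call):
    """A probability below 0.5 means the classifier reached the other conclusion."""
    return probability_of_automatic_call >= 0.5


print(is_true_snp({"AA", "AB", "BB"}, 0))  # all three genotypes observed
print(is_true_snp({"AA", "AB"}, 7))        # homozygous in more than 4 TGP cases
print(classifier_agrees(0.31))             # classifier disagrees with the automatic call
```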
We advise you not to exclude an alteration due to a dbSNP ID. Many SNPs from dbSNP are not validated and some are even known to be disease causing variants (e.g. rs28939070 is responsible for Trichorhinophalangeal Syndrome, type I).
Since we used 'true' SNPs from the 1000 Genomes Project as our polymorphism data set, we included neither genotype data from the 1000 Genomes Project nor HapMap frequencies in the training and optimisation of MutationTaster or in the comparison with other applications.
The Bayes classifier is regularly updated, i.e. predictions might in some cases change over time.
summaryList of the most prominent features of the analysed alteration (e.g. 'at intron-exon boundary', 'spans start ATG', 'homozygous in TGP' etc.)
name of alterationA user-specified name in order to identify printed outputs.
alteration (phys. location)The alteration on "physical" i.e. chromosomal level (e.g. chr7:91623937_91623938insGGCAAT).
HGNC symbolThe official HGNC symbol.
Ensembl transcript IDEnsembl  transcript ID, starting with ENST.
UniProt peptide (SwissProt ID)UniProt KB / SwissProt  accession ID. Unfortunately, this does not always correctly correspond to the selected product of the transcript.
alteration typeIs either a base exchange, a combination of insertion and deletion, an insertion or a deletion.
alteration regionIs either 5'UTR (untranslated region), CDS (coding sequence), 3'UTR or intron.
DNA changesAlteration on nucleotide level. gDNA level (g.) is displayed always, cDNA level (cDNA.) for alterations located in exons, CDS level (c.) only for alterations residing in an exon in the coding sequence.
AA changesAny amino acid changes are shown here, displaying the original versus the new amino acid as well as the position of the substitution and a score for it. This score is taken from an amino acid substitution matrix (Grantham matrix) which takes into account the physico-chemical characteristics of amino acids and scores substitutions according to the degree of difference between the original and the new amino acid. Scores may range from 0.0 to 215. Since the Grantham matrix does not provide values for an amino acid insertion/deletion, no score is given in such cases. The score is only displayed for information and does not influence the MutationTaster prediction as generated by our Bayes classifier. An asterisk (*) stands for a stop codon, a minus (-) means that in the original AA sequence, there was no AA at this position. If the initial methionine codon (start ATG) is lost, MutationTaster searches for a potential new, downstream start ATG and informs you about AA changes based on the assumed alternative AA sequence.
position(s) of altered AALists the positions of altered AA. For mutations resulting in a frameshift, the position of the first altered AA is displayed along with the information that due to a frameshift, there are further changes downstream.
frameshiftCan be either yes or no.
dbSNP / TGP / ClinVar / HGMDAny known polymorphism(s) or known disease variant(s) that have been found at the position in question. Our database contains all single nucleotide polymorphisms (SNPs) from the NCBI SNP database (dbSNP). Moreover, we have stored all HapMap genotype frequencies as well as variants from the 1000 Genomes Project (abbreviated here as TGP). If an alteration is located at the same position as a known dbSNP, MutationTaster provides the SNP ID (or rs ID) and a link together with the HapMap genotype frequencies, if available. If each of the three possible genotypes is observed in at least one HapMap population, the alteration is automatically regarded as a polymorphism (the naive Bayes classifier is run nevertheless and the p value for the prediction is shown). Please note that there may be differences between your alteration and the alleles in dbSNP. For the 1000 Genomes Project, MutationTaster provides information in either of the following formats:
regulatory featuresOur database contains so-called regulatory features from the Ensembl Regulation database, such as histone modification sites, open chromatin or transcription factor binding sites. For more information about Ensembl Regulation, please see their documentation. Since it is not yet clear if and how the regulatory features influence the gene under scrutiny or rather up- / downstream genes, the regulatory features are not used by the Bayes classifier for prediction, but only displayed for informational reasons here.
phyloP / phastConsphastCons and phyloP are both methods to determine the grade of conservation of a given nucleotide. MutationTaster uses values which are precomputed and offered by UCSC. phastCons values vary between 0 and 1 and reflect the probability that each nucleotide belongs to a conserved element, based on the multiple alignment of genome sequences of 46 different species (the closer the value is to 1, the more probable the nucleotide is conserved). It considers not just each individual alignment column, but also its flanking columns. By contrast, phyloP (values between -14 and +6) separately measures conservation at individual columns, ignoring the effects of their neighbors. Moreover, phyloP can not only measure conservation (slower evolution than expected under neutral drift) but also acceleration (faster than expected). Sites predicted to be conserved are assigned positive scores, while sites predicted to be fast-evolving are assigned negative scores. For more information about phyloP and phastCons, please see the cited paper or the description on the UCSC website.
splice sitesMutationTaster uses a locally installed third party splice site prediction program, namely NNSplice  from the Berkeley Drosophila Genome Project (a web-based version is available at http://fruitfly.org/seq_tools/splice.html) to analyse possible changes in splice sites.
Kozak consensus sequence alteredThe Kozak consensus sequence (gccRccAUGG; R = purine) starts upstream of the start codon (AUG) and plays a major role in the initiation of translation. The purine (R) at position -3 as well as the G in position +4 are highly conserved. The program checks whether for a given alteration a previously strong consensus sequence has been weakened.
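A minimal Python sketch of such a check is given below. It only looks at the two highly conserved positions named above (the purine at -3 and the G at +4) and calls the sequence weakened if the mutated sequence matches fewer of them than the wild type. This criterion, the function names, and the assumption that the alteration does not shift the position of the start codon are illustrative choices, not the program's documented logic.

```python
PURINES = {"A", "G"}


def kozak_matches(seq, atg_index):
    """Return (purine at -3, G at +4) for the ATG starting at 0-based atg_index.

    Position +1 is the A of the ATG and there is no position 0,
    so -3 is three bases upstream of the A and +4 directly follows the G.
    """
    seq = seq.upper().replace("U", "T")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("not enough flanking sequence around the start codon")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    return seq[atg_index - 3] in PURINES, seq[atg_index + 3] == "G"


def kozak_weakened(wild_type, mutated, atg_index):
    """True if the mutated sequence matches fewer conserved positions."""
    return sum(kozak_matches(mutated, atg_index)) < sum(
        kozak_matches(wild_type, atg_index)
    )


wt = "GCCACCATGGCT"  # consensus context, A at -3 and G at +4
mu = "GCCTCCATGGCT"  # A>T at position -3
print(kozak_matches(wt, 6), kozak_matches(mu, 6), kozak_weakened(wt, mu, 6))
```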
conservation on AA levelFor conservation analysis, amino acid or nucleotide sequence homologues of ten other species (chimp, rhesus macaque, mouse, cat, chicken, claw frog, pufferfish, zebrafish, fruitfly, and worm) are aligned with the corresponding human sequence of the gene in question. Sequences are aligned with blastp, which is installed as a stand-alone executable on our server, and analysed by MutationTaster.
protein featuresThe program checks whether any protein features are directly or indirectly affected by the alteration. Our database stores all human SwissProt protein features. Some features will not have an influence on the prediction; they are only displayed for information and should not have an impact on the disease-causing potential of the alteration (e.g. CONFLICT or MUTAGEN).
length of proteinMutationTaster checks if the resulting protein will be elongated (prolonged), truncated, or whether nonsense-mediated mRNA decay (NMD) is likely to occur. MutationTaster determines the NMD border as last intron/exon junction minus 50 bp and analyses if a given premature termination codon occurs 5' to this border thus leading to NMD. An elongated protein is referred to as prolonged, i.e. the original termination codon is destroyed and the translation stops later than normal. Truncated is referred to as either slightly truncated (if less than 10% of the wild-type protein length is missing) or strongly truncated (if more than 10% of the original protein length is missing). In the two latter cases, the additional information 'might cause NMD' is given, because the '-55 boundary rule' is not fulfilled, but it cannot be ruled out that NMD occurs nevertheless. If MutationTaster concludes that an alteration causes NMD, this alteration is automatically regarded as a disease mutation. The classifier is run nevertheless and the p value for the prediction is shown.
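The length categories can be expressed as a short Python sketch. The function name `classify_protein_length` and the label "unchanged" are hypothetical, and the text does not say how a loss of exactly 10% is treated; the sketch assigns it to strongly truncated as an assumption.

```python
def classify_protein_length(wt_length, mu_length):
    """Compare wild-type and mutated protein lengths (in amino acids)."""
    if wt_length <= 0 or mu_length < 0:
        raise ValueError("lengths must be positive")
    if mu_length > wt_length:
        return "prolonged"
    if mu_length == wt_length:
        return "unchanged"
    missing = wt_length - mu_length
    # Integer arithmetic avoids rounding problems at the 10% threshold.
    # Assumption: exactly 10% missing counts as strongly truncated.
    if missing * 10 < wt_length:
        return "slightly truncated"
    return "strongly truncated"


print(classify_protein_length(500, 470))  # 6% missing
print(classify_protein_length(500, 300))  # 40% missing
print(classify_protein_length(500, 520))  # stop codon lost, longer protein
```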
AA sequence alteredCan be either yes (AA exchange) or no (no AA exchange)
position(s) of altered AAIf the alteration in question is located in the CDS, the position on amino acid level is shown here. If the alteration spans two or more amino acids, these are all displayed and separated by a comma.
position of stopcodon in wt / mu CDSPosition of the last base of the stop codon (this can either be TGA, TAA or TAG), position 1 refers to the A in the start ATG codon.
position (AA) of stopcodon in wt / mu AA sequencePosition of the stop asterisk (*) in the amino acid sequence, position 1 refers to the first amino acid of the protein.
poly(A) signalMutationTaster uses a locally installed version of the program polyadq for analysis of polyadenylation signals. More information at http://rulai.cshl.org/tools/polyadq/polyadq_form.html
conservation on nucleotide levelConservation on nucleotide level is analysed similarly to AA level: Using bl2seq, homologue DNA sequences of different species are compared to the human DNA sequence. Conservation status can either be all identical (same base(s) in human and species sequence), not conserved (different base(s) in human and species sequence) or no alignment (if no local alignment around the indicated position(s) was found). If no homologue sequences are found, this is indicated by no homologue. Up to now, conservation on nucleotide level is not used for the prediction.
position of start ATG in wt / mu cDNAPosition of the A in the start ATG, position 1 refers to the first base of the cDNA. If the regular start ATG is changed by an alteration, MutationTaster searches for the next most 5'-ATG and assumes this to be the new start ATG for the mutated sequence.
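The search for a replacement start codon can be sketched in a few lines of Python. The sketch assumes that the search begins at the position of the original start ATG and proceeds downstream, which is one reading of the description above; the function name `find_start_atg` is hypothetical.

```python
def find_start_atg(cdna, search_from=1):
    """Return the 1-based position of the first ATG at or after search_from.

    Returns None if the sequence contains no further ATG.
    """
    index = cdna.upper().find("ATG", search_from - 1)
    return None if index == -1 else index + 1


wt = "GGCATGAAACCCATGTTT"
mu = "GGCACGAAACCCATGTTT"  # T>C destroys the regular start ATG

wt_start = find_start_atg(wt)            # regular start ATG
mu_start = find_start_atg(mu, wt_start)  # next ATG in the mutated cDNA
print(wt_start, mu_start)
```

Whether the new ATG is in frame with the original one decides if the downstream AA sequence is merely shortened at the N-terminus or changed entirely; the sketch does not check this.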
position of termination codon in wt / mu cDNAPosition of the last base pair of the termination codon (this can be either TGA, TAA or TAG), position 1 refers to the first base pair of the cDNA.
chromosomeThe chromosome the alteration is located on.
strandIs either 1 for forward strand or -1 for reverse strand.
last intron/exon borderThe last base of the exon before the last exon.
theoretical NMD border in CDSIn order to avoid truncated proteins which might act in a dominant-negative manner, the eukaryotic cell has a surveillance mechanism to ensure that only error-free mRNAs are translated. It was shown that mRNAs carrying a premature termination codon are nearly completely degraded. This process is known as nonsense-mediated mRNA decay or NMD. The rule seems to be that a termination codon occurring more than 50-55 nucleotides upstream of the final intron / exon junction initiates the NMD machinery and the mRNA gets degraded. Therefore, this program determines the NMD border as last intron / exon junction minus 50 bp and analyses if a given premature termination codon occurs 5' to this border thus eventually leading to NMD.
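The border calculation amounts to a subtraction and a comparison, sketched here in Python. Both positions are assumed to be given in the same coordinate system (e.g. CDS positions); the function names, the strict comparison, and the choice of which base of the stop codon represents its position are assumptions for illustration.

```python
NMD_OFFSET = 50  # bp upstream of the last intron/exon junction, as stated above


def nmd_border(last_junction_pos):
    """Theoretical NMD border: last intron/exon junction minus 50 bp."""
    return last_junction_pos - NMD_OFFSET


def ptc_triggers_nmd(ptc_pos, last_junction_pos):
    """True if the premature termination codon lies 5' of the NMD border."""
    return ptc_pos < nmd_border(last_junction_pos)


print(nmd_border(1200))              # border for a last junction at position 1200
print(ptc_triggers_nmd(900, 1200))   # PTC well upstream of the border
print(ptc_triggers_nmd(1180, 1200))  # PTC within 50 bp of the junction
```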
length of CDSThe length of the coding sequence from the A of the initiation codon (ATG) to the last base of the termination codon.
cDNA positionGives the last wild-type base before alteration and first wild-type base after alteration in coding DNA sequence context (positions relative to start of transcribed coding DNA reference sequence) e.g. 1203 / 1205, the altered base is at position 1204.
gDNA positionGives the last wild-type base pair before alteration and first wild-type base pair after alteration in genomic DNA sequence context (positions relative to start of genomic DNA reference sequence) e.g. 53,344 / 53,346, the altered base is at position 53,345.
chromosomal positionGives the last wild-type base before alteration and first wild-type base after alteration in chromosomal sequence context (position relative to start of chromosomal reference sequence) e.g. 154,372,337 / 154,372,339, the altered base is at position 154,372,338.
gDNA and cDNA sequence snippetThe sequence surrounding the alteration (20 bp up- and downstream). The altered bases are highlighted in blue.
wild-type and mutated AA sequenceComplete AA sequences, the asterisk (*) indicates STOP.
speedThis is the time MutationTaster needed for analysis & prediction - your browser might need some extra time to display the results, especially if you include images.
InsDel too longAt present, MutationTaster handles only InsDels up to 12 bases.
Your mutation of interest seems to span an exon/intron boundary.This kind of mutation can only be analysed in gDNA mode.
No transcripts for this gene found!You might have misspelled the gene symbol or used a protein name which is not always also the correct symbol (e.g. protein p53 is gene TP53). Also, in some (rare) cases an NCBI gene could not be mapped to an Ensembl gene. As some external data is based on NCBI while other data is based on Ensembl, MT needs both to make a prediction. Moreover, we filter out protein-coding transcripts (Ensembl biotype protein_coding) without a correct start codon (ATG) and correct stop codon (TGA, TAA, TAG). This might lead to the phenomenon that MutationTaster complains about "no suitable transcripts" or "no transcripts for this gene found" although Ensembl lists one or several. Transcripts of mitochondrial genes are not tested for integrity due to differences in the mitochondrial genetic code.
No internal Ensembl transcript ID found. / No Ensembl gene ID found for transcript. / No stable ID for this gene.Our database doesn't know the transcript you specified. This might happen if you refer to a newer or older release than the one we use. The release MT uses is mentioned on the query interface.
Ensembl gene XXX not found in ENSEMBLOur database doesn't know the gene you specified. This might happen if you refer to a newer or older release than the one we use. The release MT uses is mentioned on the query interface.
No NCBI gene ID found. / No NCBI gene ID found for this transcript.In some (rare) cases an Ensembl gene could not be mapped to an NCBI gene. As some external data is based on NCBI while other data is based on Ensembl, MT needs both to make a prediction.
Too many NCBI gene IDs found.In some (rare) cases an Ensembl gene could not be mapped to a single NCBI gene. As some external data is based on NCBI while other data is based on Ensembl, MT needs both to make a prediction.
Only invalid NCBI gene IDs found.In some (very rare) cases an Ensembl gene could not be mapped to a valid NCBI gene, i.e. the NCBI gene Ensembl refers to is 'discontinued' and was replaced by another gene. As some external data is based on NCBI while other data is based on Ensembl, MT needs both to make a prediction. Please contact us if you encounter such a case.
Gene XXX not found on any chromosome.The gene under scrutiny has no valid positional data. This should not occur at all. Please contact us if you encounter such a case.
Gene XXX (Entrez gene YYY) and transcript ZZZ do not match!The transcript you entered is not a product of the gene you entered. Please check your input.
Position is out of gene!You entered a position that is located outside the gene. This may happen when you mapped genomic position to gene-specific position using an old genome build. Or, of course, by typos. Please check your input.
Could not retrieve a sequence or sequence is too short.MT was not able to get the gene sequence from Ensembl. This might be due to network problems so you should repeat the analysis after some time. Should this not work, please contact us.
No start ATG exon found.The transcript is not properly annotated: there is no start position of the coding sequence in the database. Please select another transcript of the same gene.
No stop exon found.The transcript is not properly annotated: there is no stop position of the coding sequence in the database. Please select another transcript of the same gene.
Chosen transcript ENSTXXX has no correct start ATG annotated.Protein-coding transcripts (Ensembl biotype protein_coding) are tested for transcript integrity, i.e. for presence of a correct start codon (ATG) and correct stop codon (TGA, TAA, TAG). If one is missing, an error message is issued because analysis in corrupt transcripts might lead to a wrong prediction.
Sequence XXX is not unique in your gene!Please use a longer snippet.
Sequence was not found in your gene.Please check your input: is there a typo in your snippet? Or do you use a snippet created from the wrong strand? MT always refers to the strand the gene is located on.
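The two snippet errors above (not unique, not found) can be reproduced with a small Python sketch that counts occurrences of the wild-type snippet in the gene sequence and, if there is no hit, tests the reverse complement to detect a snippet taken from the wrong strand. All function names and error texts are hypothetical and only mirror the messages described here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_occurrences(sequence, snippet):
    """Count all (also overlapping) occurrences of snippet in sequence."""
    count, start = 0, sequence.find(snippet)
    while start != -1:
        count += 1
        start = sequence.find(snippet, start + 1)
    return count


def locate_snippet(gene_seq, wt_snippet):
    """Return the 1-based position of a unique snippet, else raise ValueError."""
    gene_seq, wt_snippet = gene_seq.upper(), wt_snippet.upper()
    if not wt_snippet:
        raise ValueError("empty snippet")
    hits = count_occurrences(gene_seq, wt_snippet)
    if hits == 1:
        return gene_seq.find(wt_snippet) + 1
    if hits > 1:
        raise ValueError("sequence is not unique in the gene; use a longer snippet")
    if count_occurrences(gene_seq, reverse_complement(wt_snippet)) > 0:
        raise ValueError("sequence only matches the opposite strand")
    raise ValueError("sequence was not found in the gene")


gene = "TTGACCGATTACAGGCATCGATTAGC"  # toy sequence, not a real gene
print(locate_snippet(gene, "GATTACA"))
```

With the toy sequence, the shorter snippet GATTA occurs twice and would trigger the "not unique" case, which is why a longer snippet resolves it.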
Snippet not properly formatted.Please check your input: snippets must be specified as ACGTACGT[OLDBASES/NEWBASES]ACGTACGT.
2013: SIFT, PROVEAN, PolyPhen-2