Yum, tasty mutations...

Mutation T@ster

PhyloP and PhastCons with real data

To evaluate the significance of PhyloP and PhastCons scores, we extracted both for a set of known polymorphisms (from the TGP) and disease mutations (from HGMD Pro).
The outcome is plotted below. Based on these preliminary results and further tests, we decided to include both conservation scores in the classification model as normally distributed variables. This is, of course, not quite true, but it gave the best fit in the cross-validation. Using only the PhastCons scores performed slightly worse than the combination of both scores. As an alternative, we also tested dividing each conservation score into discrete classes, hence the 20 bins plotted on the X axis.
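
The two ways of feeding a conservation score to a classifier mentioned above (as a normally distributed variable, or as one of 20 discrete classes) can be sketched as follows. This is only an illustration, not the actual MutationTaster code: the function names, the equal-width binning and the example scores are all assumptions made up for this sketch.

```python
import math
from statistics import mean, stdev


def fit_gaussian(scores):
    """Return (mean, standard deviation) estimated from a list of scores."""
    return mean(scores), stdev(scores)


def log_density(x, mu, sigma):
    """Log of the normal probability density with parameters mu, sigma at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def to_bin(score, lo, hi, n_bins=20):
    """Map a score in [lo, hi] to one of n_bins equal-width classes (0-based)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    frac = (score - lo) / (hi - lo)
    # Clamp so that score == hi (or out-of-range values) land in a valid bin.
    return min(n_bins - 1, max(0, int(frac * n_bins)))


# Hypothetical PhastCons-like scores, invented for illustration only.
polymorphism_scores = [0.02, 0.10, 0.35, 0.05, 0.20]
disease_scores = [0.95, 0.99, 0.80, 1.00, 0.90]

poly_params = fit_gaussian(polymorphism_scores)
disease_params = fit_gaussian(disease_scores)

# Variant A: treat the score as normally distributed within each class and
# compare the log densities (positive favours the disease-mutation class).
query = 0.85
log_ratio = log_density(query, *disease_params) - log_density(query, *poly_params)

# Variant B: treat the score as a categorical variable with 20 classes.
query_bin = to_bin(query, lo=0.0, hi=1.0)
```

In variant A each class needs only two parameters per score, whereas variant B needs a frequency for each of the 20 bins, which is one plausible reason a simple normal model can hold up well in cross-validation despite being a rough approximation.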

In the final model used by the classifier, we also include the two flanking positions (one on each side of the variant). We hence have four attributes:
phylop_flanking, phastcons_flanking, phylop_direct, phastcons_direct
For each attribute, we take the maximum score over the positions it covers and submit this value to the classifier.
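
A minimal sketch of how the four attributes might be derived. It assumes that "direct" means the positions covered by the variant itself, that "flanking" means the single position on each side, and that the maximum is taken over the positions each attribute covers. The function name, the dict-based score lookup and the example values are hypothetical.

```python
def conservation_attributes(phylop, phastcons, start, end):
    """Compute the four conservation attributes for a variant.

    phylop, phastcons: dicts mapping a genomic position to its score.
    start, end: first and last position covered by the variant (inclusive);
    for a single base exchange, start == end.
    """
    direct = range(start, end + 1)
    flanking = (start - 1, end + 1)  # one position on each side

    def max_score(track, positions):
        # Positions without a score are skipped; None if no score is available.
        values = [track[p] for p in positions if p in track]
        return max(values) if values else None

    return {
        "phylop_flanking": max_score(phylop, flanking),
        "phastcons_flanking": max_score(phastcons, flanking),
        "phylop_direct": max_score(phylop, direct),
        "phastcons_direct": max_score(phastcons, direct),
    }


# Invented example scores around position 100.
phylop = {99: 0.5, 100: 2.1, 101: -0.3}
phastcons = {99: 0.1, 100: 0.98, 101: 0.4}

attributes = conservation_attributes(phylop, phastcons, start=100, end=100)
```

For the single base exchange at position 100 in this example, the flanking attributes are taken from positions 99 and 101 and the direct attributes from position 100 alone.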

PhastCons scores for single base exchanges

PhyloP scores for single base exchanges

PhastCons scores for InDels

PhyloP scores for InDels