What is BRCA1 predictor?

BRCA1 predictor is a tool for predicting the functional significance of nonsynonymous variants of BRCA1, developed by the Clinical and Translational Bioinformatics research group at the Vall d'Hebron Institute of Research.

How can I predict my variant?

You can submit your variant on our query page by indicating the native amino acid, the residue position and the mutated amino acid. You will then be redirected to the prediction page.
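
If you keep your variants as compact strings, a small helper can split them into the three fields the query page asks for. This is only a sketch: the "C61G"-style notation and the parse_variant name are assumptions made for illustration, not something the tool itself defines.

```python
import re

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Assumed compact notation: native residue, 1-based position, mutated residue.
VARIANT_PATTERN = re.compile(r"^([A-Z])([1-9][0-9]*)([A-Z])$")


def parse_variant(text):
    """Split a string such as 'C61G' into (native, position, mutated)."""
    match = VARIANT_PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised variant format: {text!r}")
    native, position, mutated = match.group(1), int(match.group(2)), match.group(3)
    if native not in AMINO_ACIDS or mutated not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid in {text!r}")
    if native == mutated:
        # Same residue on both sides means there is no amino acid change.
        raise ValueError(f"{text!r} does not change the amino acid")
    return native, position, mutated


native, position, mutated = parse_variant("C61G")
```

The three returned values correspond to the native amino acid, the residue position and the mutated amino acid requested by the form.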

Why is my variant not accepted?

We use UniProt as the reference database, a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. In particular, we use the most prevalent isoform, that is, the canonical isoform. If you are using another isoform, or another protein sequence reference such as NCBI or Ensembl, you may find small differences with respect to our reference sequence, and a variant that does not match it is not accepted.
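
The kind of check that can reject a variant can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual validation code: check_against_reference is a hypothetical name, and TOY_SEQUENCE is a made-up ten-residue string, not the BRCA1 sequence.

```python
def check_against_reference(sequence, native, position, mutated):
    """Return None if the variant fits the reference, else a reason string."""
    if position < 1 or position > len(sequence):
        return f"position {position} is outside the sequence (length {len(sequence)})"
    found = sequence[position - 1]  # positions are 1-based
    if found != native:
        return f"reference has {found} at position {position}, not {native}"
    if mutated == native:
        return "native and mutated amino acids are identical"
    return None


# Made-up sequence for illustration only; NOT the real BRCA1 sequence.
TOY_SEQUENCE = "MACDEFGHIK"

accepted = check_against_reference(TOY_SEQUENCE, "C", 3, "G")
rejected = check_against_reference(TOY_SEQUENCE, "A", 3, "G")
```

A mismatch in the native residue, as in the second call, is the typical symptom of numbering a variant against a different isoform or database.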

Which metrics does a pathogenicity prediction include?

We provide you with three metrics:

  • Label: the variant is classified as pathogenic or neutral according to its functional consequence.

  • Score: the numerical score of the functional consequence of the variant. It ranges continuously from 0 to 1, where 0 indicates a neutral variant and 1 a pathogenic variant. The threshold between pathogenic and neutral variants is 0.5.

  • Reliability: an estimate of the accuracy of the prediction. It ranges continuously from 0 to 1, where 1 indicates the most reliable prediction.
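
The relation between score and label can be sketched in a few lines. The 0.5 threshold comes from the description above; how a score of exactly 0.5 is labelled is not stated, so treating it as pathogenic is an assumption, and label_from_score is a hypothetical name.

```python
THRESHOLD = 0.5  # threshold between neutral and pathogenic, as described above


def label_from_score(score, threshold=THRESHOLD):
    """Map a score in [0, 1] to 'pathogenic' or 'neutral'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    # Assumption: a score exactly at the threshold is labelled pathogenic.
    return "pathogenic" if score >= threshold else "neutral"


labels = [label_from_score(s) for s in (0.12, 0.50, 0.93)]
```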

[Figure: Score Plot]

Which metrics does an HDR experiment prediction include?

We provide you with two metrics:

  • Label: the variant is classified as HDR activity or No HDR activity according to its functional consequence.

  • Score: the numerical score of the functional consequence of the variant, calculated based on the paper by Starita et al.

How are the pathogenicity predictions calculated?

These predictions are calculated by a machine learning algorithm previously trained on a set of known variants. To develop the predictor, we followed these steps:

  1. Collect the pathogenic and neutral variants of the protein

  2. Identify features that discriminate between pathogenic and neutral variants

  3. Build the model by training the neural network algorithm on the features of the known variants

  4. Estimate the model performance by cross-validation to ensure reliable results

[Figure: Predictor Schema]

Riera et al., Human Mutation, 2016

How are the HDR experiment predictions calculated?

These predictions are calculated by a multiple linear regression model previously trained on a set of known variants. To develop the predictor, we followed these steps:

  1. Collect the pathogenic and neutral variants of BRCA1 and their HDR experimental values from the paper by Starita et al.

  2. Identify features that discriminate between pathogenic and neutral variants

  3. Train the multiple linear regression model on the features of the known variants

  4. Estimate the model performance by cross-validation to ensure reliable results
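
As a rough illustration of steps 3 and 4, the sketch below fits a multiple linear regression by ordinary least squares and evaluates it by leave-one-out cross-validation. It is not the code behind BRCA1 predictor: the two features and the target values are toy numbers rather than HDR measurements, and fit_mlr, predict and leave_one_out are hypothetical helper names.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit the model")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for k in range(p):
            if k != col:
                factor = A[k][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] for i in range(p)]  # [intercept, coef_1, ..., coef_n]


def predict(coefs, x):
    """Apply the fitted coefficients to one feature vector."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


def leave_one_out(X, y):
    """Predict each sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(X)):
        coefs = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predictions.append(predict(coefs, X[i]))
    return predictions


# Toy data generated as y = 1 + 2*x1 - x2; not real variant features.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]]
y = [1.0, 3.0, 0.0, 2.0, 2.0]
loo_predictions = leave_one_out(X, y)
mean_abs_error = sum(abs(p - t) for p, t in zip(loo_predictions, y)) / len(y)
```

Because each sample is predicted by a model that never saw it, the resulting error is a fairer estimate of performance on new variants than the error on the training set.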

What is the performance of BRCA1 predictor?

BRCA1 predictor has been evaluated by leave-one-out cross-validation and compared with state-of-the-art predictors. The performance metrics for each predictor are:

Predictor    Sensitivity  Specificity  Accuracy  MCC    Coverage
BRASS NN     0.857        0.718        0.765     0.546  100 %
BRASS MLR    0.667        0.875        0.786     0.559  100 %
Align-GVGD   0.933        0.831        0.898     0.772  100 %
PON-P2       1.0          0.259        0.756     0.436  14 %
PolyPhen-2   0.517        0.844        0.628     0.35   100 %
CADD         1.0          0.358        0.492     0.323  24 %
SIFT         0.0          0.0          0.0       0.0    0 %
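
The column definitions are not spelled out here. The sketch below assumes the standard confusion-matrix definitions, with pathogenic as the positive class and coverage as the fraction of variants for which a tool returns a prediction. The function name, the example counts and the choice to report 0.0 for an undefined ratio are all assumptions made for illustration.

```python
import math


def performance_metrics(tp, fn, tn, fp, total_variants=None):
    """Compute the table's metrics from confusion-matrix counts.

    tp / fn: pathogenic variants predicted pathogenic / neutral.
    tn / fp: neutral variants predicted neutral / pathogenic.
    total_variants: size of the full set, including variants the tool
    could not score; defaults to the number of scored variants.
    """
    def ratio(numerator, denominator):
        # Assumption: report 0.0 when the denominator is empty.
        return numerator / denominator if denominator else 0.0

    predicted = tp + fn + tn + fp
    if total_variants is None:
        total_variants = predicted
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, predicted),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
        "coverage": ratio(predicted, total_variants),
    }


# Invented counts, only to exercise the formulas.
example = performance_metrics(tp=6, fn=1, tn=8, fp=2, total_variants=20)
```

Sensitivity, specificity and accuracy range from 0 to 1, whereas MCC ranges from -1 to 1, so MCC values should not be read on the same scale as the other columns.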

Can I download all the predictions for my protein?

You can download all the pre-calculated predictions to run your own queries. The file is in CSV format and contains the following columns:

#  Field       Description
1  Gene        HGNC official gene symbol
2  Protein     UniProt accession number
3  Variant     Nonsynonymous variant from the canonical isoform
4  Prediction  Predicted functional consequence of the variant
5  Score       Numerical score of the pathogenic prediction
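
Once downloaded, the file can be queried with Python's standard csv module. The sketch below makes several assumptions: that the file has a header row with exactly the five field names listed above, that variants are written in a "L20P"-style notation, and that the Prediction column holds the words shown. The sample rows are invented placeholders, not real predictions, and load_predictions and lookup are hypothetical names.

```python
import csv
import io

# Invented rows for illustration only; the variants and scores are NOT real
# predictions, and the header row is an assumption about the file layout.
SAMPLE_CSV = """Gene,Protein,Variant,Prediction,Score
BRCA1,P38398,A10V,neutral,0.08
BRCA1,P38398,L20P,pathogenic,0.91
BRCA1,P38398,K30R,neutral,0.33
"""


def load_predictions(handle):
    """Read the rows of the predictions file, converting Score to float."""
    rows = []
    for row in csv.DictReader(handle):
        row["Score"] = float(row["Score"])
        rows.append(row)
    return rows


def lookup(rows, variant):
    """Return the row for one variant, or None if it is not in the file."""
    for row in rows:
        if row["Variant"] == variant:
            return row
    return None


# With a real download, replace io.StringIO(...) with open(path, newline="").
rows = load_predictions(io.StringIO(SAMPLE_CSV))
high_scores = [row["Variant"] for row in rows if row["Score"] >= 0.5]
hit = lookup(rows, "L20P")
```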

What other relevant information do you provide?

The results report extensive information about the variant, divided into the following sections:

  • Prediction: prediction of the functional consequence of the variant along with its score and reliability.

  • Other Predictors: functional consequence of the variant predicted by other standard tools such as PON-P2, PolyPhen-2, SIFT, Align-GVGD and CADD predictors.

  • Variant Annotation: known clinical evidence, biological relevance, population allele frequency and other information about your variant from several databases such as ClinVar, UniProt, dbSNP and ExAC.

  • Biomedical Information: links to several resources about the disease (DECIPHER, HPO, GeneReviews, MalaCards, MedGen, OMIM and Orphanet databases), the protein (UniProt database), the three-dimensional structure (PDB database), the protein-protein interactions (STRING database), the metabolic pathways (Reactome database), and the gene (Ensembl, GeneCards, HGNC and NCBI databases).

  • Protein Plot: distribution of several features along the protein, such as known pathogenic and neutral variants, biologically relevant residues, functional domains and gene exons.

  • Predicted functional consequence: position of the variant's score within the distribution of scores of known pathogenic and neutral variants.

  • Explanatory variables of the prediction: position of the variant's features within the distributions of features of known pathogenic and neutral variants.