Class for the enzymatic digestion of proteins.
#include <OpenMS/CHEMISTRY/EnzymaticDigestion.h>
Classes | |
struct | BindingSite |
struct | CleavageModel |
Public Types | |
enum | Enzyme { ENZYME_TRYPSIN, SIZE_OF_ENZYMES } |
Possible enzymes for the digestion (adapt NamesOfEnzymes and nextCleavageSite_() if you add more enzymes here). | |
enum | Specificity { SPEC_FULL, SPEC_SEMI, SPEC_NONE, SIZE_OF_SPECIFICITY } |
Determines whether the specificity of the two peptide ends is taken into account when querying for valid digestion products. | |
Public Member Functions | |
EnzymaticDigestion () | |
Default constructor. | |
EnzymaticDigestion (const EnzymaticDigestion &rhs) | |
Copy constructor. | |
EnzymaticDigestion & | operator= (const EnzymaticDigestion &rhs) |
Assignment operator. | |
SignedSize | getMissedCleavages () const |
Returns the number of missed cleavages for the digestion. | |
void | setMissedCleavages (SignedSize missed_cleavages) |
Sets the number of missed cleavages for the digestion (default is 0). This setting is ignored when the log model is used. | |
Enzyme | getEnzyme () const |
Returns the enzyme for the digestion. | |
void | setEnzyme (Enzyme enzyme) |
Sets the enzyme for the digestion (default is ENZYME_TRYPSIN). | |
Specificity | getSpecificity () const |
Returns the specificity for the digestion. | |
void | setSpecificity (Specificity spec) |
Sets the specificity for the digestion (default is SPEC_FULL). | |
void | digest (const AASequence &protein, std::vector< AASequence > &output) const |
Performs the enzymatic digestion of a protein. | |
Size | peptideCount (const AASequence &protein) |
Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings. | |
bool | isLogModelEnabled () const |
Returns whether the trained model is used for digestion. | |
void | setLogModelEnabled (bool enabled) |
Enables or disables the trained model. | |
double | getLogThreshold () const |
Returns the threshold that must be exceeded to call a cleavage (only used by the cleavage model trained on real data). | |
void | setLogThreshold (double threshold) |
Sets the threshold that must be exceeded to call a cleavage (only used by the cleavage model trained on real data). The default is 0.25. | |
bool | isValidProduct (const AASequence &protein, Size pep_pos, Size pep_length) |
Returns true if the peptide of length pep_length starting at position pep_pos in protein was generated by the current model. | |
Static Public Member Functions | |
static Enzyme | getEnzymeByName (const String &name) |
static Specificity | getSpecificityByName (const String &name) |
Static Public Attributes | |
static const std::string | NamesOfEnzymes [SIZE_OF_ENZYMES] |
Names of the enzymes. | |
static const std::string | NamesOfSpecificity [SIZE_OF_SPECIFICITY] |
Names of the specificities. | |
Protected Member Functions | |
void | nextCleavageSite_ (const AASequence &sequence, AASequence::ConstIterator &p) const |
Moves the iterator p behind (i.e., to the C-terminal side of) the next cleavage site of the sequence. | |
bool | isCleavageSite_ (const AASequence &sequence, const AASequence::ConstIterator &p) const |
Tests whether the position pointed to by p (N-terminal side) is a valid cleavage site. | |
Protected Attributes | |
SignedSize | missed_cleavages_ |
Number of missed cleavages. | |
Enzyme | enzyme_ |
Enzyme used for the digestion. | |
Specificity | specificity_ |
Specificity of the enzyme. | |
bool | use_log_model_ |
Whether to use the log model instead of naive digestion (with missed cleavages). | |
double | log_model_threshold_ |
Threshold that decides whether a position is cleaved or missed (only used by the log model). | |
Map< BindingSite, CleavageModel > | model_data_ |
Holds the cleavage model. | |
Class for the enzymatic digestion of proteins.
Digestion can be performed using simple regular expressions, e.g. [KR] | [^P] for trypsin (cleave after K or R unless the next residue is P). Missed cleavages can also be modelled, i.e. adjacent peptides remain joined because the enzyme failed to cleave between them, for example due to enzyme malfunction or access restrictions. If n missed cleavages are given, all possible resulting peptides (cleaved and uncleaved) with up to n missed cleavages are returned; no random selection of just n specific missed cleavage sites is performed.
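The naive scheme described above can be sketched in standalone C++ as follows. This is only an illustration under stated assumptions, not the OpenMS implementation: it works on a plain std::string of one-letter residue codes instead of AASequence, and the names isTrypticSite and digestTryptic are hypothetical helpers introduced here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of OpenMS): true if the bond after residue i
// is a tryptic cleavage site, i.e. residue i is K or R and residue i+1 is not P.
bool isTrypticSite(const std::string& seq, std::size_t i)
{
  if (i + 1 >= seq.size()) return false;  // no bond after the last residue
  return (seq[i] == 'K' || seq[i] == 'R') && seq[i + 1] != 'P';
}

// Hypothetical helper: returns all peptides with up to `missed` missed cleavages.
std::vector<std::string> digestTryptic(const std::string& seq, std::size_t missed)
{
  std::vector<std::string> out;
  if (seq.empty()) return out;

  // Fragment boundaries: sequence start, every cleavage site, sequence end.
  std::vector<std::size_t> cuts{0};
  for (std::size_t i = 0; i < seq.size(); ++i)
  {
    if (isTrypticSite(seq, i)) cuts.push_back(i + 1);
  }
  cuts.push_back(seq.size());

  const std::size_t fragments = cuts.size() - 1;
  // m = 0 yields the fully cleaved peptides, m = 1 joins two adjacent fragments, etc.
  for (std::size_t m = 0; m <= missed && m < fragments; ++m)
  {
    for (std::size_t f = 0; f + m < fragments; ++f)
    {
      out.push_back(seq.substr(cuts[f], cuts[f + m + 1] - cuts[f]));
    }
  }
  return out;
}
```

For the sequence ACKDERPFRGH, the R followed by P is not cleaved, so the sketch produces three fully cleaved peptides and, with one missed cleavage allowed, two additional joined peptides.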
An alternative model is also available, in which the protein is cleaved only at positions where a cleavage model trained on real data exceeds a certain threshold. The model is published in Siepen et al. (2007), "Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics", doi: 10.1021/pr060507u. The model is only available for trypsin and ignores the missed cleavage setting. You should, however, use setLogThreshold() to adjust the false-positive vs. false-negative rates. A higher threshold increases the number of cleavages predicted.
EnzymaticDigestion | ( | ) |
Default constructor.
EnzymaticDigestion | ( | const EnzymaticDigestion & | rhs | ) |
Copy constructor.
void digest | ( | const AASequence & | protein, |
std::vector< AASequence > & | output | ||
) | const |
Performs the enzymatic digestion of a protein.
Referenced by SimpleSearchEngine::main_().
Enzyme getEnzyme | ( | ) | const |
Returns the enzyme for the digestion.
static Enzyme getEnzymeByName | ( | const String & | name | ) | static |
Converts an enzyme name to the corresponding enum value; returns SIZE_OF_ENZYMES if name is not valid.
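The lookup-with-sentinel pattern described here can be sketched as below. This is a standalone illustration, not the OpenMS source: the enum mirrors the one listed on this page, but the array kNamesOfEnzymes, its content "Trypsin", and the function enzymeByName are assumptions made for the example.

```cpp
#include <string>

// Mirrors the Enzyme enum listed on this page.
enum Enzyme { ENZYME_TRYPSIN, SIZE_OF_ENZYMES };

// Assumed name table; the actual strings stored in NamesOfEnzymes may differ.
const std::string kNamesOfEnzymes[SIZE_OF_ENZYMES] = {"Trypsin"};

// Hypothetical stand-in for a name-to-enum lookup.
Enzyme enzymeByName(const std::string& name)
{
  for (int i = 0; i < SIZE_OF_ENZYMES; ++i)
  {
    if (kNamesOfEnzymes[i] == name) return static_cast<Enzyme>(i);
  }
  return SIZE_OF_ENZYMES;  // sentinel: the name was not recognised
}
```

Callers should compare the result against SIZE_OF_ENZYMES before passing it on, since the sentinel is not a usable enzyme.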
double getLogThreshold | ( | ) | const |
Returns the threshold which needs to be exceeded to call a cleavage (only for the trained cleavage model on real data)
SignedSize getMissedCleavages | ( | ) | const |
Returns the number of missed cleavages for the digestion.
Specificity getSpecificity | ( | ) | const |
Returns the specificity for the digestion.
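static Specificity getSpecificityByName | ( | const String & | name | ) | static |
Converts a specificity name to the corresponding enum value; returns SIZE_OF_SPECIFICITY if name is not valid.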
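bool isCleavageSite_ | ( | const AASequence & | sequence, |
const AASequence::ConstIterator & | p | ||
) | const | protected |
Tests whether the position pointed to by p (N-terminal side) is a valid cleavage site.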
bool isLogModelEnabled | ( | ) | const |
Returns whether the trained model is used for digestion.
bool isValidProduct | ( | const AASequence & | protein, |
Size | pep_pos, | ||
Size | pep_length | ||
) |
Returns true if the peptide of length pep_length starting at position pep_pos in protein was generated by the current model.
Referenced by FoundProteinFunctor::addHit().
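One way to picture how the Specificity setting could enter such a check is the standalone sketch below. It is an assumption-laden illustration rather than the OpenMS implementation: it uses std::string instead of AASequence, hard-codes the tryptic rule, ignores the missed cleavage setting and the log model, and assumes that SPEC_FULL requires both peptide ends to be cleavage sites (or protein termini), SPEC_SEMI at least one, and SPEC_NONE neither. The names isBoundary and isValidProductSketch are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Mirrors the Specificity enum listed on this page.
enum Specificity { SPEC_FULL, SPEC_SEMI, SPEC_NONE, SIZE_OF_SPECIFICITY };

// Hypothetical helper: true if a peptide may start or end at `pos`, where
// `pos` is the index of the first residue after the boundary.
bool isBoundary(const std::string& protein, std::size_t pos)
{
  if (pos == 0 || pos == protein.size()) return true;  // protein termini
  const char before = protein[pos - 1];
  return (before == 'K' || before == 'R') && protein[pos] != 'P';
}

// Hypothetical stand-in for a specificity-aware product check.
bool isValidProductSketch(const std::string& protein, std::size_t pep_pos,
                          std::size_t pep_length, Specificity spec)
{
  if (pep_length == 0 || pep_pos >= protein.size() ||
      pep_length > protein.size() - pep_pos)
  {
    return false;  // the peptide does not lie within the protein
  }
  const bool n_term = isBoundary(protein, pep_pos);
  const bool c_term = isBoundary(protein, pep_pos + pep_length);
  switch (spec)
  {
    case SPEC_FULL: return n_term && c_term;  // both ends must be specific
    case SPEC_SEMI: return n_term || c_term;  // one specific end suffices
    case SPEC_NONE: return true;              // ends are not checked
    default:        return false;
  }
}
```

Under these assumptions, the peptide DER within ACKDERPFRGH passes the semi-specific check (its N-terminal end follows K) but fails the fully specific one, because its C-terminal R is followed by P.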
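void nextCleavageSite_ | ( | const AASequence & | sequence, |
AASequence::ConstIterator & | p | ||
) | const | protected |
Moves the iterator p behind (i.e., to the C-terminal side of) the next cleavage site of the sequence.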
EnzymaticDigestion& operator= | ( | const EnzymaticDigestion & | rhs | ) |
Assignment operator.
Size peptideCount | ( | const AASequence & | protein | ) |
Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings.
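If every peptide with up to n missed cleavages is reported, as described for the naive digestion, the count follows directly from the number of fragments. The sketch below is a standalone illustration of that arithmetic, not the OpenMS implementation; countPeptides is a hypothetical helper, and `fragments` is assumed to be the number of cleavage sites plus one (zero for an empty sequence).

```cpp
#include <cstddef>

// Hypothetical helper: with k fragments, exactly k - m peptides span m missed
// cleavages, so the total is the sum of (k - m) for m = 0 .. min(missed, k - 1).
std::size_t countPeptides(std::size_t fragments, std::size_t missed)
{
  std::size_t count = 0;
  for (std::size_t m = 0; m <= missed && m < fragments; ++m)
  {
    count += fragments - m;
  }
  return count;
}
```

For a protein with two cleavage sites (three fragments), this gives 3 peptides without missed cleavages, 5 with one allowed, and 6 with two or more allowed.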
void setEnzyme | ( | Enzyme | enzyme | ) |
Sets the enzyme for the digestion (default is ENZYME_TRYPSIN).
Referenced by SimpleSearchEngine::main_().
void setLogModelEnabled | ( | bool | enabled | ) |
Enables or disables the trained model.
void setLogThreshold | ( | double | threshold | ) |
Sets the threshold that must be exceeded to call a cleavage (only used by the cleavage model trained on real data). The default is 0.25.
void setMissedCleavages | ( | SignedSize | missed_cleavages | ) |
Sets the number of missed cleavages for the digestion (default is 0). This setting is ignored when the log model is used.
Referenced by SimpleSearchEngine::main_().
void setSpecificity | ( | Specificity | spec | ) |
Sets the specificity for the digestion (default is SPEC_FULL).
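Enzyme enzyme_ | protected |
Enzyme used for the digestion.
double log_model_threshold_ | protected |
Threshold that decides whether a position is cleaved or missed (only used by the log model).
SignedSize missed_cleavages_ | protected |
Number of missed cleavages.
Map< BindingSite, CleavageModel > model_data_ | protected |
Holds the cleavage model.
const std::string NamesOfEnzymes [SIZE_OF_ENZYMES] | static |
Names of the enzymes.
const std::string NamesOfSpecificity [SIZE_OF_SPECIFICITY] | static |
Names of the specificities.
Specificity specificity_ | protected |
Specificity of the enzyme.
bool use_log_model_ | protected |
Whether to use the log model instead of naive digestion (with missed cleavages).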
OpenMS / TOPP release 2.0.0 | Documentation generated on Sat May 16 2015 16:13:49 using doxygen 1.8.9.1 |