Class that implements a suffix array for a String. It can be used to find peptide candidates for an MS spectrum. More...
#include <OpenMS/DATASTRUCTURES/SuffixArrayTrypticCompressed.h>
Public Member Functions | |
SuffixArrayTrypticCompressed (const String &st, const String &filename, const WeightWrapper::WEIGHTMODE weight_mode=WeightWrapper::MONO) | |
constructor taking the string and the filename for writing or reading More... | |
SuffixArrayTrypticCompressed (const SuffixArrayTrypticCompressed &sa) | |
copy constructor More... | |
virtual | ~SuffixArrayTrypticCompressed () |
destructor More... | |
String | toString () |
transforms suffix array to a printable String More... | |
void | findSpec (std::vector< std::vector< std::pair< std::pair< SignedSize, SignedSize >, double > > > &candidates, const std::vector< double > &spec) |
the function that will find all peptide candidates for a given spectrum More... | |
bool | save (const String &file_name) |
saves the suffix array to disk More... | |
bool | open (const String &file_name) |
opens the suffix array More... | |
void | setTolerance (double t) |
setter for tolerance More... | |
double | getTolerance () const |
getter for tolerance More... | |
bool | isDigestingEnd (const char aa1, const char aa2) const |
returns whether the enzyme cuts after the first character More... | |
void | setTags (const std::vector< String > &tags) |
setter for tags More... | |
const std::vector< String > & | getTags () |
getter for tags More... | |
void | setUseTags (bool use_tags) |
setter for use_tags More... | |
bool | getUseTags () |
getter for use_tags More... | |
void | setNumberOfModifications (Size number_of_mods) |
setter for number of modifications More... | |
Size | getNumberOfModifications () |
getter for number of modifications More... | |
void | printStatistic () |
output for statistic More... | |
Public Member Functions inherited from SuffixArray | |
SuffixArray (const String &st, const String &filename) | |
constructor taking the string and the filename for writing or reading More... | |
SuffixArray (const SuffixArray &sa) | |
copy constructor More... | |
virtual | ~SuffixArray ()=0 |
destructor More... | |
SuffixArray () | |
constructor More... | |
Public Member Functions inherited from WeightWrapper | |
WeightWrapper () | |
constructor More... | |
WeightWrapper (const WEIGHTMODE weight_mode) | |
constructor More... | |
virtual | ~WeightWrapper () |
destructor More... | |
WeightWrapper (const WeightWrapper &source) | |
copy constructor More... | |
void | setWeightMode (const WEIGHTMODE mode) |
Sets the weight mode (MONO or AVERAGE) More... | |
WEIGHTMODE | getWeightMode () const |
Gets the weight mode (MONO or AVERAGE) More... | |
double | getWeight (const AASequence &aa) const |
returns the weight of either mono or average value More... | |
double | getWeight (const EmpiricalFormula &ef) const |
returns the weight of either mono or average value More... | |
double | getWeight (const Residue &r, Residue::ResidueType res_type=Residue::Full) const |
returns the weight of either mono or average value More... | |
Protected Member Functions | |
SuffixArrayTrypticCompressed () | |
constructor More... | |
SignedSize | getNextSep_ (const SignedSize p) const |
gets the index of the next separator for a given index More... | |
SignedSize | getLCP_ (const std::pair< SignedSize, SignedSize > &last_point, const std::pair< SignedSize, SignedSize > &current_point) |
gets the lcp for two strings described as pairs of ints More... | |
SignedSize | findFirst_ (const std::vector< double > &spec, double &m) |
binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. More... | |
SignedSize | findFirst_ (const std::vector< double > &spec, double &m, SignedSize start, SignedSize end) |
binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. It searches recursively. More... | |
void | parseTree_ (SignedSize start_index, SignedSize stop_index, SignedSize depth, SignedSize walked_in, SignedSize edge_len, std::vector< std::pair< SignedSize, SignedSize > > &out_number, std::vector< std::pair< SignedSize, SignedSize > > &edge_length, std::vector< SignedSize > &leafe_depth) |
treats the suffix array as a tree and parses the tree using postorder traversal. This is realised by a recursive algorithm. More... | |
bool | hasMoreOutgoings_ (SignedSize start_index, SignedSize stop_index, SignedSize walked_in) |
indicates whether a node has more outgoing edges during traversal More... | |
Protected Attributes | |
const String & | s_ |
the string with which the suffix array is built More... | |
double | tol_ |
mass tolerance for finding candidates More... | |
std::vector< std::pair< SignedSize, SignedSize > > | indices_ |
vector of pairs of ints describing all relevant suffixes More... | |
std::vector< SignedSize > | lcp_ |
vector of ints with lcp values More... | |
std::vector< SignedSize > | skip_ |
vector of ints with skip values More... | |
double | masse_ [256] |
mass table More... | |
Size | number_of_modifications_ |
number of allowed modifications More... | |
std::vector< String > | tags_ |
all given tags More... | |
bool | use_tags_ |
indicates whether tags are used or not More... | |
SignedSize | progress_ |
Additional Inherited Members | |
Public Types inherited from WeightWrapper | |
enum | WEIGHTMODE { AVERAGE = 0, MONO, SIZE_OF_WEIGHTMODE } |
This class implements a suffix array. It can only be used for finding peptide candidates for a given MS spectrum within a certain mass tolerance. The suffix array can be saved to disk for reuse, so it has to be built only once. It consists of a vector holding a pair of ints for every suffix, a vector of LCP values, and a so-called skip vector.
Only suffixes that match the function isDigestingEnd are created. In addition, a suffix does not extend to the end of the string but only to the next occurrence of the separator ($). Thus only the relevant suffixes are stored, which reduces memory usage.
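The selection of relevant suffixes described above can be sketched in standalone C++. This is a minimal illustration, not the OpenMS implementation: the names `relevantSuffixes` and `CutRule` are hypothetical, and the reading of each pair as (start index, length) is an assumption. The cleavage rule is passed in as a plain function so that any stand-in for isDigestingEnd can be used.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for isDigestingEnd: true if the enzyme cuts between aa1 and aa2.
using CutRule = bool (*)(char, char);

// Collect (start, length) pairs for every suffix that begins directly after a
// separator or a cleavage site and ends before the next separator ('$').
// Assumption: pairs are (start index, length); the real class may differ.
std::vector<std::pair<long, long>> relevantSuffixes(const std::string& s, CutRule cuts_after)
{
  assert(!s.empty() && s[0] == '$'); // the text is expected to start with a separator
  std::vector<std::pair<long, long>> out;
  const long n = static_cast<long>(s.size());
  for (long i = 1; i < n; ++i)
  {
    if (s[i] == '$') continue; // a separator never starts a suffix
    const bool after_separator = (s[i - 1] == '$');
    const bool after_cut = cuts_after(s[i - 1], s[i]);
    if (!after_separator && !after_cut) continue;
    long end = i;
    while (end < n && s[end] != '$') ++end; // stop at the next separator
    out.emplace_back(i, end - i);
  }
  return out;
}
```

Because every stored suffix stops at the next separator, no entry spans two proteins, which is where the space saving comes from.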
SuffixArrayTrypticCompressed | ( | const String & | st, |
const String & | filename, | ||
const WeightWrapper::WEIGHTMODE | weight_mode = WeightWrapper::MONO |
||
) |
constructor taking the string and the filename for writing or reading
st | the string as const reference with which the suffix array will be built |
filename | the filename for writing or reading the suffix array |
weight_mode | if monoisotopic weight should not be used, this parameter can be set to AVERAGE |
Exception::InvalidValue | if the string does not start with the separator character ($) |
FileNotFound | is thrown if the given file was not found |
The constructor checks whether a suffix array with the given filename (without file extension) already exists. If it does, it is simply loaded; otherwise it is built. Building the suffix array consists of several steps. First, all indices for a digesting enzyme (defined by the function isDigestingEnd) are created as a vector of SignedSize pairs. The relevant indices are then sorted, and the lcp and skip vectors are created.
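The sort and LCP steps of the build can be sketched as follows. This is an illustration under stated assumptions rather than the OpenMS code: `Suffix`, `sortSuffixes` and `buildLcp` are hypothetical names, pairs are taken to be (start, length), and `lcp[i]` is assumed to compare entry i with entry i + 1. The skip vector is not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Suffix = std::pair<long, long>; // assumed layout: (start, length) into the text

// Sort the suffixes lexicographically by the substring each pair describes.
void sortSuffixes(const std::string& s, std::vector<Suffix>& idx)
{
  std::sort(idx.begin(), idx.end(), [&s](const Suffix& a, const Suffix& b) {
    return s.compare(a.first, a.second, s, b.first, b.second) < 0;
  });
}

// lcp[i] is the length of the longest common prefix of entries i and i + 1.
std::vector<long> buildLcp(const std::string& s, const std::vector<Suffix>& idx)
{
  std::vector<long> lcp;
  for (std::size_t i = 0; i + 1 < idx.size(); ++i)
  {
    // both substrings must lie inside the text
    assert(idx[i].first + idx[i].second <= static_cast<long>(s.size()));
    assert(idx[i + 1].first + idx[i + 1].second <= static_cast<long>(s.size()));
    const long max_len = std::min(idx[i].second, idx[i + 1].second);
    long k = 0;
    while (k < max_len && s[idx[i].first + k] == s[idx[i + 1].first + k]) ++k;
    lcp.push_back(k);
  }
  return lcp;
}
```

With the entries sorted, neighbours that share a long prefix sit next to each other, so a traversal can reuse the work done for the shared part.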
SuffixArrayTrypticCompressed | ( | const SuffixArrayTrypticCompressed & | sa | ) |
copy constructor
virtual ~SuffixArrayTrypticCompressed () | virtual |
destructor
SuffixArrayTrypticCompressed () | protected |
constructor
SignedSize findFirst_ (const std::vector< double > &spec, double &m) | protected |
binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance.
spec | const reference to spectrum |
m | mass |
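SignedSize findFirst_ (const std::vector< double > &spec, double &m, SignedSize start, SignedSize end) | protected |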
binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. It searches recursively.
spec | const reference to spectrum |
m | mass |
start | start index |
end | end index |
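A tolerance-aware binary search of this kind can be sketched with the standard library. The sketch below is not the recursive OpenMS routine: `findFirstWithin` is a hypothetical name, the tolerance is passed explicitly instead of being read from the class, and returning -1 when nothing matches is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Return the index of the first peak in the ascending spectrum that lies
// within [m - tol, m + tol], or -1 if there is none (assumed convention).
long findFirstWithin(const std::vector<double>& spec, double m, double tol)
{
  assert(tol >= 0.0);
  assert(std::is_sorted(spec.begin(), spec.end()));
  // first element that is not below the lower bound of the window
  auto it = std::lower_bound(spec.begin(), spec.end(), m - tol);
  if (it == spec.end() || *it > m + tol) return -1;
  return static_cast<long>(it - spec.begin());
}
```

Searching for the lower edge of the window and then checking the upper edge keeps the lookup logarithmic in the number of peaks.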
void findSpec (std::vector< std::vector< std::pair< std::pair< SignedSize, SignedSize >, double > > > &candidates, const std::vector< double > &spec) | virtual |
the function that will find all peptide candidates for a given spectrum
spec | const reference of double vector describing the spectrum |
candidates | output parameter which contains the candidates of the masses given in spec |
InvalidValue | if the spectrum is not sorted in ascending order |
For every mass in the spectrum, all candidates, described as pairs of ints, are returned. All masses are searched at the same time in a single suffix array traversal. To accelerate the traversal, the skip and lcp tables are used. The mass is not recalculated for each entry but updated during the traversal using a stack data structure.
Implements SuffixArray.
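The idea of updating the mass instead of recomputing it can be illustrated for a single suffix. This sketch makes several simplifying assumptions: `matchPrefixes` and `Hits` are hypothetical names, the mass table covers only five residues, every prefix is tested (not only those ending at a cleavage site), and a plain running sum replaces the stack, which would only be needed when backtracking between suffixes that share a prefix.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// hits[j] lists the (start, length) substrings whose mass is within tol of spec[j].
using Hits = std::vector<std::vector<std::pair<long, long>>>;

Hits matchPrefixes(const std::string& s, long start,
                   const std::vector<double>& spec, double tol)
{
  assert(tol >= 0.0);
  // Illustrative monoisotopic residue masses for a few amino acids only.
  static const std::map<char, double> residue = {
    {'G', 57.02146}, {'A', 71.03711}, {'S', 87.03203},
    {'K', 128.09496}, {'R', 156.10111}};
  const double water = 18.01056;

  Hits hits(spec.size());
  double mass = water; // running peptide mass, extended by one residue per step
  const long n = static_cast<long>(s.size());
  for (long i = start; i < n && s[i] != '$'; ++i)
  {
    mass += residue.at(s[i]); // throws std::out_of_range for unknown residues
    for (std::size_t j = 0; j < spec.size(); ++j)
    {
      if (std::fabs(spec[j] - mass) <= tol)
      {
        hits[j].emplace_back(start, i - start + 1);
      }
    }
  }
  return hits;
}
```

Each extension costs one addition, regardless of how long the current prefix already is.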
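SignedSize getLCP_ (const std::pair< SignedSize, SignedSize > &last_point, const std::pair< SignedSize, SignedSize > &current_point) | protected |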
gets the lcp for two strings described as pairs of ints
last_point | const pair of ints describing a substring |
current_point | const pair of ints describing a substring |
SignedSize getNextSep_ (const SignedSize p) const | protected |
gets the index of the next separator for a given index
p | const SignedSize describing a position in the string |
Size getNumberOfModifications () | virtual |
getter for number of modifications
Implements SuffixArray.
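const std::vector< String > & getTags () | virtual |
getter for tags
double getTolerance () const | virtual |
getter for tolerance
bool getUseTags () | virtual |
getter for use_tags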
bool hasMoreOutgoings_ (SignedSize start_index, SignedSize stop_index, SignedSize walked_in) | protected |
indicates whether a node has more outgoing edges during traversal
start_index | SignedSize describing the start index in indices_ vector |
stop_index | SignedSize describing the end index in indices_ vector |
walked_in | number of characters seen from the root to the current position |
bool isDigestingEnd (const char aa1, const char aa2) const | virtual |
returns whether the enzyme cuts after the first character
aa1 | const char as first amino acid |
aa2 | const char as second amino acid |
Implements SuffixArray.
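For a tryptic suffix array, a predicate of this shape would typically encode the common trypsin rule: cleave after K or R unless the next residue is P. That rule is an assumption here and is not confirmed by this page; `isDigestingEndSketch` and `digest` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed trypsin rule: cut after K or R unless the next residue is P.
bool isDigestingEndSketch(char aa1, char aa2)
{
  return (aa1 == 'K' || aa1 == 'R') && aa2 != 'P';
}

// Split one protein into fully cleaved peptides using the rule above.
std::vector<std::string> digest(const std::string& protein)
{
  // expects a single protein without '$' separators
  assert(protein.find('$') == std::string::npos);
  std::vector<std::string> peptides;
  std::size_t begin = 0;
  for (std::size_t i = 0; i + 1 < protein.size(); ++i)
  {
    if (isDigestingEndSketch(protein[i], protein[i + 1]))
    {
      peptides.push_back(protein.substr(begin, i + 1 - begin));
      begin = i + 1;
    }
  }
  if (begin < protein.size()) peptides.push_back(protein.substr(begin));
  return peptides;
}
```

Taking both residues as arguments lets the rule look one position ahead, which is what the proline exception requires.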
bool open (const String &file_name) | virtual |
opens the suffix array
file_name | const reference string describing the filename |
FileNotFound |
Implements SuffixArray.
void parseTree_ (SignedSize start_index, SignedSize stop_index, SignedSize depth, SignedSize walked_in, SignedSize edge_len, std::vector< std::pair< SignedSize, SignedSize > > &out_number, std::vector< std::pair< SignedSize, SignedSize > > &edge_length, std::vector< SignedSize > &leafe_depth) | protected |
treats the suffix array as a tree and parses the tree using postorder traversal. This is realised by a recursive algorithm.
start_index | SignedSize describing the start index in indices_ vector |
stop_index | SignedSize describing the end index in indices_ vector |
depth | depth of the traversal at the current position |
walked_in | number of characters seen from the root to the current position |
edge_len | number of characters seen from the last node to the current position |
out_number | reference to a vector of pairs of ints; for every node it is filled with the number of outgoing edges of the node, depending on its depth |
edge_length | will be filled with the edge lengths, depending on depth |
leafe_depth | will be filled with the depth of every leaf |
void printStatistic () | virtual |
prints statistics about the suffix array
Implements SuffixArray.
bool save (const String &file_name) | virtual |
saves the suffix array to disk
file_name | const reference string describing the filename |
Exception::UnableToCreateFile | if the file could not be created (e.g. because of insufficient permissions) |
Implements SuffixArray.
void setNumberOfModifications (Size number_of_mods) | virtual |
setter for number of modifications
void setTags (const std::vector< String > &tags) | virtual |
setter for tags
tags | const vector of strings containing tags of length 3 each |
InvalidValue | if at least one tag does not have a length of 3 |
Implements SuffixArray.
void setTolerance (double t) | virtual |
setter for tolerance
t | double with tolerance |
Exception::InvalidValue | if tolerance is negative |
Implements SuffixArray.
void setUseTags (bool use_tags) | virtual |
setter for use_tags
use_tags | indicating whether tags should be used or not |
Implements SuffixArray.
String toString () | virtual |
transforms suffix array to a printable String
Implements SuffixArray.
std::vector< std::pair< SignedSize, SignedSize > > indices_ | protected |
vector of pairs of ints describing all relevant suffixes
std::vector< SignedSize > lcp_ | protected |
vector of ints with lcp values
double masse_[256] | protected |
mass table
Size number_of_modifications_ | protected |
number of allowed modifications
SignedSize progress_ | protected |
const String & s_ | protected |
the string with which the suffix array is built
std::vector< SignedSize > skip_ | protected |
vector of ints with skip values
std::vector< String > tags_ | protected |
all given tags
double tol_ | protected |
mass tolerance for finding candidates
bool use_tags_ | protected |
indicates whether tags are used or not
OpenMS / TOPP release 2.0.0 | Documentation generated on Sat May 16 2015 16:14:07 using doxygen 1.8.9.1 |