Home  · Classes  · Annotated Classes  · Modules  · Members  · Namespaces  · Related Pages
Public Member Functions | Protected Member Functions | Protected Attributes | List of all members
SuffixArrayTrypticCompressed Class Reference

Class that implements a suffix array for a String. It can be used to find peptide Candidates for a MS spectrum. More...

#include <OpenMS/DATASTRUCTURES/SuffixArrayTrypticCompressed.h>

Inheritance diagram for SuffixArrayTrypticCompressed:
SuffixArray WeightWrapper

Public Member Functions

 SuffixArrayTrypticCompressed (const String &st, const String &filename, const WeightWrapper::WEIGHTMODE weight_mode=WeightWrapper::MONO)
 constructor taking the string and the filename for writing or reading More...
 
 SuffixArrayTrypticCompressed (const SuffixArrayTrypticCompressed &sa)
 copy constructor More...
 
virtual ~SuffixArrayTrypticCompressed ()
 destructor More...
 
String toString ()
 transforms suffix array to a printable String More...
 
void findSpec (std::vector< std::vector< std::pair< std::pair< SignedSize, SignedSize >, DoubleReal > > > &candidates, const std::vector< DoubleReal > &spec)
 the function that will find all peptide candidates for a given spectrum More...
 
bool save (const String &file_name)
 saves the suffix array to disc More...
 
bool open (const String &file_name)
 opens the suffix array More...
 
void setTolerance (DoubleReal t)
 setter for tolerance More...
 
DoubleReal getTolerance () const
 getter for tolerance More...
 
bool isDigestingEnd (const char aa1, const char aa2) const
 returns if an enzyme will cut after first character More...
 
void setTags (const std::vector< String > &tags)
 setter for tags More...
 
const std::vector< String > & getTags ()
 getter for tags More...
 
void setUseTags (bool use_tags)
 setter for use_tags More...
 
bool getUseTags ()
 getter for use_tags More...
 
void setNumberOfModifications (Size number_of_mods)
 setter for number of modifications More...
 
Size getNumberOfModifications ()
 getter for number of modifications More...
 
void printStatistic ()
 output for statistic More...
 
- Public Member Functions inherited from SuffixArray
 SuffixArray (const String &st, const String &filename)
 constructor taking the string and the filename for writing or reading More...
 
 SuffixArray (const SuffixArray &sa)
 copy constructor More...
 
virtual ~SuffixArray ()=0
 destructor More...
 
 SuffixArray ()
 constructor More...
 
- Public Member Functions inherited from WeightWrapper
 WeightWrapper ()
 constructor More...
 
 WeightWrapper (const WEIGHTMODE weight_mode)
 constructor More...
 
virtual ~WeightWrapper ()
 destructor More...
 
 WeightWrapper (const WeightWrapper &source)
 copy constructor More...
 
void setWeightMode (const WEIGHTMODE mode)
 Sets the weight mode (MONO or AVERAGE) More...
 
WEIGHTMODE getWeightMode () const
 Gets the weight mode (MONO or AVERAGE) More...
 
DoubleReal getWeight (const AASequence &aa) const
 returns the weight of either mono or average value More...
 
DoubleReal getWeight (const EmpiricalFormula &ef) const
 returns the weight of either mono or average value More...
 
DoubleReal getWeight (const Residue &r, Residue::ResidueType res_type=Residue::Full) const
 returns the weight of either mono or average value More...
 

Protected Member Functions

 SuffixArrayTrypticCompressed ()
 constructor More...
 
SignedSize getNextSep_ (const SignedSize p) const
 gets the index of the next sperator for a given index More...
 
SignedSize getLCP_ (const std::pair< SignedSize, SignedSize > &last_point, const std::pair< SignedSize, SignedSize > &current_point)
 gets the lcp for two strings described as pairs of ints More...
 
SignedSize findFirst_ (const std::vector< DoubleReal > &spec, DoubleReal &m)
 binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. More...
 
SignedSize findFirst_ (const std::vector< DoubleReal > &spec, DoubleReal &m, SignedSize start, SignedSize end)
 binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. it searches recursivly. More...
 
void parseTree_ (SignedSize start_index, SignedSize stop_index, SignedSize depth, SignedSize walked_in, SignedSize edge_len, std::vector< std::pair< SignedSize, SignedSize > > &out_number, std::vector< std::pair< SignedSize, SignedSize > > &edge_length, std::vector< SignedSize > &leafe_depth)
 treats the suffix array as a tree and parses the tree using postorder traversion. This is realised by a recursive algorithm. More...
 
bool hasMoreOutgoings_ (SignedSize start_index, SignedSize stop_index, SignedSize walked_in)
 indicates if a node during traversal has more outgoings More...
 

Protected Attributes

const Strings_
 the string with which the suffix array is build More...
 
DoubleReal tol_
 mass tolerance for finding candidates More...
 
std::vector< std::pair
< SignedSize, SignedSize > > 
indices_
 vector of pairs of ints describing all relevant sufices More...
 
std::vector< SignedSizelcp_
 vector of ints with lcp values More...
 
std::vector< SignedSizeskip_
 vector of ints with skip values More...
 
DoubleReal masse_ [256]
 mass table More...
 
Size number_of_modifications_
 number of allowed modifications More...
 
std::vector< Stringtags_
 all given tags More...
 
bool use_tags_
 indicates whether tags are used or not More...
 
SignedSize progress_
 

Additional Inherited Members

- Public Types inherited from WeightWrapper
enum  WEIGHTMODE { AVERAGE = 0, MONO, SIZE_OF_WEIGHTMODE }
 

Detailed Description

Class that implements a suffix array for a String. It can be used to find peptide Candidates for a MS spectrum.

This class implements a suffix array. It can just be used for finding peptide Candidates for a given MS Spectrum within a certain mass tolerance. The suffix array can be saved to disc for reused so it has to be build just once. The suffix array consits of a vector of pair of ints for every suffix, a vector of LCP values and a so called skip vector. Only the sufices that are matching the function isDigestingEnd are created. Besides a suffix will not reach till the end of the string but till the next occurence of the separator ($). So only the interessting sufices will be saved. This will reduce the used space.

Constructor & Destructor Documentation

SuffixArrayTrypticCompressed ( const String st,
const String filename,
const WeightWrapper::WEIGHTMODE  weight_mode = WeightWrapper::MONO 
)

constructor taking the string and the filename for writing or reading

Parameters
stthe string as const reference with which the suffix array will be build
filenamethe filename for writing or reading the suffix array
weight_modeif not monoistopic weight should be used, this parameters can be set to AVERAGE
Exceptions
Exception::InvalidValueif string does not start with empty string ($)
FileNotFoundis thrown if the given file was not found

The constructor checks if a suffix array with given filename (without file extension) exists or not. In the first case it will simple be loaded and otherwise it will be build. Bulding the suffix array consists of several steps. At first all indices for a digesting enzyme (defined by using function isDigestingEnd) are created as an vector of SignedSize pairs. After creating all relevant indices they are sorted and the lcp and skip vectors are created.

copy constructor

virtual ~SuffixArrayTrypticCompressed ( )
virtual

destructor

constructor

Member Function Documentation

SignedSize findFirst_ ( const std::vector< DoubleReal > &  spec,
DoubleReal m 
)
protected

binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance.

Parameters
specconst reference to spectrum
mmass
Returns
SignedSize with the index of the first occurence
Note
requires that there is at least one occurence
SignedSize findFirst_ ( const std::vector< DoubleReal > &  spec,
DoubleReal m,
SignedSize  start,
SignedSize  end 
)
protected

binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. it searches recursivly.

Parameters
specconst reference to spectrum
mmass
startstart index
endend index
Returns
SignedSize with the index of the first occurence
Note
requires that there is at least one occurence
void findSpec ( std::vector< std::vector< std::pair< std::pair< SignedSize, SignedSize >, DoubleReal > > > &  candidates,
const std::vector< DoubleReal > &  spec 
)
virtual

the function that will find all peptide candidates for a given spectrum

Parameters
specconst reference of DoubleReal vector describing the spectrum
candidatesoutput parameter which contains the candidates of the masses given in spec
Returns
a vector of SignedSize pairs.
Exceptions
InvalidValueif the spectrum is not sorted ascendingly

for every mass within the spectrum all candidates described by as pairs of ints are returned. All masses are searched for the same time in just one suffix array traversal. In order to accelerate the traversal the skip and lcp table are used. The mass wont be calculated for each entry but it will be updated during traversal using a stack datastructure

Implements SuffixArray.

SignedSize getLCP_ ( const std::pair< SignedSize, SignedSize > &  last_point,
const std::pair< SignedSize, SignedSize > &  current_point 
)
protected

gets the lcp for two strings described as pairs of ints

Parameters
last_pointconst pair of ints describing a substring
current_pointconst pair of ints describing a substring
Returns
SignedSize with the length of the lowest common prefix
SignedSize getNextSep_ ( const SignedSize  p) const
protected

gets the index of the next sperator for a given index

Parameters
pconst SignedSize describing a position in the string
Returns
SignedSize with the index of the next occurence of the sperator or -1 if there is no more separator
Size getNumberOfModifications ( )
virtual

getter for number of modifications

Returns
unsigned SignedSize describing number of modifications

Implements SuffixArray.

const std::vector<String>& getTags ( )
virtual

getter for tags

Returns
const vector of string with tags

Implements SuffixArray.

DoubleReal getTolerance ( ) const
virtual

getter for tolerance

Returns
DoubleReal with tolerance

Implements SuffixArray.

bool getUseTags ( )
virtual

getter for use_tags

Returns
bool indicating whether tags are used or not

Implements SuffixArray.

bool hasMoreOutgoings_ ( SignedSize  start_index,
SignedSize  stop_index,
SignedSize  walked_in 
)
protected

indicates if a node during traversal has more outgoings

Parameters
start_indexSignedSize describing the start index in indices_ vector
stop_indexSignedSize describing the end index in indices_ vector
walked_inhow many characters we have seen from root to actual position
bool isDigestingEnd ( const char  aa1,
const char  aa2 
) const
virtual

returns if an enzyme will cut after first character

Parameters
aa1const char as first aminoacid
aa2const char as second aminoacid
Returns
bool descibing if it is a digesting site

Implements SuffixArray.

bool open ( const String file_name)
virtual

opens the suffix array

Parameters
file_nameconst reference string describing the filename
Returns
bool if operation was succesful
Exceptions
FileNotFound

Implements SuffixArray.

void parseTree_ ( SignedSize  start_index,
SignedSize  stop_index,
SignedSize  depth,
SignedSize  walked_in,
SignedSize  edge_len,
std::vector< std::pair< SignedSize, SignedSize > > &  out_number,
std::vector< std::pair< SignedSize, SignedSize > > &  edge_length,
std::vector< SignedSize > &  leafe_depth 
)
protected

treats the suffix array as a tree and parses the tree using postorder traversion. This is realised by a recursive algorithm.

Parameters
start_indexSignedSize describing the start index in indices_ vector
stop_indexSignedSize describing the end index in indices_ vector
depthat with depth the traversion is at the actual position
walked_inhow many characters we have seen from root to actual position
edge_lenhow many characters we have seen from last node to actual position
out_numberreference to vector of pairs of ints. For every node it will be filled with how many outgoing edge a node has in dependece of its depth
edge_lengthwill be filled with the edge_length in dependence of its depth
leafe_depthwill be filled with the depth of every leafe
Note
intialize: walked_in=0, depth=1, edge_len=1
void printStatistic ( )
virtual

output for statistic

Implements SuffixArray.

bool save ( const String file_name)
virtual

saves the suffix array to disc

Parameters
file_nameconst reference string describing the filename
Returns
bool if operation was succesful
Exceptions
Exception::UnableToCreateFileif file could not be created (e.x. if you have no rigths)

Implements SuffixArray.

void setNumberOfModifications ( Size  number_of_mods)
virtual

setter for number of modifications

Parameters
number_of_mods

Implements SuffixArray.

void setTags ( const std::vector< String > &  tags)
virtual

setter for tags

Parameters
tagsconst vector of strings with tags with length 3 each
Exceptions
InvalidValueif at least one tag does not have size of 3

Implements SuffixArray.

void setTolerance ( DoubleReal  t)
virtual

setter for tolerance

Parameters
tDoubleReal with tolerance
Exceptions
Exception::InvalidValueif tolerance is negative

Implements SuffixArray.

void setUseTags ( bool  use_tags)
virtual

setter for use_tags

Parameters
use_tagsindicating whether tags should be used or not

Implements SuffixArray.

String toString ( )
virtual

transforms suffix array to a printable String

Implements SuffixArray.

Member Data Documentation

std::vector<std::pair<SignedSize, SignedSize> > indices_
protected

vector of pairs of ints describing all relevant sufices

std::vector<SignedSize> lcp_
protected

vector of ints with lcp values

DoubleReal masse_[256]
protected

mass table

Size number_of_modifications_
protected

number of allowed modifications

SignedSize progress_
protected
const String& s_
protected

the string with which the suffix array is build

std::vector<SignedSize> skip_
protected

vector of ints with skip values

std::vector<String> tags_
protected

all given tags

DoubleReal tol_
protected

mass tolerance for finding candidates

bool use_tags_
protected

indicates whether tags are used or not


OpenMS / TOPP release 1.11.1 Documentation generated on Thu Nov 14 2013 11:19:29 using doxygen 1.8.5