Runs the protein inference engine Fido.
| potential predecessor tools | → FidoAdapter → | potential successor tools |
| --- | --- | --- |
| PeptideIndexer (with annotate_proteins option) | | ProteinQuantifier (via protein_groups parameter) |
| IDPosteriorErrorProbability (with prob_correct option) | | |
This tool wraps the protein inference algorithm Fido (http://noble.gs.washington.edu/proj/fido/). Fido uses a Bayesian probabilistic model to group and score proteins based on peptide-spectrum matches. It was published in:
Serang et al.: Efficient marginalization to compute protein posterior probabilities from shotgun mass spectrometry data (J. Proteome Res., 2010).
By default, this adapter runs the Fido variant with parameter estimation (FidoChooseParameters), as recommended by the authors of Fido. However, it is also possible to run "pure" Fido by setting the prob:protein, prob:peptide and prob:spurious parameters, if appropriate values are known (e.g. from a previous Fido run). Other parameters, except for log2_states, are not applicable in this case.
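The choice between the two run modes can be sketched as a small helper that assembles the FidoAdapter argument list. This is a minimal illustration, not part of OpenMS: the function name, the file names and the executable paths are hypothetical, and only flags listed in the tool's own help text are used. The rule that all three probabilities must be supplied together is an assumption of this sketch.

```python
from typing import List, Optional


def build_fido_adapter_command(
    in_file: str,
    out_file: str,
    fido_executable: str,
    fidocp_executable: str,
    prob_protein: Optional[float] = None,
    prob_peptide: Optional[float] = None,
    prob_spurious: Optional[float] = None,
    log2_states: int = 0,
) -> List[str]:
    """Assemble a FidoAdapter argument list (hypothetical helper)."""
    cmd = [
        "FidoAdapter",
        "-in", in_file,
        "-out", out_file,
        "-fido_executable", fido_executable,
        "-fidocp_executable", fidocp_executable,
        "-log2_states", str(log2_states),
    ]
    probs = (prob_protein, prob_peptide, prob_spurious)
    if all(p is None for p in probs):
        # Default mode: no probabilities given, so parameter estimation
        # (FidoChooseParameters) is used.
        return cmd
    if any(p is None for p in probs):
        # Assumption of this sketch: "pure" Fido needs all three values.
        raise ValueError("set all of prob:protein, prob:peptide, prob:spurious, or none")
    # "Pure" Fido: pass the known probabilities explicitly.
    cmd += [
        "-prob:protein", str(prob_protein),
        "-prob:peptide", str(prob_peptide),
        "-prob:spurious", str(prob_spurious),
    ]
    return cmd
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where the OpenMS tools are installed.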
Depending on the separate_runs setting, data from input files containing multiple protein identification runs (e.g. several replicates or different search engines) will be merged (default) or annotated separately.
Input format:
Care has to be taken to provide suitable input data for this adapter. In the peptide/protein identification results (e.g. coming from a database search engine), the proteins have to be annotated with target/decoy metadata. To achieve this, run PeptideIndexer with the annotate_proteins option switched on.
In addition, the scores for peptide hits in the input data have to be posterior probabilities, as produced e.g. by PeptideProphet in the TPP or by IDPosteriorErrorProbability (with the prob_correct option switched on) in OpenMS. Inputs from IDPosteriorErrorProbability (without prob_correct) or from ConsensusID are treated as special cases: their posterior error probabilities (lower is better) are converted to posterior probabilities (higher is better) for processing.
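The conversion from a posterior error probability (PEP) to a posterior probability is the complement, 1 - PEP. The following is a minimal sketch of that relationship only; the function name is hypothetical and this is not the adapter's actual code.

```python
def pep_to_probability(pep: float) -> float:
    """Convert a posterior error probability (lower is better)
    into a posterior probability (higher is better)."""
    if not 0.0 <= pep <= 1.0:
        raise ValueError("a probability must lie in [0, 1]")
    return 1.0 - pep


# A hit with a 25% chance of being wrong has a 75% chance of being correct.
print(pep_to_probability(0.25))  # prints 0.75
```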
Output format:
The output of this tool is an augmented version of the input: The protein groups and accompanying posterior probabilities inferred by Fido are stored as "indistinguishable protein groups", attached to the protein identification run(s) of the input data. Also attached are meta values recording the Fido parameters (Fido_prob_protein, Fido_prob_peptide, Fido_prob_spurious).
The result can be passed to ProteinQuantifier via its protein_groups parameter, to have the protein grouping taken into account during quantification.
Note that if the input contains multiple identification runs and separate_runs is not set (the default), the identification data from all runs will be pooled for the Fido analysis and the result will only contain one (merged) identification run. This is the desired behavior if the protein grouping should be used by ProteinQuantifier.
The command line parameters of this tool are:
    FidoAdapter -- Runs the protein inference engine Fido.
    Version: 2.0.0 May 16 2015, 09:22:21, Revision: GIT-NOTFOUND

    Usage:
      FidoAdapter <options>

    Options (mandatory options marked with '*'):
      -in <file>*                 Input: identification results (valid formats: 'idXML')
      -out <file>*                Output: identification results with scored/grouped proteins
                                  (valid formats: 'idXML')
      -fido_executable <path>*    Path to the Fido executable to use; may be empty if the
                                  executable is globally available.
      -fidocp_executable <path>*  Path to the FidoChooseParameters executable to use; may be
                                  empty if the executable is globally available.
      -prob_param <string>        Read the peptide probability from this user parameter
                                  ('UserParam') in the input file, instead of from the 'score'
                                  field, if available. (Use e.g. for search results that were
                                  processed with the TOPP tools IDPosteriorErrorProbability
                                  followed by FalseDiscoveryRate.)
                                  (default: 'Posterior Probability_score')
      -separate_runs              Process multiple protein identification runs in the input
                                  separately, don't merge them. Merging results in loss of
                                  descriptive information of the single protein identification
                                  runs.
      -no_cleanup                 Omit clean-up of peptide sequences (removal of non-letter
                                  characters, replacement of I with L)
      -all_PSMs                   Consider all PSMs of each peptide, instead of only the best one
      -group_level                Perform inference on protein group level (instead of individual
                                  protein level). This will lead to higher probabilities for
                                  (bigger) protein groups.
      -log2_states <number>       Binary logarithm of the max. number of connected states in a
                                  subgraph. For a value N, subgraphs that are bigger than 2^N
                                  will be split up, sacrificing accuracy for runtime. '0' uses
                                  the default (18). (default: '0' min: '0')

    Probability values for running Fido directly, i.e. without parameter estimation
    (in which case other settings, except 'log2_states', are ignored):
      -prob:protein <value>       Protein prior probability ('gamma' parameter)
                                  (default: '0' min: '0')
      -prob:peptide <value>       Peptide emission probability ('alpha' parameter)
                                  (default: '0' min: '0')
      -prob:spurious <value>      Spurious peptide identification probability ('beta' parameter)
                                  (default: '0' min: '0')

    Common TOPP options:
      -ini <file>                 Use the given TOPP INI file
      -threads <n>                Sets the number of threads allowed to be used by the TOPP tool
                                  (default: '1')
      -write_ini <file>           Writes the default configuration file
      --help                      Shows options
      --helphelp                  Shows all options (including advanced)
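The peptide sequence clean-up that -no_cleanup disables (removal of non-letter characters, replacement of I with L) can be illustrated as follows. Isoleucine and leucine have the same mass, so mapping I to L makes otherwise identical peptides compare equal. This is a sketch of the described behavior under stated assumptions, not the adapter's implementation: the function name is hypothetical, and treating only uppercase "I" as isoleucine is an assumption.

```python
import re


def clean_peptide_sequence(sequence: str) -> str:
    """Strip non-letter characters, then replace I with L."""
    letters_only = re.sub(r"[^A-Za-z]", "", sequence)
    # Assumption: residues are uppercase one-letter codes.
    return letters_only.replace("I", "L")


print(clean_peptide_sequence("PEPT*IDEK"))  # prints PEPTLDEK
```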
OpenMS / TOPP release 2.0.0 | Documentation generated on Sat May 16 2015 16:13:42 using doxygen 1.8.9.1 |