IDPosteriorErrorProbability

Tool to estimate the probability of peptide hits to be incorrectly assigned.

potential predecessor tools            →  IDPosteriorErrorProbability  →  potential successor tools
MascotAdapter (or other ID engines)                                       ConsensusID
Experimental classes:
This tool has not been tested thoroughly and might not behave as expected!

By default an estimation is performed using the (inverse) Gumbel distribution for incorrectly assigned sequences and a Gaussian distribution for correctly assigned sequences. The probabilities are calculated by using Bayes' law, similar to PeptideProphet. Alternatively, a second Gaussian distribution can be used for incorrectly assigned sequences. At the moment, IDPosteriorErrorProbability is able to handle X!Tandem, Mascot, MyriMatch and OMSSA scores.
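The Bayes step described above can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes the standard (maximum) Gumbel density for the incorrect component, and the function names, the prior, and the distribution parameters are hypothetical values chosen for the example.

```python
import math


def gumbel_pdf(x, loc, scale):
    """Density of the standard (maximum) Gumbel distribution (assumed form)."""
    z = (x - loc) / scale
    return math.exp(-(z + math.exp(-z))) / scale


def gauss_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_error_probability(score, prior_incorrect, incorrect, correct):
    """P(incorrect | score) by Bayes' law for a two-component mixture.

    incorrect = (loc, scale) of the Gumbel component,
    correct   = (mean, sd) of the Gaussian component.
    """
    weighted_incorrect = prior_incorrect * gumbel_pdf(score, *incorrect)
    weighted_correct = (1.0 - prior_incorrect) * gauss_pdf(score, *correct)
    return weighted_incorrect / (weighted_incorrect + weighted_correct)


# Hypothetical model parameters, for illustration only.
INCORRECT = (1.0, 0.5)  # Gumbel location and scale
CORRECT = (5.0, 1.0)    # Gaussian mean and standard deviation

pep_low = posterior_error_probability(1.0, 0.8, INCORRECT, CORRECT)
pep_high = posterior_error_probability(6.0, 0.8, INCORRECT, CORRECT)
# With -prob_correct, the reported value would be 1 - PEP.
prob_correct_high = 1.0 - pep_high
```

A low score yields a probability close to 1 (likely incorrect), a high score a probability close to 0.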

No target/decoy information needs to be provided, since the model fits are done on the mixed distribution.
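To illustrate how a mixture can be fitted without target/decoy labels, the sketch below runs a plain EM algorithm on unlabeled scores. It covers only the two-Gaussian variant (the 'Gauss' choice for incorrectly assigned sequences). The initialization, the iteration count, and the name fit_two_gaussians are assumptions made for this example and are not taken from the OpenMS code.

```python
import math


def gauss_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(scores, iterations=100):
    """EM fit of a two-Gaussian mixture to unlabeled scores (needs >= 2 scores).

    Returns (weight_low, (mean_low, sd_low), (mean_high, sd_high)), where the
    low-scoring component plays the role of the incorrect assignments.
    """
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    # Crude start: lower half seeds the low component, upper half the high one.
    mean_lo = sum(ordered[:half]) / half
    mean_hi = sum(ordered[half:]) / (n - half)
    sd_lo = sd_hi = max((ordered[-1] - ordered[0]) / 4.0, 1e-6)
    weight_lo = 0.5
    for _ in range(iterations):
        # E-step: responsibility of the low component for each score.
        resp = []
        for x in scores:
            lo = weight_lo * gauss_pdf(x, mean_lo, sd_lo)
            hi = (1.0 - weight_lo) * gauss_pdf(x, mean_hi, sd_hi)
            resp.append(lo / (lo + hi) if lo + hi > 0.0 else 0.5)
        n_lo = sum(resp)
        n_hi = n - n_lo
        if n_lo < 1e-9 or n_hi < 1e-9:
            break  # one component collapsed; keep the last estimate
        # M-step: re-estimate means, standard deviations and the mixing weight.
        mean_lo = sum(r * x for r, x in zip(resp, scores)) / n_lo
        mean_hi = sum((1.0 - r) * x for r, x in zip(resp, scores)) / n_hi
        var_lo = sum(r * (x - mean_lo) ** 2 for r, x in zip(resp, scores)) / n_lo
        var_hi = sum((1.0 - r) * (x - mean_hi) ** 2 for r, x in zip(resp, scores)) / n_hi
        sd_lo = max(math.sqrt(var_lo), 1e-6)
        sd_hi = max(math.sqrt(var_hi), 1e-6)
        weight_lo = n_lo / n
    return weight_lo, (mean_lo, sd_lo), (mean_hi, sd_hi)
```

The returned weight and component parameters are what a Bayes step would then use to turn each score into a posterior error probability.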

To validate the computed probabilities, the fit can be plotted; this is configured in the fit_algorithm subsection.

Three parameters control the plot. The parameter 'output_plots' is false by default; if it is set to true, the plot is created. The scores are plotted as bins (parameter 'number_of_bins'). Each bin covers a score range of (highest_score - smallest_score)/number_of_bins (if all scores have positive values), and the midpoint of a bin is the mean of the scores it represents. Finally, the parameter 'output_name' should be used to give the plot a unique name. Two files are created: one with the binned scores and one with all steps of the estimation.

If 'top_hits_only' is set, only the top hit of each PeptideIdentification is used for the estimation process. If, in addition, target/decoy information is available and a False Discovery Rate run was performed beforehand, an additional plot with target and decoy bins is created ('output_plots' must be true in the fit_algorithm subsection). A peptide hit is assumed to be a target if its q-value is smaller than 'fdr_for_targets_smaller'.
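The binning rule and the target criterion can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's code: the function names are hypothetical, and the handling of identical scores and of the highest score (placed in the last bin) is a choice made for this example.

```python
def bin_scores(scores, number_of_bins=100):
    """Group scores into equal-width bins.

    Returns a list of (midpoint, count) pairs for the non-empty bins, where
    the midpoint is the mean of the scores that fell into the bin.
    """
    lowest, highest = min(scores), max(scores)
    width = (highest - lowest) / number_of_bins
    if width == 0:
        # All scores are identical: a single bin holds everything.
        return [(lowest, len(scores))]
    bins = [[] for _ in range(number_of_bins)]
    for score in scores:
        # The highest score would land one past the end, so clamp it.
        index = min(int((score - lowest) / width), number_of_bins - 1)
        bins[index].append(score)
    return [(sum(b) / len(b), len(b)) for b in bins if b]


def is_target(q_value, fdr_for_targets_smaller=0.05):
    """A hit counts as a target if its q-value is below the threshold."""
    return q_value < fdr_for_targets_smaller


binned = bin_scores([0, 1, 2, 3, 4, 5, 6, 7, 8, 10], number_of_bins=5)
```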

The plots are saved as a gnuplot file. To visualize them, run gnuplot on that file, e.g. gnuplot file_name. This should produce a PostScript file that contains all steps of the estimation.

The command line parameters of this tool are:

IDPosteriorErrorProbability -- Estimates probabilities for incorrectly assigned peptide sequences and a set 
of search engine scores using a mixture model.
Version: 1.11.1 Nov 14 2013, 11:18:15, Revision: 11976

Usage:
  IDPosteriorErrorProbability <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option.

Options (mandatory options marked with '*'):
  -in <file>*           Input file  (valid formats: 'idXML')
  -out <file>*          Output file  (valid formats: 'idXML')
  -output_name <file>*  Gnuplot file as txt (valid formats: 'txt')
  -split_charge         The search engine scores are split by charge if this flag is set. Thus, for each
                        charge state a new model will be computed.
  -top_hits_only        If set only the top hits of every PeptideIdentification will be used
  -ignore_bad_data      If set errors will be written but ignored. Useful for pipelines with many datasets 
                        where only a few are bad, but the pipeline should run through.
  -prob_correct         If set scores will be calculated as 1-ErrorProbabilities and can be interpreted as 
                        probabilities for correct identifications.
                        
                        
Common TOPP options:
  -ini <file>           Use the given TOPP INI file
  -threads <n>          Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>     Writes the default configuration file
  --help                Shows options
  --helphelp            Shows all options (including advanced)

The following configuration subsections are valid:
 - fit_algorithm   Algorithm parameter subsection

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
Have a look at the OpenMS documentation for more information.
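As a usage illustration, the options listed above could be assembled into a call from a script. The sketch below only uses flags that appear in the help output; the helper names build_command and run_tool and all file names are hypothetical, and the tool is assumed to be on the PATH.

```python
import subprocess


def build_command(in_file, out_file, output_name, split_charge=False,
                  top_hits_only=False, ignore_bad_data=False,
                  prob_correct=False):
    """Build the argument list for an IDPosteriorErrorProbability call."""
    cmd = [
        "IDPosteriorErrorProbability",
        "-in", in_file,                # mandatory, idXML
        "-out", out_file,              # mandatory, idXML
        "-output_name", output_name,   # mandatory, gnuplot file as txt
    ]
    # Optional flags take no value; they are appended only when requested.
    if split_charge:
        cmd.append("-split_charge")
    if top_hits_only:
        cmd.append("-top_hits_only")
    if ignore_bad_data:
        cmd.append("-ignore_bad_data")
    if prob_correct:
        cmd.append("-prob_correct")
    return cmd


def run_tool(*args, **kwargs):
    """Run the tool and raise if it exits with a non-zero status."""
    subprocess.run(build_command(*args, **kwargs), check=True)


# Hypothetical file names, for illustration only.
example = build_command("search.idXML", "pep.idXML", "pep_plot.txt",
                        top_hits_only=True)
```

Building the argument list separately from running it keeps the call easy to inspect or log before execution.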

INI file documentation of this tool:

Legend: required parameters are marked with '*'.

+ IDPosteriorErrorProbability   Estimates probabilities for incorrectly assigned peptide sequences and a set of search engine scores using a mixture model.
  version                    (value: '1.11.1')  Version of the tool that generated this parameters file.
  ++ 1                       Instance '1' section for 'IDPosteriorErrorProbability'
    in*                      input file  (input file, valid formats: '*.idXML')
    out*                     output file  (output file, valid formats: '*.idXML')
    output_name*             gnuplot file as txt  (output file, valid formats: '*.txt')
    smallest_e_value         (default: '1e-19')  Lower bound for E-values. It should not be 0, because the transformation to a real number (log of the E-value) is not possible for an E-value of 0.
    split_charge             (default: 'false')  The search engine scores are split by charge if this flag is set. Thus, for each charge state a new model will be computed.  (valid: 'true', 'false')
    top_hits_only            (default: 'false')  If set, only the top hits of every PeptideIdentification will be used.  (valid: 'true', 'false')
    fdr_for_targets_smaller  (default: '0.05')  Only used when top_hits_only is set. Additionally, target_decoy information should be available. The score_type must be q-value from a previous False Discovery Rate run.
    ignore_bad_data          (default: 'false')  If set, errors will be written but ignored. Useful for pipelines with many datasets where only a few are bad, but the pipeline should run through.  (valid: 'true', 'false')
    prob_correct             (default: 'false')  If set, scores will be calculated as 1-ErrorProbabilities and can be interpreted as probabilities for correct identifications.  (valid: 'true', 'false')
    log                      Name of log file (created only when specified)
    debug                    (default: '0')  Sets the debug level
    threads                  (default: '1')  Sets the number of threads allowed to be used by the TOPP tool
    no_progress              (default: 'false')  Disables progress logging to command line  (valid: 'true', 'false')
    test                     (default: 'false')  Enables the test mode (needed for internal use only)  (valid: 'true', 'false')
    +++ fit_algorithm        Algorithm parameter subsection
      number_of_bins         (default: '100')  Number of bins used for visualization. Only needed if each iteration step of the EM algorithm is visualized.
      output_plots           (default: 'false')  If true, every step of the EM algorithm will be written to a file as a gnuplot formula.  (valid: 'true', 'false')
      output_name            If output_plots is on, the output files will be saved in the following manner: <output_name>_scores.txt for the scores and <output_name>, which contains each step of the EM algorithm. E.g. for output_name = /usr/home/OMSSA123, the files /usr/home/OMSSA123_scores.txt and /usr/home/OMSSA123 will be written. If no directory is specified, e.g. just OMSSA123 instead of '/usr/home/OMSSA123', the files will be written into the working directory.  (output file)
      incorrectly_assigned   (default: 'Gumbel')  For 'Gumbel', the Gumbel distribution is used to plot incorrectly assigned sequences. For 'Gauss', the Gauss distribution is used.  (valid: 'Gumbel', 'Gauss')
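The role of 'smallest_e_value' can be illustrated with a short sketch. The description above only states that the logarithm of the E-value is taken and that a lower bound is needed; the sign convention (negative decadic logarithm, so that better hits get higher scores) and the function name are assumptions made for this example.

```python
import math


def transform_e_value(e_value, smallest_e_value=1e-19):
    """Map an E-value to a real-valued score via a bounded logarithm.

    Assumption: the score is -log10(E-value). The E-value is clamped to
    smallest_e_value first, because log10(0) is undefined.
    """
    return -math.log10(max(e_value, smallest_e_value))


score_zero = transform_e_value(0.0)      # clamped to the lower bound
score_typical = transform_e_value(1e-3)  # ordinary E-value
```

Without the clamp, an E-value of 0 would raise an error in the logarithm; with it, all E-values at or below the bound map to the same maximal score.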

For the parameters of the algorithm section, see the algorithm's documentation:
fit_algorithm


OpenMS / TOPP release 1.11.1 Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5