Converts identification engine file formats.
potential predecessor tools | → IDFileConverter → | potential successor tools |
TPP tools: PeptideProphet, ProteinProphet | | TPP tools: ProteinProphet (for conversion from idXML to pepXML) |
Sequest protein identification engine | | |
Conversion from the TPP file formats pepXML and protXML to OpenMS' idXML is quite comprehensive, to the extent that the original data can be represented in the simpler idXML format.
In contrast, support for converting from idXML to pepXML is limited. The purpose here is simply to create pepXML files containing the relevant information for the use of ProteinProphet.
Details on additional parameters:
mz_file:
Some search engine output files (such as pepXML, mascotXML, and Sequest .out files) may not contain retention times, only scan numbers. To look up the actual RT values, the raw file has to be provided using the parameter mz_file. (If the identification results will be used later to annotate feature maps or consensus maps, it is critical that they contain RT values. See also IDMapper.)
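The lookup described above can be pictured as a simple table from scan number to retention time. The following is a minimal sketch, not the tool's implementation: it assumes the raw file has already been read into (scan number, RT) pairs, and the function names and values are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def build_rt_lookup(spectra: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Map each scan number to its retention time (in seconds)."""
    return {scan: rt for scan, rt in spectra}


def assign_rts(scan_numbers: Iterable[int],
               lookup: Dict[int, float]) -> List[Optional[float]]:
    """Return the RT for every scan number, or None if the scan is unknown."""
    return [lookup.get(scan) for scan in scan_numbers]


# Hypothetical (scan number, RT) pairs standing in for the contents of mz_file.
spectra = [(1, 0.52), (2, 1.04), (3, 1.57)]
lookup = build_rt_lookup(spectra)

# Scan 99 does not exist in the raw data, so it gets no RT.
print(assign_rts([2, 3, 99], lookup))
```

An identification whose scan number is missing from the raw file ends up without an RT, which is the situation the note about feature maps and consensus maps warns about.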
mz_name:
PepXML files can contain results from multiple experiments. However, the idXML format does not support this. The mz_name parameter (or mz_file, if given) thus serves to define which parts to extract from the pepXML.
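To illustrate the selection step, here is a minimal sketch that picks the matching experiment out of a pepXML-like document. It makes several assumptions: the sample XML is hypothetical and omits the pepXML namespace, and comparing only the final path component of 'base_name' is a simplification that may differ from the matching rule the tool actually applies.

```python
import os
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, namespace-free pepXML fragment with two experiments.
PEPXML = """<msms_pipeline_analysis>
  <msms_run_summary base_name="/data/run_A" raw_data=".mzXML">
    <spectrum_query spectrum="run_A.00012.00012.2" start_scan="12" end_scan="12"/>
  </msms_run_summary>
  <msms_run_summary base_name="/data/run_B" raw_data=".mzXML">
    <spectrum_query spectrum="run_B.00007.00007.2" start_scan="7" end_scan="7"/>
  </msms_run_summary>
</msms_pipeline_analysis>"""


def select_runs(pepxml_text: str, mz_name: str) -> List[ET.Element]:
    """Return the run summaries whose base_name matches mz_name without its extension."""
    wanted = os.path.splitext(os.path.basename(mz_name))[0]
    root = ET.fromstring(pepxml_text)
    return [run for run in root.iter("msms_run_summary")
            if os.path.basename(run.get("base_name", "")) == wanted]


runs = select_runs(PEPXML, "run_B.mzML")
for run in runs:
    scans = [q.get("start_scan") for q in run.iter("spectrum_query")]
    print(run.get("base_name"), scans)
```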
scan_regex:
For Mascot results exported to XML, the scan numbers (used to look up retention times using mz_file) should be given in the "pep_scan_title" XML elements, but the format can vary. If the defaults fail to extract the scan numbers, a Perl-style regular expression can be given through the advanced parameter scan_regex, and will be used instead. The regular expression should contain a named group "SCAN" matching the scan number or "RT" matching the actual retention time. For example, if the format of the "pep_scan_title" elements is "scan=123", where 123 is the scan number, the expression "scan=(?<SCAN>\d+)" can be used to extract the number. (However, the format in this example is actually covered by the defaults.)
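A candidate expression can be tried out on a few titles before it is passed to the tool. The sketch below uses Python's re module, which spells named groups as (?P<NAME>...) rather than the Perl-style (?<NAME>...) that scan_regex expects, so the pattern must be translated back before use. The "rt=..." title format and the helper name are hypothetical.

```python
import re
from typing import Optional, Pattern, Tuple, Union

# Python equivalent of the Perl-style expression scan=(?<SCAN>\d+)
SCAN_PATTERN = re.compile(r"scan=(?P<SCAN>\d+)")
# Hypothetical title format that carries the retention time directly.
RT_PATTERN = re.compile(r"rt=(?P<RT>\d+(?:\.\d+)?)")


def parse_title(title: str,
                pattern: Pattern[str]) -> Optional[Tuple[str, Union[int, float]]]:
    """Extract ("SCAN", number) or ("RT", seconds) from a title, or None if nothing matches."""
    match = pattern.search(title)
    if match is None:
        return None
    groups = match.groupdict()
    if groups.get("SCAN") is not None:
        return ("SCAN", int(groups["SCAN"]))
    if groups.get("RT") is not None:
        return ("RT", float(groups["RT"]))
    return None


print(parse_title("scan=123", SCAN_PATTERN))
print(parse_title("run_A rt=45.6", RT_PATTERN))
print(parse_title("no scan information", SCAN_PATTERN))
```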
Some information about the supported input types: mzIdentML, pepXML, protXML, idXML, mascotXML, omssaXML, Sequest .out directory
The command line parameters of this tool are:
IDFileConverter -- Converts identification engine file formats.
Version: 1.11.1 Nov 14 2013, 11:18:15, Revision: 11976

Usage:
  IDFileConverter <options>

Options (mandatory options marked with '*'):
  -in <path/file>*     Input file or directory containing the output of the search engine.
                       Sequest: Directory containing the .out files
                       pepXML: Single pepXML file.
                       protXML: Single protXML file.
                       mascotXML: Single Mascot XML file.
                       omssaXML: Single OMSSA XML file.
                       idXML: Single idXML file.
                       (valid formats: 'pepXML', 'protXML', 'mascotXML', 'omssaXML', 'idXML')
  -out <file>*         Output file (valid formats: 'idXML', 'mzid', 'pepXML', 'FASTA')
  -out_type <type>     Output file type -- default: determined from file extension or content
                       (valid: 'idXML', 'mzid', 'pepXML', 'FASTA')
  -mz_file <file>      [Sequest, pepXML, mascotXML only] Retention times will be looked up in this file
                       (valid formats: 'mzML', 'mzXML', 'mzData')
  -mz_name <file>      [pepXML only] Experiment filename/path (extension will be removed) to match in the
                       pepXML file ('base_name' attribute). Only necessary if different from 'mz_file'.
  -use_precursor_data  [pepXML only] Use precursor RTs (and m/z values) from 'mz_file' for the generated
                       peptide identifications, instead of the RTs of MS2 spectra.

Common TOPP options:
  -ini <file>          Use the given TOPP INI file
  -threads <n>         Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>    Writes the default configuration file
  --help               Shows options
  --helphelp           Shows all options (including advanced)
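As an illustration of how these options combine, the following sketch assembles two command lines from a script: one converting pepXML to idXML with RT lookup, and one converting idXML to pepXML for ProteinProphet. It only builds and prints the commands; all file names are hypothetical, the helper function is not part of OpenMS, and only options listed in the help text above are used.

```python
import shlex
from typing import List, Optional


def build_command(in_path: str, out_path: str,
                  mz_file: Optional[str] = None,
                  mz_name: Optional[str] = None,
                  use_precursor_data: bool = False) -> List[str]:
    """Assemble an IDFileConverter argument list from the documented options."""
    cmd = ["IDFileConverter", "-in", in_path, "-out", out_path]
    if mz_file:
        cmd += ["-mz_file", mz_file]          # raw file used for RT lookup
    if mz_name:
        cmd += ["-mz_name", mz_name]          # pepXML only: experiment to extract
    if use_precursor_data:
        cmd.append("-use_precursor_data")     # pepXML only: flag without a value
    return cmd


# pepXML -> idXML, looking up retention times in the raw file.
to_idxml = build_command("search.pepXML", "search.idXML", mz_file="run_A.mzML")
# idXML -> pepXML, e.g. as input for ProteinProphet.
to_pepxml = build_command("search.idXML", "search.pepXML")

print(shlex.join(to_idxml))
print(shlex.join(to_pepxml))
```

The resulting lists can be passed to subprocess.run on a system where the TOPP tools are installed and on the PATH.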
INI file documentation of this tool:
OpenMS / TOPP release 1.11.1 | Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5 |