XStringSet-io {Biostrings}R Documentation

Read/write an XStringSet or XStringViews object from/to a file

Description

Functions to read/write an XStringSet or XStringViews object from/to a file.

Usage

  ## XStringSet object:
  read.BStringSet(file, format)
  read.DNAStringSet(file, format)
  read.RNAStringSet(file, format)
  read.AAStringSet(file, format)
  write.XStringSet(x, file="", append=FALSE, format, width=80)

  ## XStringViews object:
  read.XStringViews(file, format, subjectClass, collapse="")
  write.XStringViews(x, file="", append=FALSE, format, width=80)

  ## FASTQ utilities:
  fastq.geometry(file)

  ## Some related helper functions:
  FASTArecordsToCharacter(FASTArecs, use.names=TRUE)
  CharacterToFASTArecords(x)
  FASTArecordsToXStringViews(FASTArecs, subjectClass, collapse="")
  XStringSetToFASTArecords(x)

Arguments

file A character vector with no NAs. If "" (the default for write.XStringSet and write.XStringViews), then the functions write to the standard output connection (the console) unless redirected by sink.
format Either "fasta" or "fastq". Note that write.XStringSet and write.XStringViews only support "fasta" for now.
x For write.XStringSet and write.XStringViews, the object to write to file. For CharacterToFASTArecords, the (possibly named) character vector to be converted to a list of FASTA records as one returned by readFASTA. For XStringSetToFASTArecords, the XStringSet object to be converted to a list of FASTA records as one returned by readFASTA.
append TRUE or FALSE. If TRUE output will be appended to file; otherwise, it will overwrite the contents of file. See ?cat for the details.
width Only relevant if format is "fasta". The maximum number of letters per line of sequence.
subjectClass The class to be given to the subject of the XStringViews object created and returned by the function. Must be the name of one of the direct XString subclasses i.e. "BString", "DNAString", "RNAString" or "AAString".
collapse An optional character string to be inserted between the views of the XStringViews object created and returned by the function.
FASTArecs A list of FASTA records as one returned by readFASTA.
use.names Whether or not the description line preceding each FASTA records should be used to set the names of the returned object.

Details

Only FASTA and FASTQ files are supported for now. The identifiers and qualities stored in the FASTQ records are ignored (only the sequences are returned).

Reading functions read.BStringSet, read.DNAStringSet, read.RNAStringSet, read.AAStringSet and read.XStringViews load sequences from a file into an XStringSet or XStringViews object. Only one FASTA file, but more than one FASTQ file, can be read at a time (by passing a character vector of length > 1). In that case, all the FASTQ files must have the same "width" (i.e. all their sequences must have the same length) and the sequences from all the files are stored in the returned object in the order they were read.

The fastq.geometry utility returns an integer vector describing the "geometry" of the FASTQ files i.e. a vector of length 2 where the first element is the total number of sequences contained in the FASTQ files and the second element the "width" of these files (this width is NA if the files have different "widths").

Writing functions write.XStringSet and write.XStringViews write an XStringSet or XStringViews object to a file or connection. They only support the FASTA format for now.

FASTArecordsToCharacter, CharacterToFASTArecords, FASTArecordsToXStringViews and XStringSetToFASTArecords are helper functions used internally by write.XStringSet and read.XStringViews for switching between different representations of the same object.

See Also

fasta.info, readFASTA, writeFASTA, XStringSet-class, XStringViews-class, BString-class, DNAString-class, RNAString-class, AAString-class

Examples

  ## ---------------------------------------------------------------------
  ## A. READ/WRITE FASTA FILES
  ## ---------------------------------------------------------------------
  file <- system.file("extdata", "someORF.fa", package="Biostrings")
  x <- read.DNAStringSet(file, "fasta")
  x
  write.XStringSet(x, format="fasta")  # writes to the console

  ## ---------------------------------------------------------------------
  ## B. READ FASTQ FILES
  ## ---------------------------------------------------------------------
  file <- system.file("extdata", "s_1_sequence.txt", package="Biostrings")
  fastq.geometry(file)
  read.DNAStringSet(file, "fastq")  # only the FASTQ sequences are returned
                                    # (identifiers and qualities are dropped)

  ## ---------------------------------------------------------------------
  ## C. SOME RELATED HELPER FUNCTIONS
  ## ---------------------------------------------------------------------
  ## Converting 'x'...
  ## ... to a list of FASTA records (as one returned by the "readFASTA" function)
  x1 <- XStringSetToFASTArecords(x)
  ## ... to a named character vector
  x2 <- FASTArecordsToCharacter(x1) # same as 'as.character(x)'

[Package Biostrings version 2.12.9 Index]