readFASTA {Biostrings} | R Documentation |
readFASTA and writeFASTA read from and write to a FASTA file.
Note that the object returned by readFASTA or passed to writeFASTA is a standard list. For faster and more memory-efficient alternatives that return or accept an XStringSet object, see the read.DNAStringSet function and family.
readFASTA(file, checkComments=TRUE, strip.descs=TRUE)
writeFASTA(x, file="", append=FALSE, width=80)
file |
Either a character string naming a file or a connection.
If "" (the default for writeFASTA),
then the function writes to the standard output connection (the console)
unless redirected by sink.
|
checkComments |
Whether comments (lines beginning with a semicolon) should be found and removed. |
strip.descs |
Whether the ">" marking the beginning of each description
line should be removed. Note that this argument is new
in Biostrings >= 2.8. In previous versions, readFASTA
kept the ">".
|
x |
A list like the one returned by readFASTA.
|
append |
TRUE or FALSE. If TRUE, output will be
appended to file; otherwise, it will overwrite the contents
of file. See ?cat for the details.
|
width |
The maximum number of letters per line of sequence. |
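To illustrate what the width argument controls, here is a minimal base-R sketch of wrapping a sequence into lines of at most width letters. The helper wrap_seq is hypothetical and is not part of Biostrings; it only shows the idea, not how writeFASTA is actually implemented.

```r
# Hypothetical helper (not a Biostrings function): split one sequence string
# into chunks of at most `width` letters, as a FASTA writer might do.
wrap_seq <- function(s, width = 80) {
  n <- nchar(s)
  if (n == 0) return(character(0))
  starts <- seq(1, n, by = width)              # first letter of each output line
  substring(s, starts, pmin(starts + width - 1, n))
}

wrapped <- wrap_seq("ACGTACGTAC", width = 4)
print(wrapped)
```

Each element of the result corresponds to one line of sequence in the output file; only the last line may be shorter than width.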
FASTA is a simple file format for biological sequence data. A file may
contain one or more sequences. Each sequence is preceded by a description
line, which begins with a ">".
FASTA is a widely used format in biology. It is a relatively simple markup, and I am not aware of a formal standard for it. It might be nice to check that the parsed data are sequences of some appropriate type, but without a standard that does not seem possible.
Many other packages provide similar, but not identical, capabilities. The function in the seqinr package seems the most similar, but it splits each biological sequence into single-character strings, which is too inefficient for large problems.
For readFASTA: a list with one element per FASTA record in the file.
Each element has two parts: the description of the record
and a character string holding the biological sequence.
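The shape of this return value can be sketched in base R without Biostrings. The function parse_fasta_lines below is hypothetical and is not the readFASTA implementation. It assumes the two parts of each element are named desc and seq (desc matches the example code on this page; seq is an assumption), and it mimics the checkComments and strip.descs arguments on an in-memory character vector rather than a file.

```r
# Hypothetical base-R sketch, NOT the Biostrings implementation.
# Element names `desc` and `seq` are assumptions about the list layout.
parse_fasta_lines <- function(lines, checkComments = TRUE, strip.descs = TRUE) {
  if (checkComments)
    lines <- lines[!startsWith(lines, ";")]    # drop comment lines
  lines <- lines[nzchar(lines)]                # drop empty lines
  is_desc <- startsWith(lines, ">")
  if (length(lines) == 0 || !is_desc[1])
    stop("input must start with a description line")
  record_id <- cumsum(is_desc)                 # record index for every line
  unname(lapply(split(lines, record_id), function(rec) {
    desc <- rec[1]
    if (strip.descs) desc <- sub("^>", "", desc)
    # Join the wrapped sequence lines back into one character string
    list(desc = desc, seq = paste(rec[-1], collapse = ""))
  }))
}

fasta_text <- c("; a comment line",
                ">seq1 first record", "ACGT", "ACGT",
                ">seq2 second record", "TTGA")
records <- parse_fasta_lines(fasta_text)
descs <- sapply(records, function(x) x$desc)
```

Because each record is a plain list, the descriptions can be pulled out with sapply, as the example for this page does with the result of readFASTA.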
R. Gentleman, H. Pages
read.DNAStringSet, fasta.info, write.XStringSet, read.table, scan, write.table
f1 <- system.file("extdata", "someORF.fa", package="Biostrings")
ff <- readFASTA(f1, strip.descs=TRUE)
desc <- sapply(ff, function(x) x$desc)

## Keep the "reverse complement" sequences only
ff2 <- ff[grep("reverse complement", desc, fixed=TRUE)]
writeFASTA(ff2, file.path(tempdir(), "someORF2.fa"))