XStringSet-class {Biostrings} | R Documentation |
The BStringSet class is a container for storing a set of BString objects and for making their manipulation easy and efficient.
Similarly, the DNAStringSet (or RNAStringSet, or AAStringSet) class is a container for storing a set of DNAString (or RNAString, or AAString) objects.
All those containers derive directly (and with no additional slots) from the XStringSet virtual class.
## Constructors:
BStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
DNAStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
RNAStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
AAStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)

## Accessor-like methods:
## S4 method for signature 'character':
width(x)
## S4 method for signature 'XStringSet':
nchar(x, type="chars", allowNA=FALSE)

## ... and more (see below)
x | Either a character vector (with no NAs), or an XString, XStringSet or XStringViews object. |
start,end,width | Either NA, a single integer, or an integer vector of the same length as x specifying how x should be "narrowed" (see ?narrow for the details). |
use.names | TRUE or FALSE. Should names be preserved? |
type,allowNA | Ignored. |
The BStringSet, DNAStringSet, RNAStringSet and AAStringSet functions are constructors that can be used to "naturally" turn x into an XStringSet object of the desired base type.
They also allow the user to "narrow" the sequences contained in x via proper use of the start, end and/or width arguments. In this context, "narrowing" means dropping a prefix and/or a suffix of each sequence in x.
The "narrowing" capabilities of these constructors can be illustrated by the following property: if x is a character vector (with no NAs), or an XStringSet (or XStringViews) object, then the 3 following transformations are equivalent:
BStringSet(x, start=mystart, end=myend, width=mywidth)
subseq(BStringSet(x), start=mystart, end=myend, width=mywidth)
BStringSet(subseq(x, start=mystart, end=myend, width=mywidth))
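As a small illustration of this equivalence, the sketch below applies the three forms to a made-up character vector and checks that they produce the same sequences. The input strings and the variable names r1, r2 and r3 are assumptions chosen only for this example.

```r
library(Biostrings)

# Hypothetical input: two short strings chosen only for illustration.
x <- c("AAACCCGGG", "TTTTGGGGCC")

# Drop the first and the last letter of each sequence.
# A negative end is counted from the end of each sequence.
r1 <- BStringSet(x, start=2, end=-2)
r2 <- subseq(BStringSet(x), start=2, end=-2)
r3 <- BStringSet(subseq(x, start=2, end=-2))

# The three results hold the same sequences.
as.character(r1)
as.character(r2)
as.character(r3)
```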
In the code snippets below, x is an XStringSet object.
length(x): The number of sequences in x.
width(x): A vector of non-negative integers containing the number of letters for each element in x.
Note that width(x) is also defined for a character vector with no NAs and is equivalent to nchar(x, type="bytes").
names(x): NULL or a character vector of the same length as x containing a short user-provided description or comment for each element in x.
These are the only data in an XStringSet object that can safely be changed by the user. All the other data are immutable!
As a general recommendation, the user should never try to modify an object by accessing its slots directly.
alphabet(x): Return NULL, DNA_ALPHABET, RNA_ALPHABET or AA_ALPHABET depending on whether x is a BStringSet, DNAStringSet, RNAStringSet or AAStringSet object.
nchar(x): The same as width(x).
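The accessors above can be tried on a small object. This is a minimal sketch on made-up data; the sequences and the names seqA, seqB, seqC and BamHI_site are assumptions used only for illustration.

```r
library(Biostrings)

# Hypothetical example data: three named DNA sequences.
dna <- DNAStringSet(c(seqA="ACGT", seqB="GGATCC", seqC="TT"))

length(dna)     # number of sequences
width(dna)      # number of letters in each sequence
nchar(dna)      # same values as width(dna)
names(dna)      # user-provided names, or NULL if there are none
alphabet(dna)   # DNA_ALPHABET for a DNAStringSet

# Names are changed through the names<- replacement function,
# not by touching the slots of the object.
names(dna)[2] <- "BamHI_site"

# width() is also defined for a plain character vector.
width(c("abc", "de"))
```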
In the code snippets below, x is a character vector (with no NAs), or an XStringSet (or XStringViews) object.
subseq(x, start=NA, end=NA, width=NA): Applies subseq on each element in x. See ?subseq for the details.
Note that this is similar to what substr does on a character vector. However there are some noticeable differences:
(1) the arguments are start and stop for substr;
(2) the SEW (start/end/width) interface of subseq is richer (e.g. support for negative start or end values);
and (3) subseq checks that the specified start/end/width values are valid i.e., unlike substr, it throws an error if they define "out of limits" subsequences or subsequences with a negative width.
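The differences between substr and subseq can be seen on a single short sequence. The sketch below uses a made-up string; the variable names and the "error" marker returned by the handler are assumptions for illustration only.

```r
library(Biostrings)

s <- "ACGTACGT"          # hypothetical 8-letter sequence
x <- DNAStringSet(s)

# substr() silently truncates a range that runs past the end of the string.
truncated <- substr(s, 5, 20)

# subseq() accepts a negative end, counted from the end of each sequence.
inner <- subseq(x, start=5, end=-2)

# subseq() rejects an out-of-limits range instead of truncating it.
res <- tryCatch(subseq(x, start=5, end=20),
                error=function(e) "error")
```

Here res holds the marker string rather than a DNAStringSet, which shows that the out-of-limits request was refused.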
narrow(x, start=NA, end=NA, width=NA, use.names=TRUE): Same as subseq. The only differences are: (1) narrow has a use.names argument; and (2) the types of objects they work on (IRanges, XStringSet or XStringViews objects for narrow, XVector or XStringSet objects for subseq). But they both work and do the same thing on an XStringSet object.
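A short sketch of narrow and subseq side by side on an XStringSet object, using made-up sequences. The object names are assumptions for illustration.

```r
library(Biostrings)

# Hypothetical named sequences.
x <- DNAStringSet(c(a="ACGTAC", b="TTGGCCAA"))

# Keep 3 letters starting at position 2 in each sequence.
n <- narrow(x, start=2, width=3)
s <- subseq(x, start=2, width=3)

# use.names=FALSE asks narrow() not to carry the names over to the result.
n0 <- narrow(x, start=2, width=3, use.names=FALSE)

as.character(n)
as.character(s)
```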
threebands(x, start=NA, end=NA, width=NA): Like the method for IRanges objects, the threebands methods for character vectors and XStringSet objects extend the capability of narrow by returning the 3 sets of subsequences (the left, middle and right subsequences) associated with the narrowing operation. See ?threebands in the IRanges package for the details.
subseq(x, start=NA, end=NA, width=NA) <- value: A vectorized version of the subseq<- method for XVector objects. See ?`subseq<-` for the details.
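The replacement form can be sketched on a small made-up set. The sequences below are assumptions for illustration; the two operations mirror the kind of edits shown in the examples of this page.

```r
library(Biostrings)

# Hypothetical sequences.
x <- DNAStringSet(c("ACGTACGT", "TTTTGGGG"))

# Replace the first two letters of every sequence.
# The single replacement string is reused for each element.
subseq(x, start=1, end=2) <- "NN"

# Add a prefix: the empty range just before position 1
# (start=1, end=0) is replaced by the value.
subseq(x, start=1, end=0) <- "--"

as.character(x)
```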
In the code snippets below, x is an XStringSet object.
compact(x, basetype=NULL): Makes a deep copy of x that reduces its memory footprint. Typically used before saving x to a file (serialization).
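A minimal sketch of compacting, on made-up data: a short slice is taken from two long sequences, then compacted. The object names and sequence contents are assumptions; the reported sizes depend on the installed versions, so none are quoted here.

```r
library(Biostrings)

# Two hypothetical sequences of 10000 letters each.
big <- DNAStringSet(c(strrep("ACGT", 2500), strrep("TTGA", 2500)))

# Keep only the last 10 letters of each sequence.
y1 <- subseq(big, start=9991)

# Deep copy that holds only the data y1 actually needs.
y2 <- compact(y1)

object.size(y1)
object.size(y2)
```

The sequences in y2 are the same as in y1; only the underlying storage differs.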
In the code snippets below, x and values are XStringSet objects, and i should be an index specifying the elements to extract.
x[i]: Return a new XStringSet object made of the selected elements.
x[[i]]: Extract the i-th XString object from x.
append(x, values, after=length(x)): Add sequences in values to x.
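The following sketch shows the difference between single and double brackets, and the effect of the after argument of append. All sequences and names are made up for illustration.

```r
library(Biostrings)

# Hypothetical named sequences.
x <- DNAStringSet(c(a="ACGT", b="GGCC", c="TTAA"))
values <- DNAStringSet(c(d="CCCC"))

sub <- x[c(1, 3)]      # still a DNAStringSet, with 2 elements
first <- x[[1]]        # a single DNAString

# Add the sequences in values at the end (the default) ...
combined <- append(x, values)
# ... or right after the first element.
mid <- append(x, values, after=1)

names(mid)
```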
In the code snippets below, x is an XStringSet object.
is.unsorted(x, strictly=FALSE): Return a logical value specifying if x is unsorted. The strictly argument takes a logical value indicating if the check should be for _strictly_ increasing values.
order(x): Return a permutation which rearranges x into ascending or descending order.
sort(x): Sort x into ascending order (equivalent to x[order(x)]).
rank(x): Rank x in ascending order.
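A short sketch of the ordering methods on made-up sequences. The data and variable names are assumptions for illustration.

```r
library(Biostrings)

# Hypothetical sequences, deliberately out of order.
x <- DNAStringSet(c("TTGA", "ACGT", "ACGA"))

unsorted_before <- is.unsorted(x)

o <- order(x)          # permutation that sorts x
sorted <- sort(x)      # same result as x[o]
r <- rank(x)           # position of each element in the sorted set

unsorted_after <- is.unsorted(sorted)

as.character(sorted)
```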
In the code snippets below, x is an XStringSet object.
duplicated(x): Return a logical vector whose elements denote duplicates in x.
unique(x): Return an XStringSet containing the unique values in x.
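For instance, on a made-up set that contains one repeated sequence (the data are an assumption for illustration):

```r
library(Biostrings)

# Hypothetical sequences; "AC" appears twice.
x <- DNAStringSet(c("AC", "GT", "AC", "TT"))

dup <- duplicated(x)   # TRUE only for the second occurrence of "AC"
u <- unique(x)         # keeps the first occurrence of each sequence

as.character(u)
```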
In the code snippets below, x and y are XStringSet objects.
union(x, y): Union of x and y.
intersect(x, y): Intersection of x and y.
setdiff(x, y): Asymmetric set difference of x and y.
setequal(x, y): Set equality of x to y.
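A minimal sketch of the set operations on two made-up sets that share one sequence. The data and variable names are assumptions for illustration.

```r
library(Biostrings)

# Hypothetical sets; "GT" is the only sequence they have in common.
x <- DNAStringSet(c("AC", "GT", "TT"))
y <- DNAStringSet(c("GT", "CC"))

u <- union(x, y)        # every distinct sequence found in x or y
i <- intersect(x, y)    # sequences found in both
d <- setdiff(x, y)      # sequences of x that are not in y

same <- setequal(x, rev(x))   # element order does not matter
diff <- setequal(x, y)
```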
In the code snippets below, x is a character vector, XString, or XStringSet object and table is an XStringSet object.
x %in% table: Returns a logical vector indicating which elements in x match identically with an element in table.
match(x, table, nomatch = NA_integer_, incomparables = NULL): Returns an integer vector containing the first positions of an identical match in table for the elements in x.
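The sketch below looks up made-up sequences in a made-up table that contains a repeated entry, to show that match reports the first matching position. Both objects are XStringSet objects here; the data are assumptions for illustration.

```r
library(Biostrings)

# Hypothetical query and table; "AC" occurs twice in the table.
x <- DNAStringSet(c("AC", "TT", "GG"))
table <- DNAStringSet(c("GG", "AC", "AC"))

found <- x %in% table     # is each element of x present in table?
pos <- match(x, table)    # first position in table, or NA if absent
```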
In the code snippets below, x is an XStringSet object.
unlist(x): Turns x into an XString object by combining the sequences in x together. Fast equivalent to do.call(c, as.list(x)).
as.character(x, use.names): Convert x to a character vector of the same length as x. use.names controls whether or not names(x) should be used to set the names of the returned vector (default is TRUE).
as.matrix(x, use.names): Return a character matrix containing the "exploded" representation of the strings. This can only be used on an XStringSet object with equal-width strings. use.names controls whether or not names(x) should be used to set the row names of the returned matrix (default is TRUE).
toString(x): Equivalent to toString(as.character(x)).
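The coercion methods can be compared on a small made-up set of equal-width sequences (equal widths are needed for as.matrix). The data and variable names are assumptions for illustration.

```r
library(Biostrings)

# Hypothetical named sequences of equal width.
x <- DNAStringSet(c(a="ACGT", b="TTGA"))

u <- unlist(x)                             # one DNAString with all the letters
chr <- as.character(x)                     # character vector, names kept
chr0 <- as.character(x, use.names=FALSE)   # same, without names
m <- as.matrix(x)                          # one row per sequence, one column per letter
txt <- toString(x)                         # sequences pasted into a single string
```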
H. Pages
XString-class, XStringViews-class, XStringSetList-class, substr, subseq, narrow
## ---------------------------------------------------------------------
## A. USING THE XStringSet CONSTRUCTORS ON A CHARACTER VECTOR
## ---------------------------------------------------------------------
## Note that there is no XStringSet() constructor, but an XStringSet
## family of constructors: BStringSet(), DNAStringSet(), RNAStringSet(),
## etc...
x0 <- c("#CTC-NACCAGTAT", "#TTGA", "TACCTAGAG")
width(x0)
x1 <- BStringSet(x0)
x1

## 3 equivalent ways to obtain the same BStringSet object:
BStringSet(x0, start=4, end=-3)
subseq(x1, start=4, end=-3)
BStringSet(subseq(x0, start=4, end=-3))

dna0 <- DNAStringSet(x0, start=4, end=-3)
dna0
names(dna0)
names(dna0)[2] <- "seqB"
dna0

## ---------------------------------------------------------------------
## B. USING THE XStringSet CONSTRUCTORS ON AN XStringSet OBJECT
## ---------------------------------------------------------------------
library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe)
probes

RNAStringSet(probes, start=2, end=-5)  # does NOT copy the sequence data!

## ---------------------------------------------------------------------
## C. USING subseq() ON AN XStringSet OBJECT
## ---------------------------------------------------------------------
subseq(probes, start=2, end=-5)

subseq(probes, start=13, end=13) <- "N"
probes

## Add/remove a prefix:
subseq(probes, start=1, end=0) <- "--"
probes
subseq(probes, end=2) <- ""
probes

## Do more complicated things:
subseq(probes, start=4:7, end=7) <- c("YYYY", "YYY", "YY", "Y")
subseq(probes, start=4, end=6) <- subseq(probes, start=-2:-5)
probes

## ---------------------------------------------------------------------
## D. COMPACTING AN XStringSet OBJECT
## ---------------------------------------------------------------------
## Compacting is done typically before serialization.
library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe)
object.size(probes)

y1 <- subseq(probes[1:12], start=5)
object.size(y1)
file1 <- file.path(tempdir(), "y1.rda")
save(y1, file=file1)
file.info(file1)$size

y2 <- compact(y1)
object.size(y2)  # much smaller!
file2 <- file.path(tempdir(), "y2.rda")
save(y2, file=file2)
file.info(file2)$size

## ---------------------------------------------------------------------
## E. UNLISTING AN XStringSet OBJECT
## ---------------------------------------------------------------------
library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe)
unlist(probes)