Grouping-class {IRanges}    R Documentation

Grouping objects

Description

In this man page, we call "grouping" the action of dividing a collection of NO objects into NG groups (some of which may be empty). The Grouping class and subclasses are containers for representing groupings.

The Grouping core API

Let's give a formal description of the Grouping core API:

Groups G_i are indexed from 1 to NG (1 <= i <= NG).

Objects O_j are indexed from 1 to NO (1 <= j <= NO).

Every object must belong to one group and only one.

Given that empty groups are allowed, NG can be greater than NO.

Grouping an empty collection of objects (NO = 0) is supported. In that case, all the groups are empty. This is also the only case in which NG can be zero (meaning there are no groups).
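The rules above can be pictured with a minimal base R sketch. It does not use the Grouping classes themselves, and the names obj2group, NG, NO and groups are illustrative only: a grouping is modeled as an integer vector giving the group of each object, together with the number of groups.

```r
## Illustrative base R model of a grouping (not the IRanges internals).
NG <- 5L                        # number of groups
obj2group <- c(2L, 2L, 4L)      # group index of O_1, O_2, O_3
NO <- length(obj2group)         # number of objects (3)

## Every object belongs to exactly one group, indexed from 1 to NG.
stopifnot(all(obj2group >= 1L), all(obj2group <= NG))

## Members of each group. Groups 1, 3 and 5 are empty, so NG > NO is fine.
groups <- split(seq_len(NO), factor(obj2group, levels = seq_len(NG)))
lengths(groups)                 # 0 2 0 1 0 (named by group index)
```

Using a factor with levels 1 to NG is what keeps the empty groups in the result. A plain split(seq_len(NO), obj2group) would silently drop them.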

If x is a Grouping object:

length(x): Returns the number of groups (NG).

names(x): Returns the names of the groups.

nobj(x): Returns the number of objects (NO). Equivalent to length(togroup(x)).

Going from groups to objects:

x[[i]]: Returns the indices of the objects (the j's) that belong to G_i. The j's are returned in ascending order. This provides the mapping from groups to objects (one-to-many mapping).

grouplength(x, i=NULL): Returns the number of objects in G_i. Works in a vectorized fashion (unlike x[[i]]). grouplength(x) is equivalent to grouplength(x, seq_len(length(x))). If i is not NULL, grouplength(x, i) is equivalent to sapply(i, function(ii) length(x[[ii]])).

members(x, i): Equivalent to x[[i]] if i is a single integer. Otherwise, if i is an integer vector of arbitrary length, it's equivalent to sort(unlist(sapply(i, function(ii) x[[ii]]))).

vmembers(x, L): A version of members that works in a vectorized fashion with respect to the L argument (L must be a list of integer vectors). Returns lapply(L, function(i) members(x, i)).
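The equivalences stated for grouplength, members and vmembers can be spelled out in base R. In this sketch a plain list stands in for a Grouping object, and the *_sketch functions are hypothetical helpers written for illustration, not part of IRanges.

```r
## Illustrative stand-in for a Grouping object: the i-th list element holds
## the sorted indices of the objects that belong to group G_i.
x <- list(c(1L, 4L), integer(0), c(2L, 3L, 5L))

## Counterpart of grouplength(x, i): vectorized over i.
grouplength_sketch <- function(x, i = NULL) {
  if (is.null(i)) i <- seq_along(x)
  vapply(i, function(ii) length(x[[ii]]), integer(1))
}

## Counterpart of members(x, i): pool the members of all requested groups.
members_sketch <- function(x, i) {
  sort(unlist(lapply(i, function(ii) x[[ii]])))
}

## Counterpart of vmembers(x, L): one members call per element of L.
vmembers_sketch <- function(x, L) {
  lapply(L, function(i) members_sketch(x, i))
}

grouplength_sketch(x)                     # 2 0 3
members_sketch(x, c(3L, 1L))              # 1 2 3 4 5
vmembers_sketch(x, list(1L, c(1L, 3L)))
```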

Going from objects to groups:

togroup(x, j=NULL): Returns the index i of the group that O_j belongs to. This provides the mapping from objects to groups (many-to-one mapping). Works in a vectorized fashion. togroup(x) is equivalent to togroup(x, seq_len(nobj(x))): both return the entire mapping in an integer vector of length NO. If j is not NULL, togroup(x, j) is equivalent to y <- togroup(x); y[j].

togrouplength(x, j=NULL): Returns the number of objects that belong to the same group as O_j (including O_j itself). Equivalent to grouplength(x, togroup(x, j)).
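The reverse mapping can be sketched the same way. Again, a plain list stands in for a Grouping object, and togroup_sketch and togrouplength_sketch are hypothetical helpers that only mirror the equivalences given above.

```r
## Illustrative stand-in for a Grouping object (see the group definitions).
x <- list(c(1L, 4L), integer(0), c(2L, 3L, 5L))

## Counterpart of togroup(x, j): group index of each object.
togroup_sketch <- function(x, j = NULL) {
  y <- integer(sum(lengths(x)))           # one slot per object (NO slots)
  for (i in seq_along(x)) y[x[[i]]] <- i  # objects in G_i map to i
  if (is.null(j)) y else y[j]
}

## Counterpart of togrouplength(x, j): size of the group each object is in.
togrouplength_sketch <- function(x, j = NULL) {
  lengths(x)[togroup_sketch(x, j)]
}

togroup_sketch(x)              # 1 3 3 1 3
togroup_sketch(x, 5:2)         # 3 1 3 3
togrouplength_sketch(x)        # 2 3 3 2 3
```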

Given that length, names and [[ are defined for Grouping objects, those objects can be considered List objects. In particular, as.list works out-of-the-box on them.

One important property of any Grouping object x is that unlist(as.list(x)) is always a permutation of seq_len(nobj(x)). This is a direct consequence of the fact that every object in the grouping belongs to one group and only one.
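This permutation property is easy to check on a plain list of groups. The helper is_valid_grouping below is hypothetical and only illustrates the property. It is not an IRanges function.

```r
## TRUE iff the pooled group members are a permutation of 1..NO, i.e. every
## object appears in exactly one group.
is_valid_grouping <- function(groups, NO) {
  identical(sort(as.integer(unlist(groups))), seq_len(NO))
}

ok  <- list(c(1L, 4L), integer(0), c(2L, 3L, 5L))
bad <- list(c(1L, 2L), c(2L, 3L))     # object 2 sits in two groups

is_valid_grouping(ok, 5L)             # TRUE
is_valid_grouping(bad, 3L)            # FALSE
is_valid_grouping(list(), 0L)         # TRUE: no objects, no groups
```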

The H2LGrouping and Dups subclasses

[DOCUMENT ME]

The Partitioning subclass

A Partitioning container represents a block-grouping, i.e. a grouping where each group contains objects that are neighbors in the original collection of objects. More formally, a grouping x is a block-grouping iff togroup(x) is sorted in increasing order (not necessarily strictly increasing).
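In base R terms, the block-grouping condition amounts to a sortedness test on the object-to-group mapping. The helper name is_block_grouping is illustrative only.

```r
## A grouping is a block-grouping iff its togroup vector is non-decreasing.
## 'obj2group' plays the role of togroup(x).
is_block_grouping <- function(obj2group) !is.unsorted(obj2group)

is_block_grouping(c(1L, 1L, 2L, 4L, 4L))   # TRUE: neighbors stay together
is_block_grouping(c(1L, 3L, 1L, 3L, 3L))   # FALSE: group 1 is interleaved
```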

A block-grouping object can also be seen (and manipulated) as a Ranges object where all the ranges are adjacent starting at 1 (i.e. it covers the 1:NO interval with no overlap between the ranges).
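To make the "adjacent ranges starting at 1" picture concrete, the following base R sketch derives the start and end of each block from the group widths alone. The widths are the ones used in section D of the examples, and the variable names are illustrative.

```r
widths <- c(4L, 3L, 0L, 1L, 7L)   # number of objects in each group
NO <- sum(widths)                 # 15 objects in total

ends   <- cumsum(widths)          # 4 7 7 8 15
starts <- ends - widths + 1L      # 1 5 8 8 9

## The ranges are adjacent, start at 1 and cover 1:NO without overlap.
stopifnot(starts[1] == 1L,
          ends[length(ends)] == NO,
          all(starts[-1] == ends[-length(ends)] + 1L))
```

Note that the empty third group shows up as a zero-width range whose start (8) is one more than its end (7).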

Note that a Partitioning object is both: a particular type of Grouping object and a particular type of Ranges object. Therefore all the methods that are defined for Grouping and Ranges objects can also be used on a Partitioning object. See ?Ranges for a description of the Ranges API.

The Partitioning class is virtual with 2 concrete subclasses: PartitioningByEnd (only stores the end of the groups, allowing fast mapping from groups to objects), and PartitioningByWidth (only stores the width of the groups).
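The two storage schemes carry the same information, since ends and widths are related by cumsum and diff. The base R sketch below illustrates this, and shows how the members of a group can be read off the ends directly. It makes no claim about how IRanges implements this internally, and members_from_ends is a hypothetical helper.

```r
ends <- c(4L, 7L, 7L, 8L, 15L)

## Widths from ends, and back again.
widths <- diff(c(0L, ends))                  # 4 3 0 1 7
stopifnot(identical(cumsum(widths), ends))

## Members of group i, using only the ends of groups i - 1 and i.
members_from_ends <- function(ends, i) {
  first <- if (i == 1L) 1L else ends[i - 1L] + 1L
  seq_len(ends[i] - first + 1L) + first - 1L
}

members_from_ends(ends, 2L)                  # 5 6 7
members_from_ends(ends, 3L)                  # integer(0): empty group
```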

Constructors

H2LGrouping(high2low=integer()): [DOCUMENT ME]

Dups(high2low=integer()): [DOCUMENT ME]

PartitioningByEnd(x=integer(), NG=NULL, names=NULL): x must be either a list-like object or a sorted integer vector. NG must be either NULL or a single integer. names must be either NULL or a character vector of length NG (if supplied) or length(x) (if NG is not supplied).

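Returns a PartitioningByEnd object y. As the examples below show, if x is an integer vector and NG is not supplied, the values in x are used as the ends of the partitions; if NG is supplied, y has NG partitions and togroup(y) is identical to x.

If the names argument is supplied, it is used to name the partitions.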

PartitioningByWidth(x=integer(), NG=NULL, names=NULL): x must be either a list-like object or an integer vector. NG must be either NULL or a single integer. names must be either NULL or a character vector of length NG (if supplied) or length(x) (if NG is not supplied).

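Returns a PartitioningByWidth object y. As the examples below show, if x is an integer vector and NG is not supplied, the values in x are used as the widths of the partitions; if NG is supplied, y has NG partitions and togroup(y) is identical to x.

If the names argument is supplied, it is used to name the partitions.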
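The examples at the end of this page check that, when NG is supplied, togroup() of the result is identical to x. One way to picture that relationship in base R is sketched below. This is not the constructors' actual code, only the arithmetic linking a sorted object-to-group vector to widths and ends.

```r
x  <- c(1L, 5L, 5L, 6L, 8L)   # sorted group index of each object
NG <- 10L                     # requested number of groups

widths <- tabulate(x, nbins = NG)   # 1 0 0 0 2 1 0 1 0 0
ends   <- cumsum(widths)            # 1 1 1 1 3 4 4 5 5 5

## Expanding the widths gives back the original mapping.
stopifnot(identical(rep(seq_len(NG), widths), x))
```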

Note that these constructors don't recycle their names argument (to remain consistent with what `names<-` does on standard vectors).
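For reference, this is the behavior of `names<-` on a standard vector that the note refers to: names that are too short are padded with NA rather than recycled.

```r
v <- 1:3
names(v) <- "a"     # a single name for three elements
names(v)            # "a" NA NA: the name is not recycled
```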

Author(s)

H. Pages

See Also

List-class, Ranges-class, IRanges-class, successiveIRanges, cumsum, diff

Examples

showClass("Grouping")  # shows (some of) the known subclasses

## ---------------------------------------------------------------------
## A. H2LGrouping OBJECTS
## ---------------------------------------------------------------------
high2low <- c(NA, NA, 2, 2, NA, NA, NA, 6, NA, 1, 2, NA, 6, NA, NA, 2)
h2l <- H2LGrouping(high2low)
h2l

## The Grouping core API:
length(h2l)
nobj(h2l)  # same as 'length(h2l)' for H2LGrouping objects
h2l[[1]]
h2l[[2]]
h2l[[3]]
h2l[[4]]
h2l[[5]]
grouplength(h2l)  # same as 'unname(sapply(h2l, length))'
grouplength(h2l, 5:2)
members(h2l, 5:2)  # all the members are put together and sorted
togroup(h2l)
togroup(h2l, 5:2)
togrouplength(h2l)  # same as 'grouplength(h2l, togroup(h2l))'
togrouplength(h2l, 5:2)

## The List API:
as.list(h2l)
sapply(h2l, length)

## ---------------------------------------------------------------------
## B. Dups OBJECTS
## ---------------------------------------------------------------------
dups1 <- as(h2l, "Dups")
dups1
duplicated(dups1)  # same as 'duplicated(togroup(dups1))'

### The purpose of a Dups object is to describe the groups of duplicated
### elements in a vector-like object:
x <- c(2, 77, 4, 4, 7, 2, 8, 8, 4, 99)
x_high2low <- high2low(x)
x_high2low  # same length as 'x'
dups2 <- Dups(x_high2low)
dups2
togroup(dups2)
duplicated(dups2)
togrouplength(dups2)  # frequency for each element
table(x)

## ---------------------------------------------------------------------
## C. Partitioning OBJECTS
## ---------------------------------------------------------------------
pbe1 <- PartitioningByEnd(c(4, 7, 7, 8, 15), names=LETTERS[1:5])
pbe1  # the 3rd partition is empty

## The Grouping core API:
length(pbe1)
nobj(pbe1)
pbe1[[1]]
pbe1[[2]]
pbe1[[3]]
grouplength(pbe1)  # same as 'unname(sapply(pbe1, length))' and 'width(pbe1)'
togroup(pbe1)
togrouplength(pbe1)  # same as 'grouplength(pbe1, togroup(pbe1))'
names(pbe1)

## The Ranges core API:
start(pbe1)
end(pbe1)
width(pbe1)

## The List API:
as.list(pbe1)
sapply(pbe1, length)

## Replacing the names:
names(pbe1)[3] <- "empty partition"
pbe1

## Coercion to an IRanges object:
as(pbe1, "IRanges")

## Other examples:
PartitioningByEnd(c(0, 0, 19), names=LETTERS[1:3])
PartitioningByEnd()  # no partition
PartitioningByEnd(integer(9))  # all partitions are empty
x <- c(1L, 5L, 5L, 6L, 8L)
pbe2 <- PartitioningByEnd(x, NG=10L)
stopifnot(identical(togroup(pbe2), x))
pbw2 <- PartitioningByWidth(x, NG=10L)
stopifnot(identical(togroup(pbw2), x))

## ---------------------------------------------------------------------
## D. RELATIONSHIP BETWEEN Partitioning OBJECTS AND successiveIRanges()
## ---------------------------------------------------------------------
mywidths <- c(4, 3, 0, 1, 7)

## The 3 following calls produce the same ranges:
ir <- successiveIRanges(mywidths)  # IRanges instance.
pbe <- PartitioningByEnd(cumsum(mywidths))  # PartitioningByEnd instance.
pbw <- PartitioningByWidth(mywidths)  # PartitioningByWidth instance.
stopifnot(identical(as(ir, "PartitioningByEnd"), pbe))
stopifnot(identical(as(ir, "PartitioningByWidth"), pbw))

[Package IRanges version 1.18.2 Index]