Ranges-comparison {IRanges}    R Documentation

Comparing and ordering ranges

Description

Methods for comparing and ordering the elements in one or more Ranges objects.

Usage

## Element-wise (aka "parallel") comparison of 2 Ranges objects
## ------------------------------------------------------------

## S4 method for signature 'Ranges,Ranges'
e1 == e2

## S4 method for signature 'Ranges,Ranges'
e1 <= e2

## duplicated()
## ------------

## S4 method for signature 'Ranges'
duplicated(x, incomparables=FALSE, fromLast=FALSE,
           method=c("auto", "quick", "hash"))

## match()
## -------

## S4 method for signature 'Ranges,Ranges'
match(x, table, nomatch=NA_integer_, incomparables=NULL,
      method=c("auto", "quick", "hash"), match.if.overlap=FALSE)

## findMatches() / countMatches()
## ------------------------------

findMatches(x, table, select=c("all", "first", "last"), ...)
countMatches(x, table, ...)

## order() and related methods
## ----------------------------

## S4 method for signature 'Ranges'
order(..., na.last=TRUE, decreasing=FALSE)

## S4 method for signature 'Ranges'
rank(x, na.last=TRUE,
     ties.method=c("average", "first", "random", "max", "min"))

## Generalized element-wise (aka "parallel") comparison of 2 Ranges objects
## ------------------------------------------------------------------------

compare(x, y)
## S4 method for signature 'Ranges,Ranges'
compare(x, y)

rangeComparisonCodeToLetter(code)

Arguments

e1, e2, x, table, y

Ranges objects. For findMatches and countMatches, x and table can also be atomic vectors.

incomparables

Not supported.

fromLast

See default S3 method for duplicated.

method

Use a Quicksort-based (method="quick") or a hash-based (method="hash") algorithm. The latter tends to give better performance, except possibly for some pathological inputs that have not been identified so far.

When method="auto" is specified, the most efficient algorithm will be used, that is, the hash-based algorithm if length(x) <= 2^29, otherwise the Quicksort-based algorithm.

nomatch

The value to be returned in the case when no match is found. It is coerced to an integer.

match.if.overlap

For now, you need to provide this argument explicitly; otherwise match() will issue a warning.

Starting with BioC 2.12, the default behavior of match() on Ranges objects has changed to use equality instead of overlap for comparing elements between Ranges objects x and table. Now x[i] and table[j] are considered to match when they are equal (i.e. x[i] == table[j]), instead of when they overlap. This new behavior is consistent with base::match().

If you need the new behavior, use match.if.overlap=FALSE.

If you need the old behavior, the recommended way is to use findOverlaps(x, table, select="first"). Alternatively, you can call match() with match.if.overlap=TRUE.

select

Not supported yet. For the equivalent of select="first", use match. For the other values, you're welcome to request this feature on the Bioconductor mailing list.

...

For findMatches, countMatches: arguments to be passed down to match e.g. to specify one of the supported methods (note that, by default, i.e. with method="auto", the most efficient algorithm will be used).

For order: additional Ranges objects used for breaking ties.

na.last

Ignored.

decreasing

TRUE or FALSE.

ties.method

A character string specifying how ties are treated. Only "first" is supported for now.

code

A vector of codes as returned by compare.

Details

Two ranges are considered equal iff they share the same start and width. Note that with this definition, two empty ranges are generally not equal (they need to share the same start to be considered equal). This means that, when it comes to comparing ranges, an empty range is interpreted as a position between its end and start. For example, a typical use case is the comparison of insertion points defined along a string (like a DNA sequence) and represented as empty ranges.
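The equality rule can be illustrated in base R without the package. This is only a sketch: plain integer vectors stand in for a Ranges object, and ranges_equal is a hypothetical helper, not an IRanges function.

```r
# Plain integer vectors stand in for a Ranges object (illustration only).
start <- c(20L, 8L, 20L, 22L, 25L, 20L, 22L, 22L)
width <- c( 4L, 0L, 11L,  5L,  0L,  9L,  5L,  0L)

# Hypothetical helper: ranges i and j are equal iff both start and width agree.
ranges_equal <- function(i, j) {
  start[i] == start[j] & width[i] == width[j]
}

ranges_equal(2, 5)  # FALSE: both are empty, but their starts (8 and 25) differ
ranges_equal(4, 7)  # TRUE: same start (22) and same width (5)
```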

Ranges are ordered by starting position first, and then by width. This way, the space of ranges is totally ordered. On a Ranges object, order, sort, and rank are consistent with this order.
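As a rough base R model of this ordering (an assumption for illustration, not the package's implementation), sorting by start and then by width gives the same permutation that the rule above describes:

```r
# Plain integer vectors stand in for a Ranges object (illustration only).
start <- c(20L, 8L, 20L, 22L, 25L, 20L, 22L, 22L)
width <- c( 4L, 0L, 11L,  5L,  0L,  9L,  5L,  0L)

# Order by starting position first, then by width.
ord <- order(start, width)
ord  # 2 1 6 3 8 4 7 5

# Ranks with ties broken by position (the "first" rule) are the
# inverse of that permutation.
rnk <- integer(length(ord))
rnk[ord] <- seq_along(ord)
rnk  # 2 1 4 6 8 3 7 5
```

Elements 4 and 7 are equal ranges, so they are true ties; base order() keeps them in their original relative order.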

duplicated(x): Determines which elements of x are equal to elements with smaller subscripts, and returns a logical vector indicating which elements are duplicates. It is semantically equivalent to duplicated(as.data.frame(x)). See duplicated in the base package for more details.

unique(x): Removes duplicate ranges from x. See unique in the base package for more details.
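The data frame equivalence mentioned above can be tried in base R. The sketch below assumes that a two-column data frame of start and width carries the same information as as.data.frame(x) for the purpose of detecting duplicates.

```r
# A data frame of (start, width) pairs stands in for as.data.frame(x).
df <- data.frame(
  start = c(20L, 8L, 20L, 22L, 25L, 20L, 22L, 22L),
  width = c( 4L, 0L, 11L,  5L,  0L,  9L,  5L,  0L)
)

# Element 7 repeats element 4 (start 22, width 5); nothing else is duplicated.
dup <- duplicated(df)
dup

# Dropping the duplicates mirrors what unique(x) would keep.
uniq <- df[!dup, ]
nrow(uniq)  # 7
```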

match(x, table, nomatch=NA_integer_): Returns an integer vector of the length of x, containing the index of the first matching range in table (or nomatch if there is no matching range) for each range in x.

x %in% table: A shortcut for finding the ranges in x that match any of the ranges in table. Returns a logical vector of length equal to the number of ranges in x.
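To see what equality-based matching returns, one can model each range by a "start:width" key and use base::match on the keys. This is a hedged sketch only: range_key is a hypothetical helper and the key encoding is an assumption made for illustration.

```r
start <- c(20L, 8L, 20L, 22L, 25L, 20L, 22L, 22L)
width <- c( 4L, 0L, 11L,  5L,  0L,  9L,  5L,  0L)

# Hypothetical helper: two ranges get the same key iff they are equal.
range_key <- function(start, width) paste(start, width, sep = ":")

x_key     <- range_key(start, width)
table_key <- x_key[c(2:4, 7:8)]   # plays the role of table <- x[c(2:4, 7:8)]

# Index of the first equal range in the table, or NA if there is none.
m <- match(x_key, table_key)
m  # NA 1 2 3 NA NA 3 5

# The %in% shortcut is TRUE wherever a match exists.
is_in <- x_key %in% table_key
```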

findMatches(x, table, select=c("all", "first", "last"), ...): An enhanced version of match that returns all the matches in a Hits object.

countMatches(x, table, ...): Returns an integer vector of the length of x, containing the number of matches in table for each element in x.
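The difference between "first match" and "all matches" can be sketched in base R with an all-pairs equality matrix. The code below is an illustrative model under the same "start:width" key assumption; it does not produce a Hits object and is not how the package computes its result.

```r
start <- c(20L, 8L, 20L, 22L, 25L, 20L, 22L, 22L)
width <- c( 4L, 0L, 11L,  5L,  0L,  9L,  5L,  0L)

x_key     <- paste(start, width, sep = ":")
table_key <- x_key[c(2:4, 7:8)]

# eq[i, j] is TRUE when range i of x equals range j of the table.
eq <- outer(x_key, table_key, "==")

# All (query, subject) pairs, analogous to the content of findMatches().
hits <- which(eq, arr.ind = TRUE)
nrow(hits)  # 7

# Number of matches per element of x, analogous to countMatches().
counts <- rowSums(eq)
counts  # 0 1 1 2 0 0 2 1
```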

order(...): Returns a permutation which rearranges its first argument (a Ranges object) into ascending order, breaking ties by further arguments (also Ranges objects). See order in the BiocGenerics package for more information.

sort(x): Sorts x. See sort in the base package for more details.

rank(x, na.last=TRUE, ties.method=c("average", "first", "random", "max", "min")): Returns the sample ranks of the ranges in x. See rank in the base package for more details.

compare(x, y): Performs "generalized range-wise comparison" of x and y, that is, returns an integer vector where the i-th element is a code describing how the i-th element in x is qualitatively positioned relative to the i-th element in y.

Here is a summary of the 13 predefined codes (and their letter equivalents) and their meanings:

      -6 a: x[i]: .oooo.......         6 m: x[i]: .......oooo.
            y[i]: .......oooo.              y[i]: .oooo.......

      -5 b: x[i]: ..oooo......         5 l: x[i]: ......oooo..
            y[i]: ......oooo..              y[i]: ..oooo......

      -4 c: x[i]: ...oooo.....         4 k: x[i]: .....oooo...
            y[i]: .....oooo...              y[i]: ...oooo.....

      -3 d: x[i]: ...oooooo...         3 j: x[i]: .....oooo...
            y[i]: .....oooo...              y[i]: ...oooooo...

      -2 e: x[i]: ..oooooooo..         2 i: x[i]: ....oooo....
            y[i]: ....oooo....              y[i]: ..oooooooo..

      -1 f: x[i]: ...oooo.....         1 h: x[i]: ...oooooo...
            y[i]: ...oooooo...              y[i]: ...oooo.....

                      0 g: x[i]: ...oooooo...
                           y[i]: ...oooooo...
      
Note that this way of comparing ranges is a refinement over the standard ranges comparison defined by the ==, !=, <=, >=, < and > operators. In particular a code that is < 0, = 0, or > 0, corresponds to x[i] < y[i], x[i] == y[i], or x[i] > y[i], respectively. The compare method for Ranges objects is guaranteed to return predefined codes only, but methods for other objects (e.g. for GenomicRanges objects) can return non-predefined codes. As with the predefined codes, the sign of any non-predefined code must tell whether x[i] is less than or greater than y[i].
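The 13 codes can be derived from the diagram with a few comparisons of starts and ends. The following base R function is a sketch based on a reading of that diagram, not the package's implementation: compare_codes is a hypothetical name, ranges are given as inclusive start/end positions, and only non-empty ranges are considered (the handling of empty ranges is left out).

```r
# Hypothetical helper; assumes non-empty ranges with inclusive ends.
compare_codes <- function(x_start, x_end, y_start, y_end) {
  one_code <- function(xs, xe, ys, ye) {
    if (xe <  ys - 1L) return(-6L)  # x entirely before y, with a gap
    if (xe == ys - 1L) return(-5L)  # x immediately before y (adjacent)
    if (ye <  xs - 1L) return(6L)   # x entirely after y, with a gap
    if (ye == xs - 1L) return(5L)   # x immediately after y (adjacent)
    if (xs < ys) {                  # x starts before y and overlaps it
      if (xe < ye) -4L else if (xe == ye) -3L else -2L
    } else if (xs == ys) {          # same start
      if (xe < ye) -1L else if (xe == ye) 0L else 1L
    } else {                        # x starts after y and overlaps it
      if (xe < ye) 2L else if (xe == ye) 3L else 4L
    }
  }
  mapply(one_code, x_start, x_end, y_start, y_end)
}

# Width-4 ranges starting at 1..11, compared with the single range [6, 9].
xs <- 1:11
codes <- compare_codes(xs, xs + 3L, 6L, 9L)
codes  # -6 -5 -4 -4 -4 0 4 4 4 5 6
```

In this sketch the sign of each code agrees with the start-then-width ordering: negative when x sorts before y, zero when they are equal, positive otherwise.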

rangeComparisonCodeToLetter(code): Translates the codes returned by compare. The 13 predefined codes are translated as follows: -6 -> a; -5 -> b; -4 -> c; -3 -> d; -2 -> e; -1 -> f; 0 -> g; 1 -> h; 2 -> i; 3 -> j; 4 -> k; 5 -> l; 6 -> m. Any non-predefined code is translated to X. The translated codes are returned in a factor with 14 levels: a, b, ..., l, m, X.
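The translation table above amounts to a lookup into 14 levels. A minimal base R sketch, with code_to_letter as a hypothetical stand-in for the package function, could look like this:

```r
# Hypothetical stand-in for rangeComparisonCodeToLetter().
code_to_letter <- function(code) {
  lev <- c(letters[1:13], "X")                # a, b, ..., m, then X
  # Predefined codes -6..6 map to positions 1..13; anything else maps to X.
  idx <- ifelse(code >= -6L & code <= 6L, code + 7L, 14L)
  factor(lev[idx], levels = lev)
}

res <- code_to_letter(c(-6L, 0L, 6L, 10L))
as.character(res)  # "a" "g" "m" "X"
nlevels(res)       # 14
```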

Author(s)

H. Pages

See Also

Examples

x <- IRanges(start=c(20L, 8L, 20L, 22L, 25L, 20L, 22L, 22L),
             width=c( 4L, 0L, 11L,  5L,  0L,  9L,  5L,  0L))
x

## ---------------------------------------------------------------------
## A. ELEMENT-WISE (AKA "PARALLEL") COMPARISON OF 2 Ranges OBJECTS
## ---------------------------------------------------------------------
which(width(x) == 0)  # 3 empty ranges
x[2] == x[2]  # TRUE
x[2] == x[5]  # FALSE
x == x[4]
x >= x[3]

## ---------------------------------------------------------------------
## B. duplicated(), unique()
## ---------------------------------------------------------------------
duplicated(x)
unique(x)

## ---------------------------------------------------------------------
## C. match(), %in%
## ---------------------------------------------------------------------
table <- x[c(2:4, 7:8)]
match(x, table)  # Warning! The warning will be removed in BioC 2.13.

## In the meantime, specify 'match.if.overlap=FALSE' to suppress the warning:
match(x, table, match.if.overlap=FALSE)

x %in% table  # Warning! The warning will be removed in BioC 2.13.
## In the meantime, use suppressWarnings() to suppress the warning:
suppressWarnings(x %in% table)

## ---------------------------------------------------------------------
## D. findMatches(), countMatches()
## ---------------------------------------------------------------------
findMatches(x, table)
countMatches(x, table)

x_levels <- unique(x)
countMatches(x_levels, x)

## ---------------------------------------------------------------------
## E. order() AND RELATED METHODS
## ---------------------------------------------------------------------
order(x)
sort(x)
rank(x, ties.method="first")

## ---------------------------------------------------------------------
## F. GENERALIZED ELEMENT-WISE COMPARISON OF 2 Ranges OBJECTS
## ---------------------------------------------------------------------
x0 <- IRanges(1:11, width=4)
x0
y0 <- IRanges(6, 9)
compare(x0, y0)
compare(IRanges(4:6, width=6), y0)
compare(IRanges(6:8, width=2), y0)
compare(x0, y0) < 0   # equivalent to 'x0 < y0'
compare(x0, y0) == 0  # equivalent to 'x0 == y0'
compare(x0, y0) > 0   # equivalent to 'x0 > y0'

rangeComparisonCodeToLetter(-10:10)
rangeComparisonCodeToLetter(compare(x0, y0))

[Package IRanges version 1.18.0 Index]