yieldReduce {Rsamtools} | R Documentation |
Rsamtools files can be created with a ‘yieldSize’ argument that
influences the number of records (chunk size) input at one time (see,
e.g., BamFile). yieldReduce iterates through the file, processing each
chunk and reducing it with previously input chunks. This is a
memory-efficient way to process large data files, especially when the
final result fits in memory.
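The chunk-and-reduce idea described above can be sketched in base R, without any Rsamtools calls. This is only an analogy: a text connection stands in for the file, the hypothetical variable `chunk_size` plays the role of yieldSize, and only the running total is kept in memory.

```r
## Hypothetical base-R analogue (not Rsamtools code): count characters in a
## text stream two lines at a time, keeping only the running total.
con <- textConnection(c("ACGT", "AC", "GGT", "T", "ACGTA"))
chunk_size <- 2L          # plays the role of yieldSize
total <- 0L               # running (reduced) result
repeat {
    chunk <- readLines(con, n = chunk_size)   # input one chunk
    if (length(chunk) == 0L) break            # no more data to read
    total <- total + sum(nchar(chunk))        # map the chunk, then reduce
}
close(con)
total
```

At no point does the loop hold more than `chunk_size` lines plus the accumulated result, which is the property that makes this pattern suitable for large files.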
yieldReduce(X, MAP, REDUCE, DONE, ..., init, ITERATE = TRUE)
X |
A |
MAP |
A function of one or more arguments, |
REDUCE |
A function of one ( |
DONE |
A function of one argument, the |
... |
Additional arguments, passed to |
init |
(Optional) Initial value used for |
ITERATE |
logical(1) determining whether the call to
|
When ITERATE=TRUE, REDUCE is initially invoked with either the init
value and the value of the first call to MAP or, if init is missing,
the values of the first two calls to MAP.
When ITERATE=FALSE, REDUCE is invoked with a list containing a list
with as many elements as there were calls to MAP. Each element is the
result of an invocation of MAP.
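The two calling conventions for REDUCE can be illustrated with base R alone. The sketch below uses made-up MAP outputs (`map_values` is hypothetical data, not output from a real file) and assumes that, in the ITERATE=FALSE case, REDUCE receives a single list holding one element per MAP call.

```r
## Hypothetical MAP outputs from three successive chunks
map_values <- list(c(A = 2, C = 1), c(A = 0, C = 3), c(A = 5, C = 1))

## ITERATE=TRUE style: REDUCE takes two arguments and is applied pairwise.
## Without init, the first call combines the first two MAP values.
iterative_no_init <- Reduce(`+`, map_values)
## With init, the first call combines init and the first MAP value.
iterative_init <- Reduce(`+`, map_values, init = c(A = 10, C = 10))

## ITERATE=FALSE style (assumed shape): REDUCE takes one argument, the list
## of all MAP values, and is called once at the end.
REDUCE_all <- function(values) do.call(rbind, values)
cumulative <- REDUCE_all(map_values)
```

The pairwise form keeps only one accumulated value in memory, whereas the single-call form retains every MAP result until the end, so it suits reductions that need to see all chunks at once (here, row-binding them into a matrix).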
The return value is the value returned by the final invocation of
REDUCE, or init if provided and no data were yield'ed, or list() if
init is missing and no data were yield'ed.
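The return-value rules above can be traced with a small stand-in written in base R. Everything here is hypothetical and for illustration only: `yield_reduce_sketch` and `make_source` are not part of Rsamtools, the driver covers only the iterative case, and its default DONE (stop when MAP returns a zero-length value) is an assumption of the sketch.

```r
## Hypothetical chunk source: each call returns the next 'size' elements of x
make_source <- function(x, size) {
    pos <- 0L
    function() {
        idx <- seq_len(min(size, length(x) - pos)) + pos
        pos <<- pos + length(idx)
        x[idx]
    }
}

## Hypothetical stand-in for the iterative case; not the Rsamtools source
yield_reduce_sketch <- function(next_chunk, MAP, REDUCE,
                                DONE = function(VALUE) length(VALUE) == 0L,
                                init) {
    have_result <- !missing(init)
    result <- if (have_result) init else list()
    repeat {
        value <- MAP(next_chunk())
        if (DONE(value)) break
        result <- if (have_result) REDUCE(result, value) else value
        have_result <- TRUE
    }
    result
}

## MAP returns a zero-length value on an empty chunk so that DONE can fire
MAP <- function(chunk) if (length(chunk)) sum(chunk) else chunk

full          <- yield_reduce_sketch(make_source(1:5, 2L), MAP, `+`)
with_init     <- yield_reduce_sketch(make_source(1:5, 2L), MAP, `+`, init = 100L)
empty_no_init <- yield_reduce_sketch(make_source(integer(0), 2L), MAP, `+`)
empty_init    <- yield_reduce_sketch(make_source(integer(0), 2L), MAP, `+`, init = 0L)
```

With data present, the result is the last value produced by REDUCE; with no data, the sketch falls back to `init` when supplied and to `list()` otherwise, mirroring the three cases listed above.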
Martin Morgan mtmorgan@fhcrc.org
BamFile, TabixFile, RsamtoolsFile.
fl <- system.file(package="Rsamtools", "extdata", "ex1.bam")

## nucleotide frequency of mapped reads
bf <- BamFile(fl, yieldSize=500) ## typically, yieldSize=1e6
param <- ScanBamParam(
    flag=scanBamFlag(isUnmappedQuery=FALSE),
    what="seq")
MAP <- function(X, param) {
    value <- scanBam(X, param=param)[[1]][["seq"]]
    if (length(value))
        alphabetFrequency(value, collapse=TRUE)
    else value # will be integer(0)
}
REDUCE <- `+` # add successive alphabetFrequency matrices
yieldReduce(bf, MAP, REDUCE, param=param)

## coverage
if (require(GenomicAlignments)) {
    MAP <- function(X)
        coverage(readGAlignments(X))
    REDUCE <- `+`
    DONE <- function(VALUE)
        ## coverage() on zero GAlignments returns an RleList,
        ## each element of which has 0 coverage
        sum(sum(VALUE)) == 0L
    yieldReduce(bf, MAP, REDUCE, DONE)
}