BamFile {Rsamtools} | R Documentation |
Use BamFile() to create a reference to a BAM file (and optionally its index). The reference remains open across calls to methods, avoiding costly index re-loading.
BamFileList() provides a convenient way of managing a list of BamFile instances.
## Constructors
BamFile(file, index=file, ..., yieldSize=NA_integer_, obeyQname=FALSE)
BamFileList(..., yieldSize=NA_integer_, obeyQname=FALSE)

## Opening / closing
## S3 method for class 'BamFile'
open(con, ...)
## S3 method for class 'BamFile'
close(con, ...)

## accessors; also path(), index(), yieldSize(), obeyQname()
## S4 method for signature 'BamFile'
isOpen(con, rw="")

## actions
## S4 method for signature 'BamFile'
scanBamHeader(files, ...)
## S4 method for signature 'BamFile'
seqinfo(x)
## S4 method for signature 'BamFile'
scanBam(file, index=file, ..., param=ScanBamParam(what=scanBamWhat()))
## S4 method for signature 'BamFile'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFileList'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFile'
filterBam(file, destination, index=file, ..., indexDestination=TRUE,
    param=ScanBamParam(what=scanBamWhat()))
## S4 method for signature 'BamFile'
indexBam(files, ...)
## S4 method for signature 'BamFile'
sortBam(file, destination, ..., byQname=FALSE, maxMemory=512)
## S4 method for signature 'BamFileList'
mergeBam(files, destination, ...)
## S4 method for signature 'BamFile'
readBamGappedAlignments(file, index=file, ..., use.names=FALSE, param=NULL)
## S4 method for signature 'BamFile'
readBamGappedReads(file, index=file, use.names=FALSE, param=NULL)
## S4 method for signature 'BamFile'
readBamGappedAlignmentPairs(file, index=file, use.names=FALSE, param=NULL)
## S4 method for signature 'BamFile'
readBamGAlignmentsList(file, index=file, ..., use.names=FALSE,
    param=ScanBamParam(), asProperPairs=TRUE)

## counting
## S4 method for signature 'GRanges,BamFileList'
summarizeOverlaps(features, reads, mode, ignore.strand=FALSE, ...,
    singleEnd=TRUE, param=ScanBamParam())
## S4 method for signature 'GRangesList,BamFileList'
summarizeOverlaps(features, reads, mode, ignore.strand=FALSE, ...,
    singleEnd=TRUE, param=ScanBamParam())
## S4 method for signature 'character,ANY'
findSpliceOverlaps(query, subject, ignore.strand=FALSE, ...,
    param=ScanBamParam(), pairedEnd=FALSE)
## S4 method for signature 'BamFile,ANY'
findSpliceOverlaps(query, subject, ignore.strand=FALSE, ...,
    param=ScanBamParam(), pairedEnd=FALSE)
## S4 method for signature 'BamFile'
coverage(x, shift=0L, width=NULL, weight=1L, ..., param = ScanBamParam())
## S4 method for signature 'BamFile'
quickCountBam(file, ..., param=ScanBamParam(), mainGroupsOnly=FALSE)
... |
Additional arguments. For |
con |
An instance of |
x, file, files |
A character vector of BAM file paths (for
|
index |
character(1); the BAM index file path (for
|
yieldSize |
Number of records to yield each time the file is read
from using |
destination |
character(1) file path to write filtered reads to. |
indexDestination |
logical(1) indicating whether the destination file should also be indexed. |
byQname, maxMemory |
See |
obeyQname |
A logical(1) indicating whether the file is
sorted by |
param |
An optional |
use.names |
Construct the names of the returned object from the query template names (QNAME field)? If not (the default), then the returned object has no names. |
rw |
Mode of file; ignored. |
reads |
A |
features |
A GRanges or a GRangesList object of genomic regions of interest. When a GRanges is supplied, each range is considered a feature. When a GRangesList is supplied, each top-level list element (for example, all exons of one gene) is considered a feature. This distinction is important when defining an overlap between a read and a feature. See ? |
mode |
A function that defines the method used when a read overlaps more than one feature. Pre-defined options are "Union", "IntersectionStrict", and "IntersectionNotEmpty"; they are modeled after the counting modes available in the HTSeq package by Simon Anders (see references). |
ignore.strand |
A logical value indicating whether strand should be ignored when matching; with the default FALSE, a read and a feature must be on compatible strands to overlap. |
singleEnd |
A logical value indicating whether reads are single-end (TRUE, the default) or paired-end (FALSE). |
pairedEnd |
A logical value indicating whether reads are paired-end (TRUE) or single-end (FALSE, the default). |
query |
Paired-end reads can be supplied in a BAM file or a GappedAlignmentPairs object. Single-end reads may be supplied in a BAM file, a GappedAlignments object, or a GRanges object. |
subject |
A TranscriptDb or a GRangesList containing the annotations. |
shift, width, weight |
See |
mainGroupsOnly |
See |
asProperPairs |
A logical indicating if the records should be filtered
such that only proper pairs are returned. Applies to
|
Objects are created by calls of the form BamFile().
The BamFile class inherits fields from the RsamtoolsFile class.
BamFileList inherits methods from RsamtoolsFileList and SimpleList.
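Because BamFileList builds on SimpleList, it can be handled like an ordinary list of BamFile objects. The sketch below assumes the ex1.bam example file shipped with Rsamtools (the same file used in the examples on this page); the variable names are illustrative, and listing the same file twice is only a convenient way to obtain a two-element list.

```r
library(Rsamtools)

## ex1.bam (with its index ex1.bam.bai) ships with Rsamtools
fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)

## build a BamFileList from a character vector of paths; the yieldSize
## given to the constructor is applied to each BamFile it creates
bfl <- BamFileList(c(fl, fl), yieldSize=500L)

length(bfl)                        # number of BamFile instances
sapply(as.list(bfl), yieldSize)    # per-file yield size
class(bfl[[1]])                    # each element is a BamFile
```

List-style operations such as [[, lapply() and length() come from the SimpleList ancestry, so per-file work can be written as an ordinary loop over the elements.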
Opening / closing:
Opens the (local or remote) path and index files (the index only if index is not character(0)). Returns a BamFile instance.
Closes the BamFile con, returning (invisibly) the updated BamFile. The instance may be re-opened with open.BamFile.
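The open / close cycle can be sketched as follows, assuming the ex1.bam example file and its index. The two countBam() calls are illustrative stand-ins for any repeated work that benefits from keeping the file and its index open between calls.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)

bf <- BamFile(fl)        # only a reference; nothing is opened yet
isOpen(bf)               # FALSE

bf <- open(bf)           # opens ex1.bam and loads its index
isOpen(bf)               # TRUE

## both calls reuse the already-open file and the loaded index
cnt1 <- countBam(bf, param=ScanBamParam(which=GRanges("seq1", IRanges(1, 500))))
cnt2 <- countBam(bf, param=ScanBamParam(which=GRanges("seq2", IRanges(1, 500))))

bf <- close(bf)          # close() invisibly returns the updated BamFile
isOpen(bf)               # FALSE; open(bf) would re-open it
```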
Accessors:
Returns a character(1) vector of BAM path names.
Returns a character(1) vector of BAM index path names.
Return or set an integer(1) vector indicating yield size.
Return or set a logical(1) indicating whether the file was sorted by qname.
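A short sketch of the accessors, assuming the ex1.bam example file. The replacement form yieldSize(bf) <- value is used as the "set" half of the yield-size accessor; the value 1000 is arbitrary.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bf <- BamFile(fl)

path(bf)           # the BAM file path
index(bf)          # the index path associated with 'bf'
yieldSize(bf)      # NA: no yield size, so scanBam() reads all records at once
obeyQname(bf)      # FALSE: the constructor's default

## set a yield size on the existing reference; once 'bf' is opened,
## scanBam() calls without 'which' ranges return at most 1000 records each
yieldSize(bf) <- 1000L
yieldSize(bf)
```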
Methods:
Visit the path in path(file), returning the information contained in the file header; see scanBamHeader.
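A sketch of reading the header through a BamFile, assuming the ex1.bam example file. The structure described in the comments (a list with targets and text elements) follows the scanBamHeader documentation and is worth checking against ?scanBamHeader for the installed version.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
hdr <- scanBamHeader(BamFile(fl))

## 'targets': named integer vector of reference sequence lengths
hdr$targets

## 'text': the remaining header lines, keyed by record type (e.g. "@SQ")
names(hdr$text)
```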
Visit the path in path(x), returning a Seqinfo instance containing the length of each reference sequence.
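A sketch, assuming the ex1.bam example file, of using seqinfo() to build one range per full reference sequence. The name 'whole' is illustrative; such a GRanges can serve as the which argument of ScanBamParam(), as the rng object in the examples does with hard-coded lengths.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
si <- seqinfo(BamFile(fl))

seqnames(si)       # reference sequence names from the header
seqlengths(si)     # named integer vector of sequence lengths

## one range spanning each complete reference sequence
whole <- GRanges(seqnames(si), IRanges(1, seqlengths(si)))
whole
```

Deriving the lengths from the header avoids hard-coding values such as 1575 and 1584, which would silently go stale for a different file.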
Visit the path in path(file), returning the result of scanBam applied to the specified path.
Visit the path(s) in path(file), returning the result of countBam applied to the specified path(s).
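A sketch of countBam() on a BamFile, assuming the ex1.bam example file. The two regions in rng and the plus-strand flag filter are arbitrary choices for illustration; scanBamFlag(isMinusStrand=FALSE) selects reads whose alignment is on the forward strand.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bf <- BamFile(fl)

## total records and nucleotides in the whole file
countBam(bf)

## one row per range in 'which' (range queries use the index)
rng <- GRanges(c("seq1", "seq2"), IRanges(c(100, 1000), c(600, 1500)))
cnt <- countBam(bf, param=ScanBamParam(which=rng))
cnt[, c("space", "start", "end", "records", "nucleotides")]

## the same regions, restricted to plus-strand reads
plus <- countBam(bf, param=ScanBamParam(which=rng,
                                        flag=scanBamFlag(isMinusStrand=FALSE)))
plus$records
```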
Visit the path in path(file), returning the result of filterBam applied to the specified path.
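A sketch of filterBam(), assuming the ex1.bam example file. The region, the strand filter and the temporary destination path are illustrative; with the default indexDestination=TRUE the output is expected to be indexed as well, which the test checks by looking for the .bai file next to it.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bf <- BamFile(fl)

## keep only plus-strand reads overlapping the first 500 bases of seq1
param <- ScanBamParam(which=GRanges("seq1", IRanges(1, 500)),
                      flag=scanBamFlag(isMinusStrand=FALSE))
dest <- tempfile(fileext=".bam")
filterBam(bf, dest, param=param)

## the filtered file holds a subset of the original records
nFiltered <- countBam(dest)$records
nAll <- countBam(bf)$records
c(filtered=nFiltered, all=nAll)
```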
Visit the path in path(file), returning the result of indexBam applied to the specified path.
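A sketch of indexBam() on a BamFile, assuming the ex1.bam example file. The example file is first copied (without its index) to a hypothetical writable location, ex1-copy.bam in the R temporary directory, since the installed package directory may not be writable.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)

## copy the coordinate-sorted example file somewhere writable
bam <- file.path(tempdir(), "ex1-copy.bam")
file.copy(fl, bam, overwrite=TRUE)

## build the index; the return value is the path of the new index file
idx <- indexBam(BamFile(bam))
idx

## the index enables range-restricted queries on the copy
countBam(bam, param=ScanBamParam(which=GRanges("seq2", IRanges(1, 200))))
```

Indexing requires a coordinate-sorted file; a file sorted by qname (see sortBam) cannot be indexed this way.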
Visit the path in path(file), returning the result of sortBam applied to the specified path.
Merge several BAM files into a single BAM file. See mergeBam for details; additional arguments supported by mergeBam,character-method are also available for BamFileList.
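A round-trip sketch of mergeBam() with a BamFileList, assuming the ex1.bam example file. To obtain several inputs, the file is first split per reference sequence with filterBam(); the temporary paths and variable names are illustrative. The inputs stay coordinate-sorted, which merging expects, and the merged file is read with index=character(0) because mergeBam does not index it by default.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bf <- BamFile(fl)
lens <- seqlengths(seqinfo(bf))

## split ex1.bam into one file per reference sequence ...
parts <- character(0)
for (chr in names(lens)) {
    part <- tempfile(fileext=".bam")
    filterBam(bf, part,
              param=ScanBamParam(which=GRanges(chr, IRanges(1, lens[[chr]]))))
    parts <- c(parts, part)
}

## ... then merge the pieces back into a single BAM file
dest <- tempfile(fileext=".bam")
mergeBam(BamFileList(parts), dest)

## the merged file holds every record from the inputs
nParts <- sapply(parts, function(p) countBam(p)$records)
nMerged <- countBam(BamFile(dest, index=character(0)))$records
c(parts=sum(nParts), merged=nMerged)
```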
Visit the path in path(file), returning the result of readBamGappedAlignments, readBamGappedReads, or readBamGappedAlignmentPairs applied to the specified path. See readBamGappedAlignments.
Visit the BAM file in path(file). The file must be sorted by qname; see ?sortBam. When a yieldSize is set on the BamFile, data are read in chunks, so reading the complete file requires a while or similar loop construct. When asProperPairs=TRUE, only proper pairs are returned. See the ?GappedAlignmentPairs man page for details of the proper-pairs filtering.
The return value from readBamGAlignmentsList is a GAlignmentsList in which each list element contains all records with the same id (the QNAME field in the SAM/BAM file). When asProperPairs is TRUE, each list element has exactly 2 records; these are the same data as those returned by readBamGappedAlignmentPairs, only the return class differs. When asProperPairs is FALSE, no QC is performed, resulting in 1 or more records per element. List elements containing singletons, unpaired reads, or single fragments have a length of 1, while paired-end reads or those with multiple fragments have a length of 2 or greater.
(NOTE: asProperPairs=TRUE not yet implemented)
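The chunked-reading loop described above can be sketched with scanBam(), which is listed on this page and follows the same yieldSize convention; readBamGAlignmentsList would take its place for GAlignmentsList output. This assumes the ex1.bam example file; the chunk size of 500, the qname-only ScanBamParam and the running total are illustrative choices.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)

## open once with a yield size: each scanBam() call returns the next
## chunk, and an empty chunk signals the end of the file
bf <- open(BamFile(fl, yieldSize=500L))
total <- 0L
repeat {
    chunk <- scanBam(bf, param=ScanBamParam(what="qname"))[[1]]
    n <- length(chunk$qname)
    if (n == 0L)
        break
    total <- total + n    # process 'chunk' here
}
close(bf)

## every record was visited exactly once
total == countBam(fl)$records
```

Keeping the file open across iterations is what lets each call resume where the previous one stopped; re-creating the BamFile inside the loop would restart from the first record.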
Compactly display the object.
Author(s): Martin Morgan and Marc Carlson
The summarizeOverlaps method originates in the GenomicRanges package.
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)
length(scanBam(fl)[[1]][[1]])  # all records

bf <- open(BamFile(fl))        # implicit index
bf
identical(scanBam(bf), scanBam(fl))
close(bf)

## chunks of size 1000
bf <- open(BamFile(fl, yieldSize=1000))
while (nrec <- length(scanBam(bf)[[1]][[1]]))
    cat("records:", nrec, "\n")
close(bf)

rng <- GRanges(c("seq1", "seq2"), IRanges(1, c(1575, 1584)))
## repeatedly visit 'bf'
bf <- open(BamFile(fl))
sapply(seq_len(length(rng)), function(i, bamFile, rng) {
    param <- ScanBamParam(which=rng[i], what="seq")
    bam <- scanBam(bamFile, param=param)[[1]]
    alphabetFrequency(bam[["seq"]], baseOnly=TRUE, collapse=TRUE)
}, bf, rng)
close(bf)

##------------------------------------------------------------------------
## summarizeOverlaps with BamFileList
##
library(pasillaBamSubset)
library("TxDb.Dmelanogaster.UCSC.dm3.ensGene")
exbygene <- exonsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene")

## single-end:
## When 'yieldSize' is specified the file is processed by chunks.
## Otherwise the complete file is read into memory.
fl <- untreated1_chr4()
bfl <- BamFileList(fl, yieldSize=50000)
se1 <- summarizeOverlaps(exbygene, bfl, singleEnd=TRUE)
counts1 <- assays(se1)$counts

## paired-end sorted by qname:
## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname
## can be read in chunks with 'yieldSize'.
fl <- untreated3_chr4()
sortfl <- sortBam(fl, tempfile(), byQname=TRUE)
bf2 <- BamFileList(sortfl, index=character(0), yieldSize=50000,
                   obeyQname=TRUE)
se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE)
counts2 <- assays(se2)$counts

## paired-end not sorted:
## If the file is not sorted by qname, all records are read
## into memory for sorting and to determine proper pairs.
## Any 'yieldSize' set on the BamFile will be ignored.
fl <- untreated3_chr4()
bf3 <- BamFileList(fl)
se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE)
counts3 <- assays(se3)$counts
identical(as.vector(counts2), as.vector(counts3))

##------------------------------------------------------------------------
## findSpliceOverlaps
##
## See ?'findSpliceOverlaps' for examples