BamFile {Rsamtools}    R Documentation

Maintain and use BAM files

Description

Use BamFile() to create a reference to a BAM file (and optionally its index). The reference remains open across calls to methods, avoiding costly index re-loading.

BamFileList() provides a convenient way of managing a list of BamFile instances.

Usage


## Constructors

BamFile(file, index=file, ..., yieldSize=NA_integer_, obeyQname=FALSE)
BamFileList(..., yieldSize=NA_integer_, obeyQname=FALSE)

## Opening / closing

## S3 method for class 'BamFile'
open(con, ...)
## S3 method for class 'BamFile'
close(con, ...)

## accessors; also path(), index(), yieldSize(), obeyQname()

## S4 method for signature 'BamFile'
isOpen(con, rw="")

## actions

## S4 method for signature 'BamFile'
scanBamHeader(files, ...)
## S4 method for signature 'BamFile'
seqinfo(x)
## S4 method for signature 'BamFile'
scanBam(file, index=file, ..., param=ScanBamParam(what=scanBamWhat()))
## S4 method for signature 'BamFile'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFileList'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFile'
filterBam(file, destination, index=file, ...,
    indexDestination=TRUE, param=ScanBamParam(what=scanBamWhat()))
## S4 method for signature 'BamFile'
indexBam(files, ...)
## S4 method for signature 'BamFile'
sortBam(file, destination, ..., byQname=FALSE, maxMemory=512)
## S4 method for signature 'BamFileList'
mergeBam(files, destination, ...)
## S4 method for signature 'BamFile'
readBamGappedAlignments(file, index=file, ..., use.names=FALSE, param=NULL)
## S4 method for signature 'BamFile'
readBamGappedReads(file, index=file, use.names=FALSE, param=NULL)
## S4 method for signature 'BamFile'
readBamGappedAlignmentPairs(file, index=file, use.names=FALSE, param=NULL)
## S4 method for signature 'BamFile'
readBamGAlignmentsList(file, index=file, ...,
    use.names=FALSE, param=ScanBamParam(), asProperPairs=TRUE)

## counting

## S4 method for signature 'GRanges,BamFileList'
summarizeOverlaps(features, reads, mode, ignore.strand=FALSE, ...,
    singleEnd=TRUE, param=ScanBamParam()) 
## S4 method for signature 'GRangesList,BamFileList'
summarizeOverlaps(features, reads, mode, ignore.strand=FALSE, ...,
    singleEnd=TRUE, param=ScanBamParam()) 

## S4 method for signature 'character,ANY'
findSpliceOverlaps(query, subject, ignore.strand=FALSE, ...,
    param=ScanBamParam(), pairedEnd=FALSE)
## S4 method for signature 'BamFile,ANY'
findSpliceOverlaps(query, subject, ignore.strand=FALSE, ...,
    param=ScanBamParam(), pairedEnd=FALSE)

## S4 method for signature 'BamFile'
coverage(x, shift=0L, width=NULL, weight=1L, ..., param = ScanBamParam())

## S4 method for signature 'BamFile'
quickCountBam(file, ..., param=ScanBamParam(), mainGroupsOnly=FALSE)

Arguments

...

Additional arguments. For BamFileList, this can be either a single character vector of paths to BAM files, or several BamFile instances. When a character vector of paths is supplied, a second named argument ‘index’ can be a character vector, of the same length as the first argument, giving the paths to the index files, or character() to indicate that no index files are available. See BamFile. For coverage, the arguments are passed to the coverage method for GappedAlignments objects.
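The ways of building a BamFileList described above can be sketched as follows. This is a minimal sketch, assuming a recent Rsamtools installation (details may differ from version 1.12.0) and the ex1.bam file bundled in the package's extdata directory; the same path is listed twice purely so that the list has more than one element.

```r
library(Rsamtools)

## example BAM file shipped with Rsamtools, listed twice for illustration only
fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
paths <- c(fl, fl)

## from a character vector of paths; index files default to the BAM paths
bfl1 <- BamFileList(paths)

## the same paths, declaring that no index files are available
bfl2 <- BamFileList(paths, index=character())

## from existing BamFile instances
bfl3 <- BamFileList(BamFile(fl), BamFile(fl))

length(bfl1)
```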

con

An instance of BamFile.

x, file, files

A character vector of BAM file paths (for BamFile) or a BamFile instance (for other methods).

index

character(1); the BAM index file path (for BamFile); ignored for all other methods on this page.

yieldSize

Number of records to yield each time the file is read from using scanBam. Only valid when length(bamWhich(param)) == 0. When a BamFileList is created from BamFile instances, yieldSize does not alter their existing yield sizes, including NA.
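Reading a whole file in yieldSize chunks typically uses a loop that stops at the first empty chunk. A minimal sketch, assuming the bundled ex1.bam file and a param without a which component (as required above); only the qname field is requested to keep each chunk small, and the chunk size of 500 is arbitrary.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)

## read at most 500 records per call, retrieving only 'qname'
bf <- open(BamFile(fl, yieldSize=500))
param <- ScanBamParam(what="qname")
total <- 0L
repeat {
    chunk <- scanBam(bf, param=param)[[1]]$qname
    if (length(chunk) == 0L)
        break                      # an empty chunk signals the end of the file
    total <- total + length(chunk)
}
close(bf)

## the chunks together should account for every record in the file
c(chunked=total, countBam=countBam(fl)$records)
```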

destination

character(1) file path to which filtered reads are written; also the output path for sortBam and mergeBam (see those pages for details).

indexDestination

logical(1) indicating whether the destination file should also be indexed.

byQname, maxMemory

See sortBam.

obeyQname

A logical(1) indicating whether the file is sorted by qname.

param

An optional ScanBamParam instance to further influence scanning, counting, or filtering.

use.names

Construct the names of the returned object from the query template names (QNAME field)? If not (the default), then the returned object has no names.

rw

Mode of file; ignored.

reads

A BamFileList that represents the data to be counted by summarizeOverlaps.

features

A GRanges or a GRangesList object of genomic regions of interest. When a GRanges is supplied, each row is considered a feature. When a GRangesList is supplied, each top-level list element (for example, all exons of one gene) is considered a feature. This distinction is important when defining an overlap between a read and a feature. See ?summarizeOverlaps for details.

mode

A function that defines how a read overlapping more than one feature is counted. Pre-defined options are "Union", "IntersectionStrict" and "IntersectionNotEmpty"; they are modeled after the counting modes available in the HTSeq package by Simon Anders (see ?summarizeOverlaps for references).

  • "Union" : (Default) Reads that overlap any portion of exactly one feature are counted. Reads that overlap multiple features are discarded.

  • "IntersectionStrict" : A read must fall completely "within" the feature to be counted. If a read overlaps multiple features but falls "within" only one, the read is counted for that feature. If the read is "within" multiple features, the read is discarded.

  • "IntersectionNotEmpty" : A read must fall in a unique disjoint region of a feature to be counted. When a read overlaps multiple features, the features are partitioned into disjoint intervals. Regions that are shared between the features are discarded leaving only the unique disjoint regions. If the read overlaps one of these remaining regions, it is assigned to the feature the unique disjoint region came from.
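The difference between "Union" and "IntersectionStrict" can be illustrated with a deliberately simplified base-R sketch on plain closed integer intervals. The helpers is_overlapping, is_within and count_mode are hypothetical; strand and spliced reads are ignored, "IntersectionNotEmpty" is omitted, and this is not how summarizeOverlaps is implemented.

```r
## Hypothetical toy helpers: closed integer intervals [start, end],
## no strand, no spliced reads, features given as a data.frame.
is_overlapping <- function(rs, re, fs, fe) rs <= fe & re >= fs  # share >= 1 position
is_within      <- function(rs, re, fs, fe) rs >= fs & re <= fe  # read fully inside

count_mode <- function(reads, features,
                       mode=c("Union", "IntersectionStrict")) {
    mode <- match.arg(mode)
    hit <- if (mode == "Union") is_overlapping else is_within
    counts <- setNames(integer(nrow(features)), features$id)
    for (i in seq_len(nrow(reads))) {
        hits <- which(hit(reads$start[i], reads$end[i],
                          features$start, features$end))
        ## count the read only when it qualifies for exactly one feature;
        ## otherwise it is discarded
        if (length(hits) == 1L)
            counts[hits] <- counts[hits] + 1L
    }
    counts
}

## two features sharing positions 8-10
features <- data.frame(id=c("A", "B"), start=c(1, 8), end=c(10, 20),
                       stringsAsFactors=FALSE)
reads <- data.frame(start=c(2, 9, 8, 15, 0), end=c(5, 12, 10, 25, 3))

count_mode(reads, features, "Union")
count_mode(reads, features, "IntersectionStrict")
```

With this toy data, "Union" assigns reads 1 and 5 to A and read 4 to B (reads 2 and 3 touch both features and are discarded), giving A=2, B=1. "IntersectionStrict" assigns read 1 to A and read 2 to B, giving A=1, B=1: read 3 lies within both features and is discarded, while reads 4 and 5 lie within neither.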

ignore.strand

A logical value indicating whether strand should be ignored (TRUE) or taken into account (FALSE, the default) when matching.

singleEnd

A logical value indicating whether reads are single-end (TRUE, the default) or paired-end (FALSE).

pairedEnd

A logical value indicating whether reads are paired-end (TRUE) or single-end (FALSE, the default).

query

The path to a BAM file (character), or a BamFile, GappedAlignments, GappedAlignmentPairs or GRangesList object containing the reads.

Paired-end reads can be supplied in a BAM file or a GappedAlignmentPairs object. Single-end reads may be supplied in a BAM file, GappedAlignments or GRanges object.

subject

A TranscriptDb or GRangesList object containing the annotations.

shift, width, weight

See coverage.

mainGroupsOnly

See quickCountBam.

asProperPairs

A logical(1) indicating whether the records should be filtered so that only proper pairs are returned. Applies to readBamGAlignmentsList only. If filtering is applied, the records returned are the same as those from readBamGappedAlignmentPairs, except that they are in a GAlignmentsList instead of a GappedAlignmentPairs object.

Objects from the Class

Objects are created by calls of the form BamFile().

Fields

The BamFile class inherits fields from the RsamtoolsFile class.

Functions and methods

BamFileList inherits methods from RsamtoolsFileList and SimpleList.

Opening / closing:

open.BamFile

Opens the (local or remote) BAM file and, unless the index is character(0), its index file. Returns a BamFile instance.

close.BamFile

Closes the BamFile con, returning (invisibly) the updated BamFile. The instance may be re-opened with open.BamFile.
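The open/close life cycle can be observed with isOpen. A minimal sketch, assuming the bundled ex1.bam file (whose index, ex1.bam.bai, sits next to it); the return values of open and close are re-assigned as described above.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bf <- BamFile(fl)
before <- isOpen(bf)    # constructed, not yet opened

bf <- open(bf)          # opens the BAM file and its index
during <- isOpen(bf)

bf <- close(bf)         # may be re-opened later with open()
after <- isOpen(bf)

c(before=before, during=during, after=after)
```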

Accessors:

path

Returns a character(1) vector of BAM path names.

index

Returns a character(1) vector of BAM index path names.

yieldSize, yieldSize<-

Return or set an integer(1) vector indicating yield size.

obeyQname, obeyQname<-

Return or set a logical(1) indicating whether the file is sorted by qname.
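The accessors above can be exercised on a single instance. A minimal sketch, assuming the bundled ex1.bam file; the yield size of 1000 is an arbitrary illustrative value.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bf <- BamFile(fl)

basename(path(bf))      # file name of the BAM path
yieldSize(bf)           # NA_integer_ unless set in the constructor
obeyQname(bf)           # FALSE unless set in the constructor

## change the yield size of an existing instance
yieldSize(bf) <- 1000L
yieldSize(bf)
```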

Methods:

scanBamHeader

Visit the path in path(file), returning the information contained in the file header; see scanBamHeader.

seqinfo

Visit the path in path(file), returning a Seqinfo instance containing information on the lengths of each sequence.
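A minimal sketch of seqinfo on a BamFile, assuming the bundled ex1.bam file, whose header declares two sequences, seq1 and seq2 (the same lengths used for the ranges in the Examples below).

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
si <- seqinfo(BamFile(fl))
si
seqlengths(si)   # named integer vector of sequence lengths from the header
```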

scanBam

Visit the path in path(file), returning the result of scanBam applied to the specified path.

countBam

Visit the path(s) in path(file), returning the result of countBam applied to the specified path.
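When param includes a which range set, countBam reports counts per range. A minimal sketch, assuming the bundled ex1.bam file; the columns selected are those of the data.frame returned by countBam.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bf <- BamFile(fl)

## one row per range in 'which'
rng <- GRanges(c("seq1", "seq2"), IRanges(1, c(1575, 1584)))
cnt <- countBam(bf, param=ScanBamParam(which=rng))
cnt[, c("space", "start", "end", "records", "nucleotides")]
```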

filterBam

Visit the path in path(file), returning the result of filterBam applied to the specified path.
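A minimal sketch of filterBam on a BamFile, assuming the bundled ex1.bam file and a temporary destination: only records overlapping seq1 are written, and with the default indexDestination=TRUE an index for the destination is created as well.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bf <- BamFile(fl)

## keep only records overlapping seq1; write them to a temporary BAM file
dest <- tempfile(fileext=".bam")
param <- ScanBamParam(which=GRanges("seq1", IRanges(1, 1575)),
                      what=scanBamWhat())
filterBam(bf, dest, param=param)

n_filtered <- countBam(dest)$records
n_all <- countBam(fl)$records
c(filtered=n_filtered, all=n_all)
```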

indexBam

Visit the path in path(file), returning the result of indexBam applied to the specified path.

sortBam

Visit the path in path(file), returning the result of sortBam applied to the specified path.

mergeBam

Merge several BAM files into a single BAM file. See mergeBam for details; additional arguments supported by mergeBam,character-method are also available for BamFileList.

readBamGappedAlignments, readBamGappedReads, readBamGappedAlignmentPairs

Visit the path in path(file), returning the result of readBamGappedAlignments, readBamGappedReads, or readBamGappedAlignmentPairs applied to the specified path. See readBamGappedAlignments.

readBamGAlignmentsList

Visit the BAM file in path(file). The file must be sorted by qname; see ?sortBam. When a yieldSize is set on the BamFile, data are read in chunks, and a while (or similar) loop is needed to read the complete file. When asProperPairs=TRUE, only proper pairs are returned; see the ?GappedAlignmentPairs man page for details of the proper-pairs filtering.

The return value from readBamGAlignmentsList is a GAlignmentsList in which each list element contains all records with the same id (QNAME in the SAM/BAM file). When asProperPairs is TRUE, each list element has exactly 2 records; these are the same data as returned by readBamGappedAlignmentPairs, only the return class differs. When asProperPairs is FALSE, no QC is performed, resulting in 1 or more records per element. List elements containing singletons, unpaired reads or single fragments have length 1, while paired-end reads or those with multiple fragments have length 2 or greater. (NOTE: asProperPairs=TRUE is not yet implemented.)

show

Compactly display the object.

Author(s)

Martin Morgan and Marc Carlson

See Also

The GenomicRanges package is where the summarizeOverlaps method originates.

Examples


library(Rsamtools)
fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)
length(scanBam(fl)[[1]][[1]])  # all records

bf <- open(BamFile(fl))        # implicit index
bf
identical(scanBam(bf), scanBam(fl))
close(bf)

## chunks of size 1000
bf <- open(BamFile(fl, yieldSize=1000)) 
while (nrec <- length(scanBam(bf)[[1]][[1]]))
    cat("records:", nrec, "\n")
close(bf)

rng <- GRanges(c("seq1", "seq2"), IRanges(1, c(1575, 1584)))

## repeatedly visit 'bf'
bf <- open(BamFile(fl))
sapply(seq_along(rng), function(i, bamFile, rng) {
    param <- ScanBamParam(which=rng[i], what="seq")
    bam <- scanBam(bamFile, param=param)[[1]]
    alphabetFrequency(bam[["seq"]], baseOnly=TRUE, collapse=TRUE)
}, bf, rng)
close(bf)


##------------------------------------------------------------------------
## summarizeOverlaps with BamFileList
##

library(pasillaBamSubset)
library("TxDb.Dmelanogaster.UCSC.dm3.ensGene")
exbygene <- exonsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene")

## single-end:
## When 'yieldSize' is specified the file is processed by chunks. 
## Otherwise the complete file is read into memory.
fl <- untreated1_chr4() 
bfl <- BamFileList(fl, yieldSize=50000)
se1 <- summarizeOverlaps(exbygene, bfl, singleEnd=TRUE)
counts1 <- assays(se1)$counts

## paired-end sorted by qname:
## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname
## can be read in chunks with 'yieldSize'. 
fl <- untreated3_chr4() 
sortfl <- sortBam(fl, tempfile(), byQname=TRUE)
bf2 <- BamFileList(sortfl, index=character(0), 
                   yieldSize=50000, obeyQname=TRUE)
se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE)
counts2 <- assays(se2)$counts

## paired-end not sorted:
## If the file is not sorted by qname, all records are read 
## into memory for sorting and to determine proper pairs.
## Any 'yieldSize' set on the BamFile will be ignored.
fl <- untreated3_chr4() 
bf3 <- BamFileList(fl)
se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE)
counts3 <- assays(se3)$counts

identical(as.vector(counts2), as.vector(counts3))

##------------------------------------------------------------------------
## findSpliceOverlaps 
##

## See ?'findSpliceOverlaps' for examples 

[Package Rsamtools version 1.12.0 Index]