readBamGappedAlignments {Rsamtools} | R Documentation |
Read a BAM file into a GappedAlignments, GappedReads, GappedAlignmentPairs, or GAlignmentsList object.
readBamGappedAlignments(file, index=file, ..., use.names=FALSE, param=NULL) readBamGappedReads(file, index=file, use.names=FALSE, param=NULL) readBamGappedAlignmentPairs(file, index=file, use.names=FALSE, param=NULL) readBamGAlignmentsList(file, index=file, ..., use.names=FALSE, param=ScanBamParam(), asProperPairs=TRUE)
file, index |
The path to the BAM file to read, and to the index
file of the BAM file to read, respectively. The latter is given
without the '.bai' extension. See |
... |
Arguments passed to other methods. |
use.names |
Use the query template names (QNAME field) as the names of the returned object? If not (the default), then the returned object has no names. |
param |
By default (i.e. |
No flag is set for readBamGAlignmentsList
unless asProperPairs=TRUE
. In this case, the records read in
are the same as for readBamGappedAlignmentPairs
.
asProperPairs |
A logical indicating if the records should be filtered
such that only proper pairs are returned. Applies to
|
See ?GappedAlignments-class
for a
description of GappedAlignments objects.
See ?GappedReads-class
for a
description of GappedReads objects.
readBamGappedAlignmentPairs
proceeds in 2 steps:
Load the BAM file into a GappedAlignments
object with readBamGappedAlignments
;
Turn this GappedAlignments object into a GappedAlignmentPairs object by pairing its elements.
See ?GappedAlignmentPairs-class
for a
description of GappedAlignmentPairs objects,
and ?makeGappedAlignmentPairs
for the details of the
pairing procedure.
The return value from readBamGAlignmentList
is a
GAlignmentsList
where each list element contains all records
of the same id (QNAME in SAM/BAM file). When asProperPairs
is
TRUE
each list element has exactly 2 records; these are the
same data as that returned from readBamGappedAlignmentPairs
, only
the return class is different. When asProperPairs
is FALSE
,
no QC is performed resulting in 1 or more records per element. List
elements containing singletons, unpaired reads or single fragments have
a length of 1 while paired-end reads or those with multiple fragments
have a length of 2 or greater.
(NOTE: asProperPairs=TRUE not yet implemented)
See ?GAlignmentsList-class
for a
description of GAlignmentsList objects.
A GappedAlignments object for
readBamGappedAlignments
.
A GappedReads object for readBamGappedReads
.
A GappedAlignmentPairs object for
readBamGappedAlignmentPairs
.
Note that a BAM (or SAM) file can in theory contain a mix of single-end
and paired-end reads, but in practise it seems that single-end and
paired-end are not mixed. In other words, the value of flag bit 0x1
(isPaired
) is the same for all the records in a file.
So if readBamGappedAlignmentPairs
returns a
GappedAlignmentPairs object of length zero,
this almost certainly means that the BAM (or SAM) file contains
alignments for single-end reads (although it could also mean that the
user-supplied ScanBamParam
is filtering out everything,
or that the file is empty, or that all the records in the file correspond
to unmapped reads).
A GAlignmentsList object for
readBamGAlignmentsList
. Single or paired-end data can be
read into this structure. The list elements are groups of records
by read id.
BAM records corresponding to unmapped reads are always ignored.
Starting with Rsamtools 1.7.1 (BioC 2.10), PCR or optical duplicates
are loaded by default (use scanBamFlag(isDuplicate=FALSE)
to
drop them).
H. Pages
GappedAlignments-class,
GAlignmentsList-class,
GappedReads-class,
GappedAlignmentPairs-class,
makeGappedAlignmentPairs
,
scanBam
,
ScanBamParam
## --------------------------------------------------------------------- ## A. readBamGappedAlignments() ## --------------------------------------------------------------------- ## Simple use: bamfile <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE) gal1 <- readBamGappedAlignments(bamfile) gal1 names(gal1) ## Using the 'use.names' arg: gal2 <- readBamGappedAlignments(bamfile, use.names=TRUE) gal2 head(names(gal2)) ## Using the 'param' arg to drop PCR or optical duplicates and load ## additional BAM fields: param <- ScanBamParam(flag=scanBamFlag(isDuplicate=FALSE), what=c("qual", "flag")) gal3 <- readBamGappedAlignments(bamfile, param=param) gal3 mcols(gal3) ## Using the 'param' arg to load reads from particular regions. ## Note that if we weren't providing a 'what' argument here, all the ## BAM fields would be loaded: which <- RangesList(seq1=IRanges(1000, 2000), seq2=IRanges(c(100, 1000), c(1000, 2000))) param <- ScanBamParam(which=which) gal4 <- readBamGappedAlignments(bamfile, param=param) gal4 ## Note that a given record is loaded one time for each region it ## belongs to (this is a scanBam() feature, readBamGappedAlignments() ## is based on scanBam()): which <- IRangesList(seq2=IRanges(c(1563, 1567), width=1)) param <- ScanBamParam(which=which) gal5 <- readBamGappedAlignments(bamfile, param=param) gal5 ## Using the 'param' arg to load tags. Except for MF and Aq, the tags ## specified below are predefined tags (see the SAM Spec for the list ## of predefined tags and their meaning). param <- ScanBamParam(tag=c("MF", "Aq", "NM", "UQ", "H0", "H1"), what="isize") gal6 <- readBamGappedAlignments(bamfile, param=param) mcols(gal6) # "tag" cols always after "what" cols ## --------------------------------------------------------------------- ## B. readBamGappedReads() ## --------------------------------------------------------------------- greads1 <- readBamGappedReads(bamfile) greads1 names(greads1) qseq(greads1) greads2 <- readBamGappedReads(bamfile, use.names=TRUE) head(greads2) head(names(greads2)) ## --------------------------------------------------------------------- ## C. readBamGappedAlignmentPairs() ## --------------------------------------------------------------------- galp1 <- readBamGappedAlignmentPairs(bamfile) head(galp1) names(galp1) galp2 <- readBamGappedAlignmentPairs(bamfile, use.names=TRUE) galp2 head(galp2) head(names(galp2)) ## --------------------------------------------------------------------- ## D. readBamGAlignmentPairs() ## --------------------------------------------------------------------- ## As sample data we use the paired-end file from pasillaBamSubset. ## This method requires that Bam file to be sorted by qname. Setting ## the yieldSize is optional. library(pasillaBamSubset) bfsort <- sortBam(untreated3_chr4(), tempfile(), byQname=TRUE) bf <- BamFile(bfsort, index=character(0), yieldSize=100, obeyQname=TRUE) galist <- readBamGAlignmentsList(bf, asProperPairs=FALSE) ## List elements are grouped by id and can hold any number ## of pairs or fragments. galist table(elementLengths(galist)) ## Single reads appearing in 'galist' are filtered out by ## the readBamGappedAlignmentPairs QC process because they ## are not proper pairs. galp <- readBamGappedAlignmentPairs(bf) findOverlaps(galist[elementLengths(galist) == 1], unlist(galp), type="within") ## When 'asProperPairs' is 'TRUE', readBamGappedAlignmentPairs ## and readGAlignmentsList return the same records but in different ## classes.