BEDFile-class {rtracklayer} | R Documentation |
These functions support the import and export of the UCSC BED format and its variants, including BEDGraph.
## S4 method for signature 'BEDFile,ANY,ANY'
import(con, format, text, trackLine = TRUE, genome = NA,
       asRangedData = FALSE, colnames = NULL, which = NULL,
       seqinfo = NULL, extraCols = character())
import.bed(con, ...)
import.bed15(con, ...)
import.bedGraph(con, ...)

## S4 method for signature 'ANY,BEDFile,ANY'
export(object, con, format, ...)

## S4 method for signature 'RangedData,BEDFile,ANY'
export(object, con, format, append = FALSE, index = FALSE,
       ignore.strand = FALSE)

## S4 method for signature 'RangedDataList,BEDFile,ANY'
export(object, con, format, ...)

## S4 method for signature 'UCSCData,BEDFile,ANY'
export(object, con, format, trackLine = TRUE, ...)

## S4 method for signature 'RangedData,BED15File,ANY'
export(object, con, format, expNames = NULL, trackLine = NULL, ...)

export.bed(object, con, ...)
export.bed15(object, con, ...)
export.bedGraph(object, con, ...)
con |
A path, URL, connection or |
object |
The object to export, should be a |
format |
If not missing, should be one of “bed”, “bed15” or “bedGraph”. |
text |
If |
trackLine |
Whether to parse/output a UCSC track line. An
imported track line will be stored in a |
genome |
The identifier of a genome, or |
asRangedData |
If |
colnames |
A character vector naming the columns to parse. These should name columns in the result, not those in the BED spec, so e.g. specify “thick”, instead of “thickStart”. |
which |
A range data structure like |
index |
If |
ignore.strand |
Whether to output the strand when not required (by the existence of later fields). |
seqinfo |
If not |
extraCols |
Names of extra columns to read from the BED file. As BED does not encode column names, these are assumed to be the last columns in the file. This enables parsing of the various BEDX+Y formats. |
append |
If |
expNames |
character vector of column names in |
... |
Arguments to pass down to other methods. For
import, the flow eventually reaches the |
The BED format is a tab-separated table of intervals, with annotations like name, score and even sub-intervals for representing alignments and gene models. Official (UCSC) child formats currently include BED15 (adding a numeric matrix for e.g. expression data across multiple samples) and BEDGraph (a compressed means of storing a single score variable, e.g. coverage; overlapping features are not allowed).

Many tools and organizations have extended the BED format with additional columns for particular use cases. Such trailing columns can be read on import by naming them in the extraCols argument.

The advantage of BED is its balance between simplicity and expressiveness. It is also relatively scalable, because only the first three columns (chrom, start and end) are required. Thus, BED is best suited for representing simple features. For specialized cases, one is usually better off with another format. For example, genome-scale vectors belong in BigWig, alignments from high-throughput sequencing belong in BAM, and gene models are more richly expressed in GFF.
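As a rough illustration of the tabular layout described above, the following base R sketch parses a small BED fragment without rtracklayer. The sample records are invented for the example, the column names are simply the field names from the BED spec, and read.table is used only for demonstration; this is not how import() is implemented.

```r
# Hypothetical three-record BED fragment: tab-separated, six columns.
bed_lines <- c(
  "chr7\t127471196\t127472363\tPos1\t0\t+",
  "chr7\t127472363\t127473530\tPos2\t2\t+",
  "chr9\t127474697\t127475864\tNeg1\t0\t-"
)

# Field names from the BED spec; only the first three are mandatory.
bed_fields <- c("chrom", "start", "end", "name", "score", "strand")

# Parse the text as a plain tab-separated table.
bed <- read.table(text = bed_lines, sep = "\t", header = FALSE,
                  col.names = bed_fields, stringsAsFactors = FALSE)

# A table that stops after the required columns is still valid BED.
bed3 <- bed[, c("chrom", "start", "end")]
```

The result is an ordinary data.frame; import() instead returns a GRanges (or RangedData) object, as described in the value section.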
The following is the mapping of BED elements to a GRanges or RangedData object. NA values are allowed only where indicated; these appear as a “.” in the file. Only the first three columns (chrom, start and end) are required. The other columns can only be included if all previous columns (to the left) are included. Upon export, default values are used to automatically pad the table, if necessary.
chrom, start, end: the ranges component.
name: character vector (NA's allowed) in the name column; defaults to NA on export.
score: numeric vector in the score column, accessible via the score accessor. Defaults to 0 on export. This is the only column present in BEDGraph (besides chrom, start and end), and it is required.
strand: strand factor (NA's allowed) in the strand column, accessible via the strand accessor; defaults to NA on export.
thickStart, thickEnd: Ranges object in a column named thick; defaults to the ranges of the feature on export.
itemRgb: character vector of hex color codes, as accepted by col2rgb, in the itemRgb column; default is NA on export, which translates to black.
blockCount, blockSizes, blockStarts: RangesList object in a column named blocks; defaults to empty upon BED15 export.

These columns are present only in BED15:
expCount, expIds, expScores: A column for each unique element in expIds, containing the corresponding values from expScores. When a value is not present for a feature, NA is substituted. NA values become -10000 in the file.
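The NA conventions above (a “.” for missing values in the file, and -10000 for missing BED15 scores) can be mimicked in base R. The sketch below is only an illustration under assumed sample data; the records, the column names and the exp_scores vector are hypothetical, and rtracklayer's own import and export code may work differently.

```r
# Hypothetical records: "." marks a missing name or strand.
bed_lines <- c(
  "chr1\t100\t200\t.\t5\t+",
  "chr1\t300\t400\tfeatB\t0\t."
)

# Reading: treat "." as NA.
bed <- read.table(text = bed_lines, sep = "\t", header = FALSE,
                  na.strings = ".", stringsAsFactors = FALSE,
                  col.names = c("chrom", "start", "end",
                                "name", "score", "strand"))

# Writing: turn NA back into "." and capture the lines as text.
out_lines <- capture.output(
  write.table(bed, sep = "\t", quote = FALSE, na = ".",
              row.names = FALSE, col.names = FALSE)
)

# BED15-style scores: NA is written as the sentinel -10000.
exp_scores <- c(0.5, NA, 1.2)
file_scores <- ifelse(is.na(exp_scores), -10000, exp_scores)
```

Here out_lines reproduces bed_lines, which shows that the “.” placeholder round-trips through NA.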
A GRanges (or RangedData if asRangedData is TRUE), with the metadata columns described in the details.
The BEDFile class extends RTLFile and is a formal representation of a resource in the BED format. To cast a path, URL or connection to a BEDFile, pass it to the BEDFile constructor. Classes and constructors also exist for the subclasses BED15File and BEDGraphFile.
Michael Lawrence
http://genome.ucsc.edu/goldenPath/help/customTrack.html
test_path <- system.file("tests", package = "rtracklayer")
test_bed <- file.path(test_path, "test.bed")

## Import from a path
test <- import(test_bed, asRangedData = FALSE)
test
rd <- import.bed(test_bed, asRangedData = TRUE)
rd

## Import through a BEDFile object
test_bed_file <- BEDFile(test_bed)
import(test_bed_file, asRangedData = FALSE)

## Import from a connection, naming the format
test_bed_con <- file(test_bed)
import(test_bed_con, format = "bed", asRangedData = FALSE)
close(test_bed_con)

## Vary the import arguments
import(test_bed, trackLine = FALSE, asRangedData = FALSE)
import(test_bed, genome = "hg19", asRangedData = FALSE)
import(test_bed, colnames = c("name", "strand", "thick"),
       asRangedData = FALSE)

which <- RangesList(chr7 = as(test, "RangesList")[[1]][1:2])
import(test_bed, which = which, asRangedData = FALSE)

## Not run: 
test_bed_out <- file.path(tempdir(), "test.bed")
export(test, test_bed_out)

test_bed_out_file <- BEDFile(test_bed_out)
export(test, test_bed_out_file)

export(rd, test_bed_out, name = "Alternative name")

test_bed_gz <- paste(test_bed_out, ".gz", sep = "")
export(test, test_bed_gz)

export(test, test_bed_out, index = TRUE)
export(test, test_bed_out, index = TRUE, trackLine = FALSE)

## Export to, and import from, a character vector
bed_text <- export(test, format = "bed")
test <- import(format = "bed", text = bed_text, asRangedData = FALSE)

## End(Not run)