SNPlocs-class {BSgenome}    R Documentation
The SNPlocs class is a container for storing known SNP locations (of class snp) for a given organism.
SNPlocs objects are usually created in advance by a volunteer and made available to the Bioconductor community as SNPlocs data packages.
See ?available.SNPs for how to get the list of SNPlocs and XtraSNPlocs data packages currently available.
The main focus of this man page is on how to extract SNPs from a SNPlocs object.
snpcount(x)

snpsBySeqname(x, seqnames, ...)
## S4 method for signature 'SNPlocs'
snpsBySeqname(x, seqnames, drop.rs.prefix=FALSE)

snpsByOverlaps(x, ranges, ...)
## S4 method for signature 'SNPlocs'
snpsByOverlaps(x, ranges, drop.rs.prefix=FALSE, ...)

snpsById(x, ids, ...)
## S4 method for signature 'SNPlocs'
snpsById(x, ids, ifnotfound=c("error", "warning", "drop"))
x: A SNPlocs object.
seqnames: The names of the sequences for which to get SNPs. Must be a subset of seqlevels(x).
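...: Additional arguments, for use in specific methods. Arguments passed to the snpsByOverlaps method for SNPlocs objects via ... are passed to the internal call to subsetByOverlaps().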
drop.rs.prefix: Should the "rs" prefix be dropped from the returned RefSNP ids?
ranges: One or more genomic regions of interest specified as a GRanges or GPos object. A single region of interest can be specified as a character string of the form "<seqname>:<start>-<end>", e.g. "22:33.63e6-33.64e6" as in the examples.
ids: The RefSNP ids to look up (a.k.a. rs ids). Can be an integer or character vector, with or without the "rs" prefix.
ifnotfound: What to do if SNP ids are not found. One of "error" (the default), "warning", or "drop".
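As a rough illustration of the character-string form accepted by ranges, the sketch below splits a region string into its sequence name, start, and end. parse_region() is a hypothetical helper written for this page, not part of BSgenome, and it assumes the plain "<seqname>:<start>-<end>" layout (no strand suffix, no thousands separators). In a real session you would simply pass the string to snpsByOverlaps(), as the examples do.

```r
# Hypothetical helper written for illustration only (not a BSgenome
# function): split a region string of the assumed form
# "<seqname>:<start>-<end>" into its three parts.
parse_region <- function(region) {
  stopifnot(is.character(region), length(region) == 1L)
  # Greedy (.+) keeps everything before the last ":" as the seqname;
  # start and end may not contain ":" or "-".
  m <- regmatches(region, regexec("^(.+):([^:-]+)-([^:-]+)$", region))[[1L]]
  if (length(m) != 4L)
    stop("region must look like '<seqname>:<start>-<end>'")
  # as.numeric() understands scientific notation such as "33.63e6"
  bounds <- suppressWarnings(as.numeric(m[3:4]))
  if (anyNA(bounds) || any(bounds != round(bounds)))
    stop("start and end must be whole numbers")
  list(seqname = m[2L],
       start   = as.integer(bounds[1L]),
       end     = as.integer(bounds[2L]))
}

# Usage: the region string used in the examples on this page
parse_region("22:33.63e6-33.64e6")
```

Scientific notation is handled by as.numeric(), which is why the compact "33.63e6" form used in the examples works here without any special casing.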
snpcount returns a named integer vector containing the number of SNPs for each sequence in the reference genome.
snpsBySeqname, snpsByOverlaps, and snpsById return an unstranded GPos object with 1 element (genomic position) per SNP and the following metadata columns:
RefSNP_id: RefSNP ID (a.k.a. "rs id"). Character vector with no NAs and no duplicates.
alleles_as_ambig: A character vector with no NAs containing the alleles for each SNP represented by an IUPAC nucleotide ambiguity code. See ?IUPAC_CODE_MAP in the Biostrings package for more information. The alleles are always reported with respect to the positive strand.
Note that this GPos object is unstranded, i.e. all the SNPs in it have their strand set to "*".
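Because alleles_as_ambig stores each SNP's alleles as a single IUPAC code relative to the positive strand, decoding a code, or expressing the alleles relative to the minus strand, is a small table lookup. The sketch below uses a hand-written copy of the standard IUPAC table, assumed to match Biostrings' IUPAC_CODE_MAP (prefer that object when Biostrings is loaded); iupac_map, decode_alleles() and complement_codes() are hypothetical names introduced here.

```r
# Hand-written copy of the standard IUPAC nucleotide ambiguity codes,
# assumed to agree with Biostrings::IUPAC_CODE_MAP.
iupac_map <- c(A = "A", C = "C", G = "G", T = "T",
               M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
               V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

# Hypothetical helper: expand each code into its plus-strand alleles.
decode_alleles <- function(codes) {
  stopifnot(all(codes %in% names(iupac_map)))
  strsplit(unname(iupac_map[codes]), "", fixed = TRUE)
}

# Hypothetical helper: complement the codes, e.g. to express alleles
# relative to the minus strand. Complementing every base in a code's
# set maps M<->K, R<->Y, V<->B, H<->D and leaves W, S and N unchanged.
complement_codes <- function(codes) {
  chartr("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN", codes)
}

decode_alleles(c("R", "Y"))         # returns list(c("A", "G"), c("C", "T"))
complement_codes(c("R", "M", "S"))  # returns c("Y", "K", "S")
```

With a real result you would pass the alleles_as_ambig metadata column instead of a literal vector. Since each SNP is a single position, complementing is enough; no reversal is needed.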
If ifnotfound="error", the object returned by snpsById is guaranteed to be parallel to ids, that is, the i-th element in the GPos object corresponds to the i-th element in ids.
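To make the ifnotfound semantics concrete, here is a base-R sketch that works on a toy data frame with made-up ids and positions. lookup_ids() and normalize_rsids() are hypothetical and are not how snpsById() is implemented. In particular, the assumption that "warning" drops unmatched ids after signalling a warning (and "drop" drops them silently) is inferred from the option names, not stated on this page. With "error", the result is parallel to ids because match() preserves the order of the query.

```r
# Toy stand-in for a SNPlocs object: ids and positions are made up.
toy_snps <- data.frame(
  RefSNP_id = c("rs101", "rs102", "rs103"),
  seqnames  = c("22", "22", "MT"),
  pos       = c(100L, 250L, 73L),
  stringsAsFactors = FALSE
)

# Hypothetical: accept integer/numeric or character ids, with or without
# the "rs" prefix, and return them all in "rs<digits>" form.
normalize_rsids <- function(ids) {
  if (is.numeric(ids)) ids <- format(ids, scientific = FALSE, trim = TRUE)
  ifelse(startsWith(ids, "rs"), ids, paste0("rs", ids))
}

# Hypothetical lookup mimicking the documented ifnotfound choices.
# Assumption: "warning" and "drop" both discard unmatched ids, the former
# after signalling a warning.
lookup_ids <- function(table, ids, ifnotfound = c("error", "warning", "drop")) {
  ifnotfound <- match.arg(ifnotfound)
  ids <- normalize_rsids(ids)
  idx <- match(ids, table$RefSNP_id)   # same order as 'ids'
  missing_ids <- ids[is.na(idx)]
  if (length(missing_ids) > 0L) {
    msg <- paste("SNP id(s) not found:", paste(missing_ids, collapse = ", "))
    if (ifnotfound == "error") stop(msg)
    if (ifnotfound == "warning") warning(msg)
    idx <- idx[!is.na(idx)]
  }
  table[idx, , drop = FALSE]
}

# With every id present, row i of the result corresponds to ids[i]:
lookup_ids(toy_snps, c("rs103", "101"))
# Unknown id dropped (here silently):
lookup_ids(toy_snps, c(101L, 999L), ifnotfound = "drop")
```

The order-preserving match() is what gives the "parallel to ids" property; once unmatched ids are dropped, that one-to-one correspondence with the input no longer holds.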
Author: H. Pagès
See also: XtraSNPlocs packages and objects, for molecular variations of class other than snp, e.g. of class in-del, heterozygous, microsatellite, etc.
IRanges::subsetByOverlaps in the IRanges package and GenomicRanges::subsetByOverlaps in the GenomicRanges package, for more information about the subsetByOverlaps() generic and its method for GenomicRanges objects.
IUPAC_CODE_MAP in the Biostrings package.
library(SNPlocs.Hsapiens.dbSNP144.GRCh38)
snps <- SNPlocs.Hsapiens.dbSNP144.GRCh38
snpcount(snps)

## ---------------------------------------------------------------------
## snpsBySeqname()
## ---------------------------------------------------------------------

## Get all SNPs located on chromosome 22 or MT:
snpsBySeqname(snps, c("22", "MT"))

## ---------------------------------------------------------------------
## snpsByOverlaps()
## ---------------------------------------------------------------------

## Get all SNPs overlapping some genomic region of interest:
snpsByOverlaps(snps, "22:33.63e6-33.64e6")

## With the regions of interest being all the known CDS for hg38
## located on chromosome 22 or MT (except for the chromosome naming
## convention, hg38 is the same as GRCh38):
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
my_cds <- cds(txdb)
seqlevels(my_cds, pruning.mode="coarse") <- c("chr22", "chrM")
seqlevelsStyle(my_cds)  # UCSC
seqlevelsStyle(snps)    # NCBI
seqlevelsStyle(my_cds) <- seqlevelsStyle(snps)
genome(my_cds) <- genome(snps)
my_snps <- snpsByOverlaps(snps, my_cds)
my_snps
table(my_snps %within% my_cds)

## ---------------------------------------------------------------------
## snpsById()
## ---------------------------------------------------------------------

## Look up some RefSNP ids:
my_rsids <- c("rs10458597", "rs12565286", "rs7553394")
## Not run:
snpsById(snps, my_rsids)  # error, rs7553394 not found
## End(Not run)

## The following example uses more than 2GB of memory, which is more
## than what 32-bit Windows can handle:
is_32bit_windows <- .Platform$OS.type == "windows" &&
                    .Platform$r_arch == "i386"
if (!is_32bit_windows) {
    snpsById(snps, my_rsids, ifnotfound="drop")
}