trimLRPatterns {Biostrings}    R Documentation

Trim Flanking Patterns from Sequences

Description

The trimLRPatterns function trims left and/or right flanking patterns from sequences.

Usage

  trimLRPatterns(Lpattern = "", Rpattern = "", subject,
                 max.Lmismatch = 0, max.Rmismatch = 0,
                 with.Lindels = FALSE, with.Rindels = FALSE,
                 Lfixed = TRUE, Rfixed = TRUE, ranges = FALSE)

Arguments

Lpattern The left part of the pattern.
Rpattern The right part of the pattern.
subject An XString or XStringSet object containing the target sequence(s).
max.Lmismatch Either an integer vector of length nLp = nchar(Lpattern) whose elements max.Lmismatch[i] represent the maximum number of acceptable mismatching letters when aligning substring(Lpattern, nLp - i + 1, nLp) with substring(subject, 1, i), or a single numeric value in (0, 1) that represents a constant maximum mismatch rate for each of the nLp alignments. Negative numbers in integer vector inputs are used to prevent trimming at the i-th location. If an integer vector input has length(max.Lmismatch) < nLp, then max.Lmismatch will be augmented with enough -1's at the beginning of the vector to bring it up to length nLp.
If non-zero, an inexact matching algorithm is used (see the matchPattern function for more information).
max.Rmismatch Either an integer vector of length nRp = nchar(Rpattern) whose elements max.Rmismatch[i] represent the maximum number of acceptable mismatching letters when aligning substring(Rpattern, 1, i) with substring(subject, nS - i + 1, nS), where nS = nchar(subject), or a single numeric value in (0, 1) that represents a constant maximum mismatch rate for each of the nRp alignments. Negative numbers in integer vector inputs are used to prevent trimming at the i-th location. If an integer vector input has length(max.Rmismatch) < nRp, then max.Rmismatch will be augmented with enough -1's at the beginning of the vector to bring it up to length nRp.
If non-zero, an inexact matching algorithm is used (see the matchPattern function for more information).
with.Lindels If TRUE then indels are allowed in the left part of the pattern. In that case max.Lmismatch is interpreted as the maximum "edit distance" allowed in the left part of the pattern.
See the with.indels argument of the matchPattern function for more information.
with.Rindels Same as with.Lindels but for the right part of the pattern.
Lfixed Only with a DNAString or RNAString subject can an Lfixed value other than the default (TRUE) be used.
With Lfixed=FALSE, ambiguities in the left pattern _and_ in the subject are interpreted as wildcards, i.e. they match any letter that they stand for. Ambiguities are the letters of the IUPAC Extended Genetic Alphabet (see IUPAC_CODE_MAP) that are not in the base alphabet.
See the fixed argument of the matchPattern function for more information.
Rfixed Same as Lfixed but for the right part of the pattern.
ranges If TRUE, then return the ranges to use to trim subject. If FALSE, then return the trimmed subject.
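
The rules given above for max.Lmismatch and max.Rmismatch (a rate in (0, 1) applied to every overlap length, and padding of short integer vectors with -1's at the beginning) can be sketched in base R. The helper name expand_max_mismatch is hypothetical and is not part of Biostrings; this is only an illustration of the documented rules, not the package's internal code.

```r
## Hypothetical helper (NOT a Biostrings function): expand a max.Lmismatch /
## max.Rmismatch argument to one integer per overlap length, following the
## rules described in the Arguments section.
expand_max_mismatch <- function(max.mismatch, pattern) {
  np <- nchar(pattern)
  if (length(max.mismatch) == 1L && max.mismatch > 0 && max.mismatch < 1) {
    ## A rate in (0, 1): an overlap of i letters may contain up to
    ## floor(rate * i) mismatches.
    return(as.integer(max.mismatch * seq_len(np)))
  }
  max.mismatch <- as.integer(max.mismatch)
  if (length(max.mismatch) < np) {
    ## Pad at the beginning with -1, which prevents trimming at those
    ## (shorter) overlap lengths.
    max.mismatch <- c(rep.int(-1L, np - length(max.mismatch)), max.mismatch)
  }
  max.mismatch
}

## Default value 0: only the full-length overlap is allowed, with no mismatch
expand_max_mismatch(0, "TTCTGCTTG")
## A rate of 0.2, same values as as.integer(0.2 * 1:9)
expand_max_mismatch(0.2, "TTCTGCTTG")
## A short integer vector is padded with -1's at the beginning
expand_max_mismatch(c(0, 1), "TTCTGCTTG")
```

Under these rules the default max.Lmismatch = 0 permits trimming only when the whole of Lpattern matches exactly, whereas rep(0, 9) also permits exact matches of shorter overlaps.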

Value

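By default, a new XString or XStringSet object with the flanking patterns removed wherever they match within the specified mismatch (or edit distance) limits. If ranges = TRUE, the ranges to use to trim subject are returned instead.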
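
As a rough illustration of what removing a left flank means, the following base-R sketch applies the alignment rule given for max.Lmismatch to plain character strings. The function left_trim_length is hypothetical and is not the Biostrings implementation. It assumes that the longest acceptable overlap is the one trimmed (this page does not state that), and it handles neither indels nor IUPAC ambiguities.

```r
## Hypothetical illustration (NOT the Biostrings implementation): number of
## leading letters of 'subject' that would be trimmed for a given 'Lpattern'.
## 'max.Lmismatch' must already be an integer vector of length nchar(Lpattern).
left_trim_length <- function(Lpattern, subject, max.Lmismatch) {
  nLp <- nchar(Lpattern)
  stopifnot(length(max.Lmismatch) == nLp)
  p <- strsplit(Lpattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  ## ASSUMPTION: try the longest overlap first and stop at the first one
  ## that is acceptable.
  for (i in rev(seq_len(min(nLp, length(s))))) {
    if (max.Lmismatch[i] < 0) next   # trimming is disabled at this length
    ## Align the last i letters of Lpattern with the first i letters of subject
    mismatches <- sum(p[(nLp - i + 1):nLp] != s[1:i])
    if (mismatches <= max.Lmismatch[i]) return(i)
  }
  0L
}

## Exact matches allowed at every overlap length: the 6-letter overlap
## "TGCTTG" is found.
left_trim_length("TTCTGCTTG", "TGCTTGACGGCAGATCGG", rep(0, 9))
## Only a full-length exact match allowed: nothing is trimmed.
left_trim_length("TTCTGCTTG", "TGCTTGACGGCAGATCGG", c(rep(-1, 8), 0))
```

The right flank is handled symmetrically, with the first i letters of Rpattern aligned against the last i letters of the subject.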

Author(s)

P. Aboyoun

See Also

matchPattern, matchLRPatterns, match-utils, XString-class, XStringSet-class

Examples

  Lpattern <- "TTCTGCTTG"
  Rpattern <- "GATCGGAAG"
  subject <- DNAString("TTCTGCTTGACGTGATCGGA")
  subjectSet <- DNAStringSet(c("TGCTTGACGGCAGATCGG", "TTCTGCTTGGATCGGAAG"))

  ## Only allow for perfect matches on the flanks
  trimLRPatterns(Lpattern = Lpattern, subject = subject)
  trimLRPatterns(Rpattern = Rpattern, subject = subject)
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet)

  ## Allow for perfect matches on the flanking overlaps
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
                 max.Lmismatch = rep(0, 9), max.Rmismatch = rep(0, 9))

  ## Allow for mismatches on the flanks
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subject,
                 max.Lmismatch = 0.2, max.Rmismatch = 0.2)
  maxMismatches <- as.integer(0.2 * 1:9)
  maxMismatches
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
                 max.Lmismatch = maxMismatches, max.Rmismatch = maxMismatches)

  ## Produce ranges that can be an input into other functions
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
                 max.Lmismatch = rep(0, 9), max.Rmismatch = rep(0, 9),
                 ranges = TRUE)
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subject,
                 max.Lmismatch = 0.2, max.Rmismatch = 0.2, ranges = TRUE)

[Package Biostrings version 2.12.9 Index]