tango.text.Search

License:
BSD style:

Version:
May 2009: Initial release

since:
0.99.9

author:
Kris

FindFruct!(T) find(T)(const(T)[] what);
Returns a lightweight pattern matcher, good for short patterns and/or short to medium length content. Brute-force approach with fast multi-byte comparisons

SearchFruct!(T) search(T)(const(T)[] what);
Returns a welterweight pattern matcher, good for long patterns and/or extensive content. Based on the QS algorithm which is a Boyer-Moore variant. Does not allocate memory for the alphabet.

Generally becomes faster as the match-length grows

struct FindFruct(T);
Convenient bundle of lightweight find utilities, without the hassle of IFTI problems. Create one of these using the find()

function:
auto match = find ("foo");
auto content = "wumpus foo bar"

// search in the forward direction
auto index = match.forward (content);
assert (index is 7);

// search again - returns length when no match found
assert (match.forward(content, index+1) is content.length);


Searching operates both forward and backward, with an optional start offset (can be more convenient than slicing the content). There are methods to replace matches within given content, and others which return foreach() iterators for traversing content.

SearchFruct is a more sophisticated variant, which operates more efficiently on longer matches and/or more extensive content.

size_t forward(const(T)[] content, size_t ofs = 0);
Search forward in the given content, starting at the optional index.

Returns the index of a match, or content.length where no match was located.

size_t reverse(const(T)[] content, size_t ofs = size_t.max);
Search backward in the given content, starting at the optional index.

Returns the index of a match, or content.length where no match was located.

const(T)[] match();
Return the match text

void match(const(T)[] what);
Reset the text to match

bool within(const(T)[] content);
Returns true if there is a match within the given content

size_t count(const(T)[] content);
Returns number of matches within the given content

T[] replace(const(T)[] content, T chr);
Replace all matches with the given character. Use method tokens() instead to avoid heap activity.

Returns a copy of the content with replacements made

T[] replace(const(T)[] content, const(T)[] sub = null);
Replace all matches with the given substitution. Use method tokens() instead to avoid heap activity.

Returns a copy of the content with replacements made

Util.PatternFruct!(T) tokens(const(T)[] content, const(T)[] sub = null);
Returns a foreach() iterator which exposes text segments between all matches within the given content. Substitution text is also injected in place of each match, and null can be used to indicate removal instead:
char[] result;

auto match = find ("foo");
foreach (token; match.tokens ("$foo&&foo*", "bar"))
         result ~= token;
assert (result == "$bar&&bar*");


This mechanism avoids internal heap activity.

Indices indices(const(T)[] content);
Returns a foreach() iterator which exposes the indices of all matches within the given content:
int count;

auto f = find ("foo");
foreach (index; f.indices("$foo&&foo*"))
         ++count;
assert (count is 2);


struct Indices;
Simple foreach() iterator

struct SearchFruct(T);
Convenient bundle of welterweight search utilities, without the hassle of IFTI problems. Create one of these using the search()

function:
auto match = search ("foo");
auto content = "wumpus foo bar"

// search in the forward direction
auto index = match.forward (content);
assert (index is 7);

// search again - returns length when no match found
assert (match.forward(content, index+1) is content.length);


Searching operates both forward and backward, with an optional start offset (can be more convenient than slicing the content). There are methods to replace matches within given content, and others which return foreach() iterators for traversing content.

FindFruct is a simpler variant, which can operate efficiently on short matches and/or short content (employs brute-force strategy)

SearchFruct opCall(const(T)[] what);
Construct the fruct

const(T)[] match();
Return the match text

void match(const(T)[] what);
Reset the text to match

size_t forward(const(T)[] content, size_t ofs = 0);
Search forward in the given content, starting at the optional index.

Returns the index of a match, or content.length where no match was located.

size_t reverse(const(T)[] content, size_t ofs = size_t.max);
Search backward in the given content, starting at the optional index.

Returns the index of a match, or content.length where no match was located.

bool within(const(T)[] content);
Returns true if there is a match within the given content

size_t count(const(T)[] content);
Returns number of matches within the given content

T[] replace(const(T)[] content, T chr);
Replace all matches with the given character. Use method tokens() instead to avoid heap activity.

Returns a copy of the content with replacements made

T[] replace(const(T)[] content, const(T)[] sub = null);
Replace all matches with the given substitution. Use method tokens() instead to avoid heap activity.

Returns a copy of the content with replacements made

Substitute tokens(const(T)[] content, const(T)[] sub = null);
Returns a foreach() iterator which exposes text segments between all matches within the given content. Substitution text is also injected in place of each match, and null can be used to indicate removal instead:
char[] result;

auto match = search ("foo");
foreach (token; match.tokens("$foo&&foo*", "bar"))
         result ~= token;
assert (result == "$bar&&bar*");


This mechanism avoids internal heap activity

Indices indices(const(T)[] content);
Returns a foreach() iterator which exposes the indices of all matches within the given content:
int count;

auto match = search ("foo");
foreach (index; match.indices("$foo&&foo*"))
         ++count;
assert (count is 2);


size_t find(char* what, size_t wlen, char* content, size_t len, size_t ofs);


size_t rfind(char* what, size_t wlen, char* content, size_t len, size_t ofs);


bool matches(char* a, char* b, size_t length);


void reset();
Construct lookup table. We force the alphabet to be char[] always, and consider wider characters to be longer patterns instead

void flip();
Reverse lookup-table direction

struct Indices;
Simple foreach() iterator

struct Substitute;
Substitution foreach() iterator


Page generated by Ddoc. Copyright (c) 2009 Kris Bell. All rights reserved