#include <bmsearch.h>
Public Member Functions | |
BoyerMooreSearch (CollData *theData, const UnicodeString &patternString, const UnicodeString *targetString, UErrorCode &status) | |
Construct a BoyerMooreSearch object. | |
~BoyerMooreSearch () | |
The destructor. | |
UBool | empty () |
Test the pattern to see if it generates any CEs. | |
UBool | search (int32_t offset, int32_t &start, int32_t &end) |
Search for the pattern string in the target string. | |
void | setTargetString (const UnicodeString *targetString, UErrorCode &status) |
Set the target string for the match. | |
CollData * | getData () |
Return the CollData object used for searching. | |
CEList * | getPatternCEs () |
Return the CEs generated by the pattern string. | |
BadCharacterTable * | getBadCharacterTable () |
Return the BadCharacterTable object computed for the pattern string. | |
GoodSuffixTable * | getGoodSuffixTable () |
Return the GoodSuffixTable object computed for the pattern string. | |
virtual UClassID | getDynamicClassID () const |
ICU4C "poor man's RTTI", returns a UClassID for the actual ICU class. | |
Static Public Member Functions | |
static UClassID | getStaticClassID () |
This object holds the information needed to do a collation-sensitive Boyer-Moore search. It encapsulates the pattern, the "bad character" and "good suffix" tables, the Collator-based data needed to compute them, and a reference to the text being searched.
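For readers unfamiliar with the "bad character" table mentioned above, here is a minimal sketch of the classic idea on plain bytes. It is not ICU's implementation: BoyerMooreSearch works with collation elements (CEs) and also uses a good-suffix table, while this example is the simpler Horspool variant with a bad-character table only. The name horspoolFind is hypothetical and chosen for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of ICU: Boyer-Moore-Horspool search over raw
// bytes. It uses only a bad-character table (no good-suffix table).
// Returns the index of the first match at or after `from`, or -1 if none.
long horspoolFind(const std::string &text, const std::string &pattern,
                  std::size_t from = 0) {
    const std::size_t m = pattern.size();
    const std::size_t n = text.size();
    if (m == 0 || m > n || from > n - m) {
        return -1;
    }

    // Bad-character table: for each byte value, how far the window may shift
    // when that byte is aligned with the last position of the pattern.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i) {
        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;
    }

    std::size_t pos = from;
    while (pos <= n - m) {
        // Compare right to left, as Boyer-Moore variants do.
        std::size_t j = m;
        while (j > 0 && text[pos + j - 1] == pattern[j - 1]) {
            --j;
        }
        if (j == 0) {
            return static_cast<long>(pos);  // whole pattern matched
        }
        // Shift by the table entry for the byte under the window's last slot.
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return -1;
}
```

For example, horspoolFind("here is a simple example", "example") returns 17. A byte that does not occur in the pattern lets the window jump by the full pattern length, which is where the speedup over a naive scan comes from.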
To do a search, you first need to get a CollData object by calling CollData::open. Then you construct a BoyerMooreSearch object from the CollData object, the pattern string and the target string. Then you call the search method. Here's a code sample:
void boyerMooreExample(UCollator *collator, UnicodeString *pattern, UnicodeString *target)
{
    UErrorCode status = U_ZERO_ERROR;
    CollData *collData = CollData::open(collator, status);

    if (U_FAILURE(status)) {
        // could not create a CollData object
        return;
    }

    // pass the pattern by reference and the target by pointer
    BoyerMooreSearch *search = new BoyerMooreSearch(collData, *pattern, target, status);

    if (U_FAILURE(status)) {
        // could not create a BoyerMooreSearch object
        CollData::close(collData);
        return;
    }

    int32_t offset = 0, start = -1, end = -1;

    // Find all matches
    while (search->search(offset, start, end)) {
        // process the match between start and end
        // ...

        // advance past the match
        offset = end;
    }

    // at this point, if offset == 0, there were no matches
    if (offset == 0) {
        // handle the case of no matches
    }

    delete search;
    CollData::close(collData);

    // CollData objects are cached, so the call to
    // CollData::close doesn't delete the object.
    // Call this if you don't need the object any more.
    CollData::flushCollDataCache();
}
NOTE: This is a technology preview. The final version of this API may not bear any resemblance to this API.
Known limitations:
1) Backwards searching has not been implemented.
2) For Han and Hangul characters, this code ignores any Collation tailorings. In general, this isn't a problem, but in Korean locales, at strength 1, Hangul characters are tailored to be equal to Han characters with the same pronunciation. Because this code ignores tailorings, searching for a Hangul character will not find a Han character and vice versa.
3) In some cases, searching for a pattern that needs to be normalized and ends in a discontiguous contraction may fail. The only known cases of this are with the Tibetan script. For example, searching for the pattern "\u0F7F\u0F80\u0F81\u0F82\u0F83\u0F84\u0F85" will fail. (This case is artificial. We've been unable to find a practical, real-world example of this failure.)
Definition at line 107 of file bmsearch.h.
BoyerMooreSearch::BoyerMooreSearch | ( | CollData * | theData, | |
const UnicodeString & | patternString, | |||
const UnicodeString * | targetString, | |||
UErrorCode & | status | |||
) |
Construct a BoyerMooreSearch object.
theData | - A CollData object holding the Collator-sensitive data | |
patternString | - the string for which to search | |
targetString | - the string in which to search, or NULL if you will set it later by calling setTargetString. | |
status | - will be set if any errors occur. |
BoyerMooreSearch::~BoyerMooreSearch | ( | ) |
The destructor.
UBool BoyerMooreSearch::empty | ( | ) |
Test the pattern to see if it generates any CEs.
Returns TRUE if the pattern string did not generate any CEs.
BadCharacterTable* BoyerMooreSearch::getBadCharacterTable | ( | ) |
Return the BadCharacterTable object computed for the pattern string.
Returns the BadCharacterTable object.
CollData* BoyerMooreSearch::getData | ( | ) |
Return the CollData object used for searching.
virtual UClassID BoyerMooreSearch::getDynamicClassID | ( | ) | const [virtual] |
ICU4C "poor man's RTTI", returns a UClassID for the actual ICU class.
GoodSuffixTable* BoyerMooreSearch::getGoodSuffixTable | ( | ) |
Return the GoodSuffixTable object computed for the pattern string.
Returns the GoodSuffixTable object computed for the pattern string.
CEList* BoyerMooreSearch::getPatternCEs | ( | ) |
Return the CEs generated by the pattern string.
UBool BoyerMooreSearch::search | ( | int32_t | offset, | |
int32_t & | start, | |||
int32_t & | end | |||
) |
Search for the pattern string in the target string.
offset | - the offset in the target string at which to begin the search | |
start | - will be set to the starting offset of the match, or -1 if there's no match | |
end | - will be set to the ending offset of the match, or -1 if there's no match |
Returns TRUE if the match succeeds, FALSE otherwise.
void BoyerMooreSearch::setTargetString | ( | const UnicodeString * | targetString, | |
UErrorCode & | status | |||
) |
Set the target string for the match.
targetString | - the new target string | |
status | - will be set if any errors occur. |
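The calling pattern for search and setTargetString can be tried out without ICU using a small stand-in. The sketch below is an assumption-laden illustration, not the ICU class: PlainSearch and countMatches are hypothetical names, matching is done on raw bytes with std::string::find, and end is assumed to be one past the last matched unit, which is what the "offset = end" step in the class example suggests.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in, NOT the ICU class: it compares raw bytes and only
// mirrors the documented shape of search() and setTargetString().
class PlainSearch {
public:
    PlainSearch(const std::string &pattern, const std::string *target)
        : pattern_(pattern), target_(target) {}

    // Replace the target string; the pattern is kept.
    void setTargetString(const std::string *target) { target_ = target; }

    // On success sets [start, end) and returns true.
    // On failure sets both to -1 and returns false.
    bool search(int32_t offset, int32_t &start, int32_t &end) const {
        start = end = -1;
        if (target_ == nullptr || pattern_.empty() || offset < 0 ||
            static_cast<std::size_t>(offset) > target_->size()) {
            return false;
        }
        const std::size_t pos =
            target_->find(pattern_, static_cast<std::size_t>(offset));
        if (pos == std::string::npos) {
            return false;
        }
        start = static_cast<int32_t>(pos);
        end = static_cast<int32_t>(pos + pattern_.size());
        return true;
    }

private:
    std::string pattern_;
    const std::string *target_;  // not owned; may be null until set
};

// Count every match, resuming each search at the end of the previous match.
int countMatches(const PlainSearch &search) {
    int count = 0;
    int32_t offset = 0, start = -1, end = -1;
    while (search.search(offset, start, end)) {
        ++count;
        offset = end;  // advance past the match
    }
    return count;
}
```

The stand-in rejects an empty pattern so that the loop always advances; with a zero-length match, "offset = end" would not move forward. Whether the real class can report such a match is not stated here, so checking empty() before looping is a reasonable precaution.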