com.ibm.icu.text
public class RuleBasedBreakIterator extends BreakIterator
A subclass of BreakIterator whose behavior is specified using a list of rules.
UNKNOWN: ICU 2.0
Field Summary | |
---|---|
static int | WORD_IDEO
Tag value for words containing ideographic characters, lower limit |
static int | WORD_IDEO_LIMIT
Tag value for words containing ideographic characters, upper limit |
static int | WORD_KANA
Tag value for words containing kana characters, lower limit |
static int | WORD_KANA_LIMIT
Tag value for words containing kana characters, upper limit |
static int | WORD_LETTER
Tag value for words that contain letters, excluding
hiragana, katakana or ideographic characters, lower limit. |
static int | WORD_LETTER_LIMIT
Tag value for words containing letters, upper limit |
static int | WORD_NONE
Tag value for "words" that do not fit into any of the other categories. |
static int | WORD_NONE_LIMIT
Upper bound for tags for uncategorized words. |
static int | WORD_NUMBER
Tag value for words that appear to be numbers, lower limit. |
static int | WORD_NUMBER_LIMIT
Tag value for words that appear to be numbers, upper limit. |
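Each category above is described by a lower limit and an upper limit, so a tag can be classified with a range test. The sketch below assumes the ranges are half-open (lower limit inclusive, upper limit exclusive) and copies numeric values that are assumed to match the ICU constants; in real code, reference `RuleBasedBreakIterator.WORD_*` directly rather than hard-coding numbers. The class `WordTagRanges` and the method `categoryOf` are hypothetical names used only for illustration.

```java
// Sketch only: constants are duplicated here so the example runs without ICU4J.
// The numeric values are an assumption about RuleBasedBreakIterator.WORD_*.
class WordTagRanges {
    static final int WORD_NONE = 0,     WORD_NONE_LIMIT = 100;
    static final int WORD_NUMBER = 100, WORD_NUMBER_LIMIT = 200;
    static final int WORD_LETTER = 200, WORD_LETTER_LIMIT = 300;
    static final int WORD_KANA = 300,   WORD_KANA_LIMIT = 400;
    static final int WORD_IDEO = 400,   WORD_IDEO_LIMIT = 500;

    // Maps a rule status tag to a category name using half-open ranges.
    static String categoryOf(int tag) {
        if (tag >= WORD_NONE && tag < WORD_NONE_LIMIT) return "none";
        if (tag >= WORD_NUMBER && tag < WORD_NUMBER_LIMIT) return "number";
        if (tag >= WORD_LETTER && tag < WORD_LETTER_LIMIT) return "letter";
        if (tag >= WORD_KANA && tag < WORD_KANA_LIMIT) return "kana";
        if (tag >= WORD_IDEO && tag < WORD_IDEO_LIMIT) return "ideographic";
        return "unknown"; // tag outside every range listed above
    }

    public static void main(String[] args) {
        System.out.println(categoryOf(WORD_LETTER));
    }
}
```

Testing against a range rather than a single value keeps the check valid if a rule set uses other tag values inside the same category.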
Constructor Summary | |
---|---|
public | RuleBasedBreakIterator(String description)
Constructs a RuleBasedBreakIterator according to the description
provided. |
protected | RuleBasedBreakIterator()
This default constructor is used when creating derived classes
of RuleBasedBreakIterator. |
Method Summary | |
---|---|
Object | clone()
Clones this iterator. |
int | current()
Returns the current iteration position. |
boolean | equals(Object that)
Returns true if both BreakIterators are of the same class, have the same
rules, and iterate over the same text. |
int | first()
Sets the current iteration position to the beginning of the text
(i.e., the CharacterIterator's starting offset). |
int | following(int offset)
Sets the iterator to refer to the first boundary position following
the specified position. |
static RuleBasedBreakIterator | getInstanceFromCompiledRules(InputStream is)
Get a break iterator based on a set of pre-compiled break rules. |
int | getRuleStatus()
Return the status tag from the break rule that determined the most recently
returned break position. |
int | getRuleStatusVec(int[] fillInArray)
Get the status (tag) values from the break rule(s) that determined the most
recently returned break position. |
CharacterIterator | getText()
Return a CharacterIterator over the text being analyzed. |
int | hashCode()
Compute a hash code for this BreakIterator. |
boolean | isBoundary(int offset)
Returns true if the specified position is a boundary position. |
int | last()
Sets the current iteration position to the end of the text
(i.e., the CharacterIterator's ending offset). |
int | next(int n)
Advances the iterator either forward or backward the specified number of steps. |
int | next()
Advances the iterator to the next boundary position. |
int | preceding(int offset)
Sets the iterator to refer to the last boundary position before the
specified position. |
int | previous()
Advances the iterator backwards, to the last boundary preceding this one. |
void | setText(CharacterIterator newText)
Set the iterator to analyze a new piece of text. |
String | toString()
Returns the description used to create this iterator. |
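The navigation methods in this table are usually combined into a loop that starts at `first()` and calls `next()` until it returns `BreakIterator.DONE`. The sketch below uses the JDK's `java.text.BreakIterator` as a stand-in so that it runs without ICU4J on the classpath; it assumes the ICU class follows the same navigation contract. `BoundaryWalk` and `boundaries` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BoundaryWalk {
    // Collects every word boundary offset in the text, in ascending order.
    static List<Integer> boundaries(String text) {
        // JDK stand-in; with ICU4J an ICU break iterator would be used instead.
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<Integer> out = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            out.add(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(boundaries("Hello world"));
    }
}
```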
UNKNOWN: ICU 3.0 This API might change or be removed in a future release.
UNKNOWN: ICU 2.0
UNKNOWN: ICU 3.0
Returns: A newly-constructed RuleBasedBreakIterator with the same behavior as this one.
UNKNOWN: ICU 2.0
Returns: The current iteration position.
UNKNOWN: ICU 2.0
UNKNOWN: ICU 2.0
Returns: The offset of the beginning of the text.
UNKNOWN: ICU 2.0
Parameters: offset The position from which to begin searching for a break position.
Returns: The position of the first break after the specified offset.
UNKNOWN: ICU 2.0
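As a rough illustration of `following(int offset)`, the sketch below asks for the first word boundary strictly after a given offset. It uses the JDK's `java.text.BreakIterator` as a stand-in, on the assumption that the ICU method behaves the same way for offsets inside the text; `FollowingDemo` and `boundaryAfter` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Locale;

class FollowingDemo {
    // Returns the first word boundary strictly after the given offset.
    static int boundaryAfter(String text, int offset) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT); // JDK stand-in
        it.setText(text);
        return it.following(offset);
    }

    public static void main(String[] args) {
        System.out.println(boundaryAfter("Hello world", 2));
    }
}
```

Note that an offset which is itself a boundary is skipped: the result is the next boundary after it.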
Parameters: is An input stream that supplies the compiled rule data. The format of the rule data on the stream is that of a rule data file produced by the ICU4C tool "genbrk".
Returns: A RuleBasedBreakIterator based on the supplied break rules.
Throws: IOException
UNKNOWN: ICU 3.0 This API might change or be removed in a future release.
The values used by the standard ICU break rules are defined as
constants in this class, and allow distinguishing between words
that contain alphabetic letters, "words" that appear to be numbers,
punctuation and spaces, words containing ideographic characters, and
more. Call getRuleStatus after obtaining a boundary position from
next(), previous(), or any other break iterator function that returns
a boundary position.
Returns: The status from the break rule that determined the most recently
returned break position.
UNKNOWN: ICU 3.0 This API might change or be removed in a future release.
The values used by the standard ICU rules are defined as constants in this class.
If the size of the output array is insufficient to hold the data, the output will be truncated to the available length. No exception will be thrown.
Parameters: fillInArray an array to be filled in with the status values.
Returns: The number of rule status values from rules that determined the most recent boundary returned by the break iterator. In the event that the array is too small, the return value is the total number of status values that were available, not the reduced number that were actually returned.
UNKNOWN: ICU 3.0 This API might change or be removed in a future release.
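Because the return value of `getRuleStatusVec` is the total number of available status values even when the output is truncated, a caller can detect truncation and retry with a larger array. The sketch below shows that caller-side pattern against a fake source that mimics the behavior described above; `StatusSource`, `fake`, and `allStatuses` are hypothetical helpers, not part of ICU.

```java
import java.util.Arrays;

class StatusVecPattern {
    // Hypothetical stand-in for an object exposing getRuleStatusVec(int[]).
    interface StatusSource {
        int getRuleStatusVec(int[] fillInArray);
    }

    // Fake source that imitates the documented truncation behavior.
    static StatusSource fake(int... available) {
        return fill -> {
            int n = Math.min(fill.length, available.length);
            System.arraycopy(available, 0, fill, 0, n); // silently truncated
            return available.length;                    // total available
        };
    }

    // Tries a small buffer first, then retries with the exact size if needed.
    static int[] allStatuses(StatusSource src) {
        int[] buf = new int[1];
        int total = src.getRuleStatusVec(buf);
        if (total > buf.length) {
            buf = new int[total];
            src.getRuleStatusVec(buf);
        }
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(allStatuses(fake(100, 200))));
    }
}
```

With a real iterator, a method reference to its `getRuleStatusVec` could presumably be passed in place of the fake source.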
Returns: An iterator over the text being analyzed.
UNKNOWN: ICU 2.0
Returns: A hash code
UNKNOWN: ICU 2.0
Parameters: offset the offset to check.
Returns: True if "offset" is a boundary position.
UNKNOWN: ICU 2.0
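A small sketch of `isBoundary(int offset)` follows, again using the JDK's `java.text.BreakIterator` as a stand-in under the assumption that the ICU method gives the same answers; `IsBoundaryDemo` and `isWordBoundary` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Locale;

class IsBoundaryDemo {
    // Reports whether the offset falls on a word boundary of the text.
    static boolean isWordBoundary(String text, int offset) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT); // JDK stand-in
        it.setText(text);
        return it.isBoundary(offset);
    }

    public static void main(String[] args) {
        System.out.println(isWordBoundary("Hello world", 5));
    }
}
```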
Returns: The text's past-the-end offset.
UNKNOWN: ICU 2.0
Parameters: n The number of steps to move. The sign indicates the direction (negative is backwards, and positive is forwards).
Returns: The character offset of the boundary position n boundaries away from the current one.
UNKNOWN: ICU 2.0
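The sign convention of `next(int n)` can be illustrated with a short sketch: a positive `n` moves forward by that many boundaries, and a negative `n` moves backward. The example uses the JDK's `java.text.BreakIterator` as a stand-in and assumes the ICU method shares its semantics; `NextNDemo` and `forwardThenBack` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Locale;

class NextNDemo {
    // Moves forward two boundaries from the start, then back one.
    // Returns both resulting positions as {afterForward, afterBackward}.
    static int[] forwardThenBack(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT); // JDK stand-in
        it.setText(text);
        it.first();
        int forward = it.next(2);   // positive n: forwards
        int backward = it.next(-1); // negative n: backwards
        return new int[] {forward, backward};
    }

    public static void main(String[] args) {
        int[] r = forwardThenBack("Hello world");
        System.out.println(r[0] + " " + r[1]);
    }
}
```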
Returns: The position of the first boundary after this one.
UNKNOWN: ICU 2.0
Parameters: offset The position to begin searching for a break from.
Returns: The position of the last boundary before the starting position.
UNKNOWN: ICU 2.0
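For `preceding(int offset)`, a minimal sketch is shown below. It uses the JDK's `java.text.BreakIterator` as a stand-in, assuming the ICU method likewise returns the last boundary strictly before the offset; `PrecedingDemo` and `boundaryBefore` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Locale;

class PrecedingDemo {
    // Returns the last word boundary strictly before the given offset.
    static int boundaryBefore(String text, int offset) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT); // JDK stand-in
        it.setText(text);
        return it.preceding(offset);
    }

    public static void main(String[] args) {
        System.out.println(boundaryBefore("Hello world", 8));
    }
}
```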
Returns: The position of the last boundary position preceding this one.
UNKNOWN: ICU 2.0
Parameters: newText An iterator over the text to analyze.
UNKNOWN: ICU 2.0
UNKNOWN: ICU 2.0