com.ibm.icu.text
public abstract class BreakIterator extends Object implements Cloneable
Examples:
Creating and using text boundaries
    public static void main(String args[]) {
        if (args.length == 1) {
            String stringToExamine = args[0];
            // print each word in order
            BreakIterator boundary = BreakIterator.getWordInstance();
            boundary.setText(stringToExamine);
            printEachForward(boundary, stringToExamine);
            // print each sentence in reverse order
            boundary = BreakIterator.getSentenceInstance(Locale.US);
            boundary.setText(stringToExamine);
            printEachBackward(boundary, stringToExamine);
            printFirst(boundary, stringToExamine);
            printLast(boundary, stringToExamine);
        }
    }
Print each element in order
    public static void printEachForward(BreakIterator boundary, String source) {
        int start = boundary.first();
        for (int end = boundary.next();
             end != BreakIterator.DONE;
             start = end, end = boundary.next()) {
            System.out.println(source.substring(start, end));
        }
    }
Print each element in reverse order
    public static void printEachBackward(BreakIterator boundary, String source) {
        int end = boundary.last();
        for (int start = boundary.previous();
             start != BreakIterator.DONE;
             end = start, start = boundary.previous()) {
            System.out.println(source.substring(start, end));
        }
    }
Print first element
    public static void printFirst(BreakIterator boundary, String source) {
        int start = boundary.first();
        int end = boundary.next();
        System.out.println(source.substring(start, end));
    }
Print last element
    public static void printLast(BreakIterator boundary, String source) {
        int end = boundary.last();
        int start = boundary.previous();
        System.out.println(source.substring(start, end));
    }
Print the element at a specified position
    public static void printAt(BreakIterator boundary, int pos, String source) {
        int end = boundary.following(pos);
        int start = boundary.previous();
        System.out.println(source.substring(start, end));
    }
Find the next word
    public static int nextWordStartAfter(int pos, String text) {
        BreakIterator wb = BreakIterator.getWordInstance();
        wb.setText(text);
        int last = wb.following(pos);
        int current = wb.next();
        while (current != BreakIterator.DONE) {
            // the span [last, current) is a word if it contains at least one letter
            for (int p = last; p < current; p++) {
                if (Character.isLetter(text.charAt(p))) {
                    return last;
                }
            }
            last = current;
            current = wb.next();
        }
        return BreakIterator.DONE;
    }
(The iterator returned by BreakIterator.getWordInstance() is unique in that the break positions it returns don't represent both the start and end of the thing being iterated over. That is, a sentence-break iterator returns breaks that each represent the end of one sentence and the beginning of the next. With the word-break iterator, the characters between two boundaries might be a word, or they might be the punctuation or whitespace between two words. The above code uses a simple heuristic to determine which boundary is the beginning of a word: if the characters between this boundary and the next boundary include at least one letter (an alphabetical letter, a CJK ideograph, a Hangul syllable, a Kana character, etc.), then the text between this boundary and the next is a word; otherwise, it is the material between words.)
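The letter heuristic described above can also be used to collect only the words of a string. The following is a minimal sketch rather than part of the ICU API: the class WordExtractor and its methods are hypothetical names, and it uses the JDK's java.text.BreakIterator as a stand-in so that it runs without the ICU4J jar (an assumption; the calls used here have the same names and signatures in com.ibm.icu.text.BreakIterator).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class WordExtractor {

    // Returns the segments between word boundaries that contain at least one letter.
    static List<String> words(String text) {
        // Assumption: JDK BreakIterator used in place of com.ibm.icu.text.BreakIterator.
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        List<String> result = new ArrayList<>();
        int start = wb.first();
        for (int end = wb.next(); end != BreakIterator.DONE; start = end, end = wb.next()) {
            if (containsLetter(text, start, end)) {
                result.add(text.substring(start, end));
            }
        }
        return result;
    }

    // True if any character in [start, end) is a letter.
    static boolean containsLetter(String text, int start, int end) {
        for (int p = start; p < end; p++) {
            if (Character.isLetter(text.charAt(p))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!"));
    }
}
```

Segments made up only of punctuation, whitespace, or digits are skipped because they contain no letter.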
See Also: CharacterIterator
UNKNOWN: ICU 2.0
Field Summary | |
---|---|
static int | DONE: returned by previous() and next() after all valid boundaries have been returned. |
static int | KIND_CHARACTER |
static int | KIND_LINE |
static int | KIND_SENTENCE |
static int | KIND_TITLE |
static int | KIND_WORD |
Constructor Summary | |
---|---|
protected | BreakIterator(): Default constructor. |
Method Summary | |
---|---|
Object | clone(): Clone method. |
abstract int | current(): Return the iterator's current position. |
abstract int | first(): Return the first boundary position. |
abstract int | following(int offset): Sets the iterator's current iteration position to be the first boundary position following the specified position. |
static Locale[] | getAvailableLocales(): Returns a list of locales for which BreakIterators can be used. |
static ULocale[] | getAvailableULocales(): Returns a list of locales for which BreakIterators can be used. |
static BreakIterator | getCharacterInstance(): Returns a new instance of BreakIterator that locates logical-character boundaries. |
static BreakIterator | getCharacterInstance(Locale where): Returns a new instance of BreakIterator that locates logical-character boundaries. |
static BreakIterator | getCharacterInstance(ULocale where): Returns a new instance of BreakIterator that locates logical-character boundaries. |
static BreakIterator | getLineInstance(): Returns a new instance of BreakIterator that locates legal line-wrapping positions. |
static BreakIterator | getLineInstance(Locale where): Returns a new instance of BreakIterator that locates legal line-wrapping positions. |
static BreakIterator | getLineInstance(ULocale where): Returns a new instance of BreakIterator that locates legal line-wrapping positions. |
ULocale | getLocale(ULocale.Type type): Return the locale that was used to create this object, or null. |
static BreakIterator | getSentenceInstance(): Returns a new instance of BreakIterator that locates sentence boundaries. |
static BreakIterator | getSentenceInstance(Locale where): Returns a new instance of BreakIterator that locates sentence boundaries. |
static BreakIterator | getSentenceInstance(ULocale where): Returns a new instance of BreakIterator that locates sentence boundaries. |
abstract CharacterIterator | getText(): Returns a CharacterIterator over the text being analyzed. |
static BreakIterator | getTitleInstance(): Returns a new instance of BreakIterator that locates title boundaries. |
static BreakIterator | getTitleInstance(Locale where): Returns a new instance of BreakIterator that locates title boundaries. |
static BreakIterator | getTitleInstance(ULocale where): Returns a new instance of BreakIterator that locates title boundaries. |
static BreakIterator | getWordInstance(): Returns a new instance of BreakIterator that locates word boundaries. |
static BreakIterator | getWordInstance(Locale where): Returns a new instance of BreakIterator that locates word boundaries. |
static BreakIterator | getWordInstance(ULocale where): Returns a new instance of BreakIterator that locates word boundaries. |
boolean | isBoundary(int offset): Return true if the specified position is a boundary position. |
abstract int | last(): Return the last boundary position. |
abstract int | next(int n): Advances the specified number of steps forward in the text (a negative number, therefore, advances backwards). |
abstract int | next(): Advances the iterator forward one boundary. |
int | preceding(int offset): Sets the iterator's current iteration position to be the last boundary position preceding the specified position. |
abstract int | previous(): Advances the iterator backward one boundary. |
static Object | registerInstance(BreakIterator iter, Locale locale, int kind): Register a new break iterator of the indicated kind, to use in the given locale. |
static Object | registerInstance(BreakIterator iter, ULocale locale, int kind): Register a new break iterator of the indicated kind, to use in the given locale. |
void | setText(String newText): Sets the iterator to analyze a new piece of text. |
abstract void | setText(CharacterIterator newText): Sets the iterator to analyze a new piece of text. |
static boolean | unregister(Object key): Unregister a previously-registered BreakIterator using the key returned from the register call. |
UNKNOWN: ICU 2.0
UNKNOWN: ICU 2.4
UNKNOWN: ICU 2.4
UNKNOWN: ICU 2.4
UNKNOWN: ICU 2.4
UNKNOWN: ICU 2.4
UNKNOWN: ICU 2.0
Returns: The clone.
UNKNOWN: ICU 2.0
Returns: The iterator's current position.
UNKNOWN: ICU 2.0
Returns: The character offset of the beginning of the stretch of text being broken.
UNKNOWN: ICU 2.0
Parameters: offset The character position to start searching from.
Returns: The position of the first boundary position following "offset" (whether or not "offset" itself is a boundary position), or DONE if "offset" is the past-the-end offset.
UNKNOWN: ICU 2.0
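As an illustration of following(int offset), the sketch below asks for the next word boundary after several offsets, including one that is itself a boundary and the past-the-end offset. It is a minimal example under stated assumptions: FollowingDemo and nextBoundaryAfter are hypothetical names, and the JDK's java.text.BreakIterator stands in for the ICU class so the code runs without the ICU4J jar (following has the same signature in both).

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical demo class, for illustration only.
public class FollowingDemo {

    // Returns the first word boundary strictly after offset, or BreakIterator.DONE.
    static int nextBoundaryAfter(String text, int offset) {
        // Assumption: JDK BreakIterator used in place of com.ibm.icu.text.BreakIterator.
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        return wb.following(offset);
    }

    public static void main(String[] args) {
        String text = "one two"; // word boundaries fall at 0, 3, 4 and 7
        System.out.println(nextBoundaryAfter(text, 1)); // offset inside "one"
        System.out.println(nextBoundaryAfter(text, 3)); // offset is itself a boundary
        System.out.println(nextBoundaryAfter(text, text.length()) == BreakIterator.DONE);
    }
}
```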
Returns: An array of Locales. All of the locales in the array can be used when creating a BreakIterator.
UNKNOWN: ICU 2.6
Returns: An array of ULocales. All of the locales in the array can be used when creating a BreakIterator.
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.
Returns: A new instance of BreakIterator that locates logical-character boundaries.
UNKNOWN: ICU 2.0
Parameters: where A Locale specifying the language of the text being analyzed.
Returns: A new instance of BreakIterator that locates logical-character boundaries.
UNKNOWN: ICU 2.0
Parameters: where A Locale specifying the language of the text being analyzed.
Returns: A new instance of BreakIterator that locates logical-character boundaries.
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.
Returns: A new instance of BreakIterator that locates legal line-wrapping positions.
UNKNOWN: ICU 2.0
Parameters: where A Locale specifying the language of the text being broken.
Returns: A new instance of BreakIterator that locates legal line-wrapping positions.
UNKNOWN: ICU 2.0
Parameters: where A Locale specifying the language of the text being broken.
Returns: A new instance of BreakIterator that locates legal line-wrapping positions.
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.
Note: This method will be implemented in ICU 3.0; ICU 2.8 contains a partial preview implementation. The actual locale is returned correctly, but the valid locale is not, in most cases.
Parameters: type type of information requested, either ULocale.VALID_LOCALE or ULocale.ACTUAL_LOCALE.
Returns: the information specified by type, or null if this object was not constructed from locale data.
See Also: ULocale, ULocale.VALID_LOCALE, ULocale.ACTUAL_LOCALE
UNKNOWN: ICU 2.8 (retain) This API might change or be removed in a future release.
Returns: A new instance of BreakIterator that locates sentence boundaries.
UNKNOWN: ICU 2.0
Parameters: where A Locale specifying the language of the text being analyzed.
Returns: A new instance of BreakIterator that locates sentence boundaries.
UNKNOWN: ICU 2.0
Parameters: where A Locale specifying the language of the text being analyzed.
Returns: A new instance of BreakIterator that locates sentence boundaries.
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.
Returns: A CharacterIterator over the text being analyzed.
UNKNOWN: ICU 2.0
Returns: A new instance of BreakIterator that locates title boundaries.
UNKNOWN: ICU 2.0
Parameters: where A Locale specifying the language of the text being analyzed.
Returns: A new instance of BreakIterator that locates title boundaries.
UNKNOWN: ICU 2.0
Parameters: where A Locale specifying the language of the text being analyzed.
Returns: A new instance of BreakIterator that locates title boundaries.
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.
Returns: An instance of BreakIterator that locates word boundaries.
UNKNOWN: ICU 2.0
Parameters: where A locale specifying the language of the text to be analyzed.
Returns: An instance of BreakIterator that locates word boundaries.
UNKNOWN: ICU 2.0
Parameters: where A locale specifying the language of the text to be analyzed.
Returns: An instance of BreakIterator that locates word boundaries.
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.
Parameters: offset the offset to check.
Returns: True if "offset" is a boundary position.
UNKNOWN: ICU 2.0
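One common use of isBoundary(int offset) is to check whether a substring match covers a whole word. The sketch below is one hedged way to do that: WholeWordMatch and isWholeWord are hypothetical names, and the JDK's java.text.BreakIterator is used as a stand-in for the ICU class (an assumption made so the example runs without the ICU4J jar).

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class WholeWordMatch {

    // True if both ends of [start, end) fall on word boundaries of text.
    static boolean isWholeWord(String text, int start, int end) {
        // Assumption: JDK BreakIterator used in place of com.ibm.icu.text.BreakIterator.
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        return wb.isBoundary(start) && wb.isBoundary(end);
    }

    public static void main(String[] args) {
        String text = "cat concatenate";
        // "cat" at offsets 0..3 is a standalone word.
        System.out.println(isWholeWord(text, 0, 3));
        // "cat" at offsets 7..10 sits inside "concatenate".
        System.out.println(isWholeWord(text, 7, 10));
    }
}
```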
Returns: The character offset of the end of the stretch of text being broken.
UNKNOWN: ICU 2.0
Parameters: n The number of boundaries to advance over (if positive, moves forward; if negative, moves backwards).
Returns: The position of the boundary n boundaries from the current iteration position, or DONE if moving n boundaries causes the iterator to advance off either end of the text.
UNKNOWN: ICU 2.0
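The behavior of next(int n) for positive, negative, and out-of-range values of n can be sketched as follows. NthBoundary and its two methods are hypothetical names, and the example assumes the JDK's java.text.BreakIterator as a stand-in for the ICU class so that it runs without the ICU4J jar.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical demo class, for illustration only.
public class NthBoundary {

    // Boundary reached by moving n boundaries forward from the start of text.
    static int fromStart(String text, int n) {
        // Assumption: JDK BreakIterator used in place of com.ibm.icu.text.BreakIterator.
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        wb.first();
        return wb.next(n);
    }

    // Boundary reached by moving n boundaries backward from the end of text.
    static int fromEnd(String text, int n) {
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        wb.last();
        return wb.next(-n); // a negative argument moves backward
    }

    public static void main(String[] args) {
        String text = "one two three"; // word boundaries at 0, 3, 4, 7, 8 and 13
        System.out.println(fromStart(text, 2));
        System.out.println(fromEnd(text, 1));
        // Moving past either end of the text yields DONE.
        System.out.println(fromStart(text, 6) == BreakIterator.DONE);
    }
}
```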
Returns: The position of the first boundary position following the iteration position.
UNKNOWN: ICU 2.0
Parameters: offset The character position to start searching from.
Returns: The position of the last boundary position preceding "offset" (whether or not "offset" itself is a boundary position), or DONE if "offset" is the starting offset of the iterator.
UNKNOWN: ICU 2.0
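A typical use of preceding(int offset) is a "delete back to the previous word boundary" editing action. The sketch below shows one possible shape for it; DeleteBack and deleteBackToBoundary are hypothetical names, and the JDK's java.text.BreakIterator is assumed as a stand-in for the ICU class so the code runs without the ICU4J jar.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class DeleteBack {

    // Removes the text between the boundary preceding caret and the caret itself.
    static String deleteBackToBoundary(String text, int caret) {
        // Assumption: JDK BreakIterator used in place of com.ibm.icu.text.BreakIterator.
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        wb.setText(text);
        int start = wb.preceding(caret);
        if (start == BreakIterator.DONE) {
            return text; // caret is at the starting offset: nothing to delete
        }
        return text.substring(0, start) + text.substring(caret);
    }

    public static void main(String[] args) {
        System.out.println(deleteBackToBoundary("one two", 7)); // caret after "two"
        System.out.println(deleteBackToBoundary("one two", 5)); // caret inside "two"
        System.out.println(deleteBackToBoundary("one two", 0)); // caret at the start
    }
}
```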
Returns: The position of the last boundary position preceding the iteration position.
UNKNOWN: ICU 2.0
Parameters: iter the BreakIterator instance to adopt; locale the Locale for which this instance is to be registered; kind the type of iterator for which this instance is to be registered
Returns: a registry key that can be used to unregister this instance
UNKNOWN: ICU 2.4
Parameters: iter the BreakIterator instance to adopt; locale the ULocale for which this instance is to be registered; kind the type of iterator for which this instance is to be registered
Returns: a registry key that can be used to unregister this instance
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.
Parameters: newText A String containing the text to analyze with this BreakIterator.
UNKNOWN: ICU 2.0
Parameters: newText A CharacterIterator referring to the text to analyze with this BreakIterator (the iterator's current position is ignored, but its other state is significant).
UNKNOWN: ICU 2.0
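The remark that the CharacterIterator's "other state is significant" refers, among other things, to its begin and end indices: boundaries are reported as offsets within that range. The sketch below illustrates this with a StringCharacterIterator limited to part of a string. RangeBoundaries and boundaries are hypothetical names, and the JDK's java.text.BreakIterator is assumed as a stand-in for the ICU class so the example runs without the ICU4J jar.

```java
import java.text.BreakIterator;
import java.text.StringCharacterIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, for illustration only.
public class RangeBoundaries {

    // Lists the word boundaries found when only text[begin, end) is analyzed.
    static List<Integer> boundaries(String text, int begin, int end) {
        // Assumption: JDK BreakIterator used in place of com.ibm.icu.text.BreakIterator.
        BreakIterator wb = BreakIterator.getWordInstance(Locale.US);
        // The fourth argument is the iterator's initial position, which setText ignores.
        wb.setText(new StringCharacterIterator(text, begin, end, begin));
        List<Integer> result = new ArrayList<>();
        for (int b = wb.first(); b != BreakIterator.DONE; b = wb.next()) {
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        // Only "one two" (offsets 5 to 12) is analyzed; offsets refer to the full string.
        System.out.println(boundaries("skip one two skip", 5, 12));
    }
}
```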
Parameters: key the registry key returned by a previous call to registerInstance
Returns: true if the iterator for the key was successfully unregistered
UNKNOWN: ICU 2.4