com.ibm.icu.text
Class RuleBasedBreakIterator_Old.Builder
protected class RuleBasedBreakIterator_Old.Builder extends Object
The Builder class has the job of constructing a RuleBasedBreakIterator_Old from a
textual description. A Builder is constructed by RuleBasedBreakIterator_Old's
constructor, which uses it to construct the iterator itself and then throws it
away.
The construction logic is separated out into its own class for two primary
reasons:
- The construction logic is quite sophisticated and large. Separating it
out into its own class means the code only needs to be in memory while a
RuleBasedBreakIterator_Old is being constructed, and can be purged after that.
- There is a fair amount of state that must be maintained throughout the
construction process that is not needed by the iterator after construction.
Separating this state out into another class keeps the functions that
construct the iterator from needing very long parameter lists, which
(hopefully) contributes to readability and maintainability.
It'd be really nice if this could be an independent class rather than an
inner class, because that would shorten the source file considerably, but
making Builder an inner class of RuleBasedBreakIterator_Old allows it direct access
to RuleBasedBreakIterator_Old's private members, which saves us from having to
provide some kind of "back door" to the Builder class that could then also be
used by other classes.
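The inner-class arrangement described above can be illustrated with a minimal Java sketch. Everything in it is hypothetical: `RuleIterator` stands in for RuleBasedBreakIterator_Old, and the `;`-separated rule format and the `rules`/`ruleCount` fields are invented for illustration. It shows only that a non-static inner class can assign the outer object's private fields directly, and that the builder's temporary state becomes garbage once the outer constructor returns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for RuleBasedBreakIterator_Old; not the real ICU class.
class RuleIterator {
    // Private state that only this class and its inner Builder can touch.
    private String[] rules;
    private int ruleCount;

    RuleIterator(String description) {
        // The Builder is created, used once, and then discarded.
        new Builder().build(description);
    }

    int ruleCount() {
        return ruleCount;
    }

    String rule(int index) {
        return rules[index];
    }

    // Non-static inner class: it can assign the outer instance's private
    // fields directly, so no "back door" setter is needed.
    protected class Builder {
        // Construction-only state lives here, not in the iterator.
        private final List<String> tempRuleList = new ArrayList<>();

        void build(String description) {
            // Hypothetical format: rules separated by ';'.
            for (String rule : description.split(";")) {
                String trimmed = rule.trim();
                if (!trimmed.isEmpty()) {
                    tempRuleList.add(trimmed);
                }
            }
            rules = tempRuleList.toArray(new String[0]);
            ruleCount = rules.length;
        }
    }
}
```

Because `Builder` writes `rules` and `ruleCount` itself, the iterator exposes no setter or package-visible field that unrelated classes could also use.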
Field Summary
Modifier and Type | Field | Description
protected static int | ALL_FLAGS | A bit mask representing the union of the DONT_LOOP_FLAG, END_STATE_FLAG, and LOOKAHEAD_STATE_FLAG masks.
protected Vector | categories | A temporary holding place used for calculating the character categories.
protected boolean | clearLoopingStates | A flag that is used to indicate when the list of looping states can be reset.
protected Vector | decisionPointList | A list of all the states that have to be filled in with transitions to the next state that is created.
protected Stack | decisionPointStack | A stack for holding decision point lists.
protected static int | DONT_LOOP_FLAG | A bit mask used to indicate a bit in the table's flags column that marks a state from which the builder should not add transitions to any looping states.
protected Hashtable | expressions | A table used to map parts of regexp text to lists of character categories, rather than having to figure them out from scratch each time.
protected static int | END_STATE_FLAG | A bit mask used to indicate a bit in the table's flags column that marks a state as an accepting state.
protected UnicodeSet | ignoreChars | A temporary holding place for the list of ignore characters.
protected Vector | loopingStates | A list of states that loop back on themselves.
protected static int | LOOKAHEAD_STATE_FLAG | A bit mask used to indicate a bit in the table's flags column that marks a state as a lookahead state.
protected Vector | mergeList | A list mapping pairs of state numbers for states that are to be combined to the state number of the state representing their combination.
protected Vector | statesToBackfill | Looping states actually have to be backfilled later in the process than everything else.
protected Vector | tempStateTable | A temporary holding place where the forward state table is built.
Method Summary
Modifier and Type | Method | Description
void | buildBreakIterator() | This is the main function for setting up the BreakIterator's tables.
protected void | buildCharCategories(Vector tempRuleList) | This function builds the character category table.
protected void | debugPrintTempStateTable() |
protected void | debugPrintVector(String label, Vector v) |
protected void | debugPrintVectorOfVectors(String label1, String label2, Vector v) |
protected void | error(String message, int position, String context) | Throws an IllegalArgumentException representing a syntax error in the rule description.
protected void | handleSpecialSubstitution(String replace, String replaceWith, int startPos, String description) | This function defines a protocol for handling substitution names that are "special," i.e., that have some property beyond just being substitutions.
protected void | mungeExpressionList(Hashtable expressions) |
protected String | processSubstitution(String substitutionRule, String description, int startPos) | This function performs variable-name substitutions.
protected static final int ALL_FLAGS
A bit mask representing the union of the DONT_LOOP_FLAG, END_STATE_FLAG, and LOOKAHEAD_STATE_FLAG masks.
Used for clearing or masking off the flag bits.
protected Vector categories
A temporary holding place used for calculating the character categories.
This object contains UnicodeSet objects.
protected boolean clearLoopingStates
A flag that is used to indicate when the list of looping states can
be reset.
protected Vector decisionPointList
A list of all the states that have to be filled in with transitions to the
next state that is created. Used when building the state table from the
regular expressions.
protected Stack decisionPointStack
A stack for holding decision point lists. This is used to handle nested
parentheses and braces in regexps.
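How a decision point list and a stack of such lists can cooperate is easiest to see in a simplified model. The following is a minimal Java sketch under stated assumptions: `DecisionPointSketch`, its successor lists, and the `openParen`/`alternative`/`closeParen` methods are hypothetical, and the real builder fills in state-table cells per character category rather than plain successor lists. Each new state backfills every pending decision point; an open parenthesis pushes a frame so that the branches of a nested group each start from the same pending states.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model: each state just records its successor
// states; the real builder fills in state-table cells per character category.
class DecisionPointSketch {
    final List<List<Integer>> successors = new ArrayList<>();
    // States still waiting for a transition to "the next state created".
    List<Integer> decisionPointList = new ArrayList<>();
    // One frame per open parenthesis, so nested groups don't clobber each other.
    private final Deque<Frame> decisionPointStack = new ArrayDeque<>();

    private static final class Frame {
        final List<Integer> groupStart;                      // pending states when '(' was seen
        final List<Integer> branchEnds = new ArrayList<>();  // pending states at the end of each '|' branch

        Frame(List<Integer> groupStart) {
            this.groupStart = groupStart;
        }
    }

    // Creates a state, fills in every pending decision point with a
    // transition to it, and makes it the only pending state.
    int newState() {
        int state = successors.size();
        successors.add(new ArrayList<>());
        for (int pending : decisionPointList) {
            successors.get(pending).add(state);
        }
        decisionPointList = new ArrayList<>();
        decisionPointList.add(state);
        return state;
    }

    // '(' : save a copy of the current pending list on the stack.
    void openParen() {
        decisionPointStack.push(new Frame(new ArrayList<>(decisionPointList)));
    }

    // '|' : remember where this branch ended, then restart from the group's start.
    void alternative() {
        Frame frame = decisionPointStack.peek();
        frame.branchEnds.addAll(decisionPointList);
        decisionPointList = new ArrayList<>(frame.groupStart);
    }

    // ')' : the ends of all branches must transition to the next state created.
    void closeParen() {
        Frame frame = decisionPointStack.pop();
        frame.branchEnds.addAll(decisionPointList);
        decisionPointList = new ArrayList<>(frame.branchEnds);
    }
}
```

For a pattern shaped like `a(b|c)d`, the state created for `a` ends up with transitions to both `b` and `c`, and both of those transition to `d`.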
protected static final int DONT_LOOP_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a
state from which the builder should not add transitions to any looping states.
protected Hashtable expressions
A table used to map parts of regexp text to lists of character categories,
rather than having to figure them out from scratch each time.
protected static final int END_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a
state as an accepting state.
protected UnicodeSet ignoreChars
A temporary holding place for the list of ignore characters.
protected Vector loopingStates
A list of states that loop back on themselves. Used to handle the .*? construct.
protected static final int LOOKAHEAD_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a
state as a lookahead state.
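The flag constants and ALL_FLAGS work together as ordinary bit masks. The Java sketch below shows setting, testing, extracting, and clearing them; the numeric values and the `StateFlags` class with its helper methods are hypothetical, since this page documents the roles of the flags but not their values, and it also assumes a flags cell may carry other, non-flag bits that clearing should preserve.

```java
// Hypothetical flag values: this page documents the roles of the flags but
// not their numeric values.
class StateFlags {
    static final int END_STATE_FLAG = 0x8000;
    static final int DONT_LOOP_FLAG = 0x4000;
    static final int LOOKAHEAD_STATE_FLAG = 0x2000;
    static final int ALL_FLAGS = END_STATE_FLAG | DONT_LOOP_FLAG | LOOKAHEAD_STATE_FLAG;

    // Sets one flag bit in a flags-column cell.
    static int withFlag(int cell, int flag) {
        return cell | flag;
    }

    // Tests whether a flag bit is set.
    static boolean hasFlag(int cell, int flag) {
        return (cell & flag) != 0;
    }

    // Keeps only the flag bits (assumes the cell may carry other bits too).
    static int flagBits(int cell) {
        return cell & ALL_FLAGS;
    }

    // Clears every flag bit at once, leaving any other bits untouched.
    static int clearFlags(int cell) {
        return cell & ~ALL_FLAGS;
    }
}
```

Keeping ALL_FLAGS as the OR of the individual masks means a single `& ~ALL_FLAGS` removes every flag, and it stays correct if another flag is added to the union.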
protected Vector mergeList
A list mapping pairs of state numbers for states that are to be combined
to the state number of the state representing their combination. Used
in the process of making the state table deterministic to prevent
infinite recursion.
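Why a merge list prevents infinite recursion can be shown with a deliberately cyclic example. In the minimal Java sketch below, which is a hypothetical simplification where each state has at most one successor (-1 means none) and a `HashMap` stands in for the Vector-based mergeList, merging two states also merges their successors. The pair is recorded before recursing, so when the cycle leads back to the same pair the existing merged state is reused instead of recursing forever.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model: every state has at most one successor
// (-1 means none). Merging two states must also merge their successors,
// which would recurse forever on a cycle without the memo table.
class MergeSketch {
    private final List<Integer> successors = new ArrayList<>();
    // (pair of state numbers) -> number of the state representing the pair.
    private final Map<Long, Integer> mergeList = new HashMap<>();

    int addState(int successor) {
        successors.add(successor);
        return successors.size() - 1;
    }

    void setNext(int state, int successor) {
        successors.set(state, successor);
    }

    int next(int state) {
        return successors.get(state);
    }

    int stateCount() {
        return successors.size();
    }

    // Order-independent key for the pair {a, b} (both non-negative).
    private static long key(int a, int b) {
        return ((long) Math.min(a, b) << 32) | Math.max(a, b);
    }

    int merge(int a, int b) {
        if (a == b || b == -1) return a;
        if (a == -1) return b;
        Integer existing = mergeList.get(key(a, b));
        if (existing != null) return existing;   // pair already (being) merged
        int merged = addState(-1);
        mergeList.put(key(a, b), merged);        // record before recursing
        setNext(merged, merge(next(a), next(b)));
        return merged;
    }
}
```

With two two-state cycles, merging one state from each creates exactly two new states that point at each other, and asking for the same pair again (in either order) returns the state already created.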
protected Vector statesToBackfill
Looping states actually have to be backfilled later in the process
than everything else. This is where the list of states to backfill
is accumulated. It is also used to handle the .*? construct.
protected Vector tempStateTable
A temporary holding place where the forward state table is built.
public Builder()
No special construction is required for the Builder.
public void buildBreakIterator()
This is the main function for setting up the BreakIterator's tables. It
simply delegates the different parts of the job to other functions.
protected void buildCharCategories(Vector tempRuleList)
This function builds the character category table. On entry,
tempRuleList is a vector of break rules that has had variable names substituted.
On exit, the charCategoryTable data member has been initialized to hold the
character category table, and tempRuleList's rules have been munged to contain
character category numbers everywhere a literal character or a [] expression
originally occurred.
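One common way to build such a category table is partition refinement: start with one category covering every character, and split each category by every set that appears in the rules, so each set becomes a union of categories. The Java sketch below shows that technique; it is an assumption about the approach rather than the builder's actual code, and it uses `java.util.BitSet` over a small code-point range in place of ICU's UnicodeSet. The `CategorySketch` class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch using java.util.BitSet over a small code-point range
// in place of ICU's UnicodeSet.
class CategorySketch {
    // Refines the partition so that every input set is a union of categories.
    static List<BitSet> buildCategories(List<BitSet> ruleSets, int universeSize) {
        List<BitSet> categories = new ArrayList<>();
        BitSet all = new BitSet();
        all.set(0, universeSize);
        categories.add(all);
        for (BitSet set : ruleSets) {
            List<BitSet> refined = new ArrayList<>();
            for (BitSet cat : categories) {
                // Split each category into the part inside the set and the part outside.
                BitSet inside = (BitSet) cat.clone();
                inside.and(set);
                BitSet outside = (BitSet) cat.clone();
                outside.andNot(set);
                if (!inside.isEmpty()) refined.add(inside);
                if (!outside.isEmpty()) refined.add(outside);
            }
            categories = refined;
        }
        return categories;
    }

    // Replaces a set by the numbers of the categories it is made of.
    static List<Integer> categoriesOf(BitSet set, List<BitSet> categories) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < categories.size(); i++) {
            if (categories.get(i).intersects(set)) result.add(i);
        }
        return result;
    }
}
```

With the sets `[a-z]` and `[aeiou]` over the ASCII range, this yields three categories (vowels, other lowercase letters, everything else), and `[a-z]` is then rewritten as categories 0 and 1.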
protected void debugPrintTempStateTable()
protected void debugPrintVector(String label, Vector v)
protected void debugPrintVectorOfVectors(String label1, String label2, Vector v)
protected void error(String message, int position, String context)
Throws an IllegalArgumentException representing a syntax error in the rule
description. The exception's message contains some debugging information.
Parameters:
message - A message describing the problem.
position - The position in the description where the problem was discovered.
context - The string containing the error.
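A helper of this shape might look like the minimal Java sketch below. The message layout, including the ` -here- ` marker inserted at the error position, is a hypothetical choice made for illustration, since this page does not document the real method's output format. The position is clamped so that a bad position cannot trigger a second exception while the first is being reported.

```java
// Hypothetical sketch of an error() helper; the exact message format the
// real method produces is not documented on this page.
class RuleErrors {
    static void error(String message, int position, String context) {
        // Clamp so a bad position can't cause a second exception while reporting the first.
        int p = Math.max(0, Math.min(position, context.length()));
        throw new IllegalArgumentException(
                "Parse error at position " + position + ": " + message + "\n"
                + context.substring(0, p) + " -here- " + context.substring(p));
    }
}
```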
protected void handleSpecialSubstitution(String replace, String replaceWith, int startPos, String description)
This function defines a protocol for handling substitution names that
are "special," i.e., that have some property beyond just being
substitutions. At the RuleBasedBreakIterator_Old level, we have one
special substitution name, IGNORE_VAR. Subclasses can override this
function to add more. Any special processing that has to go on beyond
that which is done by the normal substitution-processing code is done
here.
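The override protocol is a standard hook-method pattern: the base builder recognizes its own special name, and a subclass overrides the hook, calls `super`, and then checks for its additional names. In the Java sketch below, the class names, the spellings `"$ignore"` and `"$extra"`, and the `specials` map are all hypothetical; only the idea of a base-level IGNORE_VAR extended by subclasses comes from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the override protocol. The variable spellings and
// the "specials" map are made up for illustration.
class BaseSubstitutionBuilder {
    static final String IGNORE_VAR = "$ignore";      // hypothetical spelling
    final Map<String, String> specials = new HashMap<>();

    // Hook for processing beyond the normal substitution handling.
    protected void handleSpecialSubstitution(String replace, String replaceWith,
                                             int startPos, String description) {
        if (replace.equals(IGNORE_VAR)) {
            specials.put("ignore", replaceWith);
        }
    }
}

// A subclass adds one more special name and defers to the base class for the rest.
class ExtendedSubstitutionBuilder extends BaseSubstitutionBuilder {
    static final String EXTRA_VAR = "$extra";        // hypothetical

    @Override
    protected void handleSpecialSubstitution(String replace, String replaceWith,
                                             int startPos, String description) {
        super.handleSpecialSubstitution(replace, replaceWith, startPos, description);
        if (replace.equals(EXTRA_VAR)) {
            specials.put("extra", replaceWith);
        }
    }
}
```

Calling `super` first keeps the base class's special names working in every subclass without the subclass needing to know what they are.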
protected void mungeExpressionList(Hashtable expressions)
protected String processSubstitution(String substitutionRule, String description, int startPos)
This function performs variable-name substitutions. First it does syntax
checking on the variable-name definition. If it's syntactically valid, it
then goes through the remainder of the description and does a simple
find-and-replace of the variable name with its text. (The variable text
must be enclosed in either [] or () for this to work.)
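The check-then-replace sequence can be sketched in a few lines of Java. This is a minimal sketch under assumed syntax: a definition of the form `name=text`, a name made of letters, digits, `_` or `$`, and a two-argument `processSubstitution` are all hypothetical and differ from the real method's signature. It verifies the name, requires the text to be wrapped in `[]` or `()`, and then does a plain find-and-replace over the rest of the description.

```java
// Hypothetical sketch: a definition has the form "name=text", where text must
// be enclosed in [] or (). The real rule syntax and signature may differ.
class SubstitutionSketch {
    static String processSubstitution(String substitutionRule, String restOfDescription) {
        int eq = substitutionRule.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("Missing variable name or '='");
        }
        String name = substitutionRule.substring(0, eq).trim();
        String text = substitutionRule.substring(eq + 1).trim();

        // Syntax check on the name: letters, digits, '_' or '$' only (assumed rule).
        if (name.isEmpty() || !name.chars().allMatch(
                ch -> Character.isLetterOrDigit(ch) || ch == '_' || ch == '$')) {
            throw new IllegalArgumentException("Bad variable name: " + name);
        }
        // The text must be bracketed so it can be pasted in as a single unit.
        boolean bracketed = text.length() >= 2
                && ((text.startsWith("[") && text.endsWith("]"))
                    || (text.startsWith("(") && text.endsWith(")")));
        if (!bracketed) {
            throw new IllegalArgumentException("Variable text must be enclosed in [] or ()");
        }
        // Simple find-and-replace over the rest of the description.
        return restOfDescription.replace(name, text);
    }
}
```

Requiring brackets presumably keeps the pasted text acting as one unit wherever it lands, for example just before a `*` or `+`. A purely textual `String.replace` also matches the name inside a longer name (such as `$digit` inside `$digits`), which is one limitation of the simple find-and-replace approach.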