Class ICUTokenizerConfig

  • Direct Known Subclasses:
    DefaultICUTokenizerConfig

    public abstract class ICUTokenizerConfig
    extends java.lang.Object
    Class that allows tailored Unicode Text Segmentation on a per-writing-system basis.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int EMOJI_SEQUENCE_STATUS
      Rule status for emoji sequences.
    • Constructor Summary

      Constructors 
      Constructor Description
      ICUTokenizerConfig()
      Sole constructor.
    • Method Summary

      Methods 
      Modifier and Type Method Description
      abstract boolean combineCJ()
      Returns true if Han, Hiragana, and Katakana scripts should all be returned as Japanese.
      abstract com.ibm.icu.text.RuleBasedBreakIterator getBreakIterator(int script)
      Returns a BreakIterator capable of processing a given script.
      abstract java.lang.String getType(int script, int ruleStatus)
      Returns a token type value for a given script and BreakIterator rule status.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • EMOJI_SEQUENCE_STATUS

        public static final int EMOJI_SEQUENCE_STATUS
        Rule status for emoji sequences.
        See Also:
        Constant Field Values
    • Constructor Detail

      • ICUTokenizerConfig

        public ICUTokenizerConfig()
        Sole constructor. (For invocation by subclass constructors, typically implicit.)
    • Method Detail

      • getBreakIterator

        public abstract com.ibm.icu.text.RuleBasedBreakIterator getBreakIterator(int script)
        Returns a BreakIterator capable of processing a given script.
      • getType

        public abstract java.lang.String getType(int script,
                                                 int ruleStatus)
        Returns a token type value for a given script and BreakIterator rule status.
      • combineCJ

        public abstract boolean combineCJ()
        Returns true if Han, Hiragana, and Katakana scripts should all be returned as Japanese.
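
The three abstract methods above work together: a tokenizer asks the config whether to merge Han, Hiragana, and Katakana into Japanese, fetches a break iterator for the resulting script, and maps each segment to a token type. The following is a minimal sketch of that contract, not the library's implementation. So that it runs without Lucene or ICU4J on the classpath, it uses a stand-in class (`ConfigSketch`) with the same three method shapes and `java.text.BreakIterator` in place of `com.ibm.icu.text.RuleBasedBreakIterator`. The script codes, rule status values, helper names, and type strings are all hypothetical and chosen only for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class TokenizerConfigSketch {

    // Hypothetical script codes, used only in this sketch.
    static final int SCRIPT_LATIN = 0;
    static final int SCRIPT_HAN = 1;
    static final int SCRIPT_HIRAGANA = 2;
    static final int SCRIPT_KATAKANA = 3;
    static final int SCRIPT_JAPANESE = 4;

    // Hypothetical rule status values, used only in this sketch.
    static final int STATUS_LETTER = 0;
    static final int STATUS_NUMBER = 1;

    /** Stand-in with the same three method shapes as ICUTokenizerConfig. */
    abstract static class ConfigSketch {
        abstract boolean combineCJ();
        abstract BreakIterator getBreakIterator(int script);
        abstract String getType(int script, int ruleStatus);
    }

    /** One possible configuration: merge CJ scripts, two break iterators. */
    static final class SimpleConfig extends ConfigSketch {
        @Override
        boolean combineCJ() {
            return true;
        }

        @Override
        BreakIterator getBreakIterator(int script) {
            // Pick a word iterator per script; everything non-Japanese shares the root rules.
            Locale locale = (script == SCRIPT_JAPANESE) ? Locale.JAPANESE : Locale.ROOT;
            return BreakIterator.getWordInstance(locale);
        }

        @Override
        String getType(int script, int ruleStatus) {
            // Illustrative type strings: the rule status decides first, then the script.
            if (ruleStatus == STATUS_NUMBER) {
                return "<NUM>";
            }
            return script == SCRIPT_JAPANESE ? "<IDEOGRAPHIC>" : "<ALPHANUM>";
        }
    }

    /** Applies combineCJ(): Han, Hiragana, and Katakana are reported as Japanese. */
    static int resolveScript(ConfigSketch config, int script) {
        boolean cj = script == SCRIPT_HAN || script == SCRIPT_HIRAGANA || script == SCRIPT_KATAKANA;
        return (cj && config.combineCJ()) ? SCRIPT_JAPANESE : script;
    }

    /** Segments text written in a single script and labels each token with its type. */
    static List<String> tokenize(ConfigSketch config, int script, String text) {
        int resolved = resolveScript(config, script);
        BreakIterator it = config.getBreakIterator(resolved);
        it.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            int first = text.codePointAt(start);
            if (!Character.isLetterOrDigit(first)) {
                continue; // skip whitespace and punctuation segments
            }
            int status = Character.isDigit(first) ? STATUS_NUMBER : STATUS_LETTER;
            tokens.add(text.substring(start, end) + "/" + config.getType(resolved, status));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(new SimpleConfig(), SCRIPT_LATIN, "hello world 42"));
    }
}
```

In the sketch the caller supplies the script code and the rule status is derived from the first character of each segment. With the real classes, the `int script` and `int ruleStatus` arguments are supplied by the tokenizer that consumes the config, and a subclass would return an ICU `RuleBasedBreakIterator` rather than a `java.text.BreakIterator`.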