Class UserDictionary

  • All Implemented Interfaces:
    Dictionary

    public final class UserDictionary
    extends java.lang.Object
    implements Dictionary
    Class for building a user dictionary, which allows custom segmentation of phrases.
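The sketch below illustrates what custom segmentation of a phrase means. It models one user-dictionary entry as four parts: a surface phrase, its space-separated segmentation, one reading per segment, and a part-of-speech tag. That four-part layout follows the usual Kuromoji user-dictionary convention. The `SegmentedEntry` class, its validation rules, and the sample phrase are hypothetical and are not part of this API.

```java
import java.util.Arrays;

/** Hypothetical model of one custom-segmentation entry; not a Lucene class. */
final class SegmentedEntry {
    final String surface;
    final String[] segments;
    final String[] readings;
    final String partOfSpeech;

    SegmentedEntry(String surface, String segmentation, String readingList, String partOfSpeech) {
        this.surface = surface;
        // Segments and readings are each separated by one or more spaces.
        this.segments = segmentation.trim().split(" +");
        this.readings = readingList.trim().split(" +");
        this.partOfSpeech = partOfSpeech;
        // Every segment needs exactly one reading.
        if (segments.length != readings.length) {
            throw new IllegalArgumentException("segment/reading count mismatch: " + surface);
        }
        // The segments must spell out the surface form exactly.
        if (!String.join("", segments).equals(surface)) {
            throw new IllegalArgumentException("segmentation does not match surface: " + surface);
        }
    }

    /** Length in chars of each segment, in order. */
    int[] segmentLengths() {
        return Arrays.stream(segments).mapToInt(String::length).toArray();
    }

    public static void main(String[] args) {
        SegmentedEntry entry = new SegmentedEntry(
                "関西国際空港", "関西 国際 空港", "カンサイ コクサイ クウコウ", "カスタム名詞");
        System.out.println(Arrays.toString(entry.segmentLengths())); // [2, 2, 2]
    }
}
```

Rejecting mismatched entries up front keeps later per-segment bookkeeping consistent, with one reading and one length per segment.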
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private UserDictionary(java.util.List<java.lang.String[]> featureEntries)
    • Method Summary

      Modifier and Type Method Description
      private java.lang.String[] getAllFeaturesArray(int wordId)
      java.lang.String getBaseForm(int wordId, char[] surface, int off, int len)
      Gets the base form of a word
      private java.lang.String getFeature(int wordId, int... fields)
      TokenInfoFST getFST()
      java.lang.String getInflectionForm(int wordId)
      Gets the inflection form of a token
      java.lang.String getInflectionType(int wordId)
      Gets the inflection type of a token
      int getLeftId(int wordId)
      Gets the left ID of the specified word
      java.lang.String getPartOfSpeech(int wordId)
      Gets the part of speech of a token
      java.lang.String getPronunciation(int wordId, char[] surface, int off, int len)
      Gets the pronunciation of a token
      java.lang.String getReading(int wordId, char[] surface, int off, int len)
      Gets the reading of a token
      int getRightId(int wordId)
      Gets the right ID of the specified word
      int getWordCost(int wordId)
      Gets the word cost of the specified word
      int[][] lookup(char[] chars, int off, int len)
      Looks up words in text
      int[] lookupSegmentation(int phraseID)
      static UserDictionary open(java.io.Reader reader)
      private int[][] toIndexArray(java.util.Map<java.lang.Integer, int[]> input)
      Converts a map from index to wordIdAndLength into an array of {wordId, index, length}
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • segmentations

        private final int[][] segmentations
      • data

        private final java.lang.String[] data
      • CUSTOM_DICTIONARY_WORD_ID_OFFSET

        private static final int CUSTOM_DICTIONARY_WORD_ID_OFFSET
        See Also:
        Constant Field Values
      • EMPTY_RESULT

        private static final int[][] EMPTY_RESULT
    • Constructor Detail

      • UserDictionary

        private UserDictionary(java.util.List<java.lang.String[]> featureEntries)
                        throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • open

        public static UserDictionary open(java.io.Reader reader)
                                   throws java.io.IOException
        Throws:
        java.io.IOException
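`open` reads dictionary text from a `Reader`. The following is a minimal sketch of that kind of line-oriented loading. It assumes that `#` starts a comment, that blank lines are skipped, and that each remaining line is one comma-separated record. Splitting on plain commas is a deliberate simplification: real CSV input may quote fields that contain commas, and this sketch does not handle that case. The `UserDictLines.parse` name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical line loader; not the Lucene implementation. */
final class UserDictLines {
    /** Reads non-empty, non-comment lines and splits each into fields. */
    static List<String[]> parse(Reader reader) throws IOException {
        List<String[]> entries = new ArrayList<>();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            // Drop everything from the first '#' to the end of the line.
            line = line.replaceAll("#.*$", "");
            if (line.trim().isEmpty()) {
                continue; // blank or comment-only line
            }
            // Naive split: quoted fields containing commas are not supported.
            entries.add(line.split(",", -1));
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        String text = "# custom entries\n"
                + "関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞\n"
                + "\n";
        for (String[] fields : parse(new StringReader(text))) {
            System.out.println(fields.length + " fields, surface=" + fields[0]);
        }
    }
}
```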
      • lookup

        public int[][] lookup(char[] chars,
                              int off,
                              int len)
                       throws java.io.IOException
        Looks up words in text
        Parameters:
        chars - text
        off - offset into text
        len - length of text
        Returns:
        array of {wordId, position, length}
        Throws:
        java.io.IOException
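The sketch below shows the documented result shape of `lookup`: one `{wordId, position, length}` row for each dictionary word found in `chars[off, off + len)`. The class appears to be backed by an FST, given `getFST()`. This sketch substitutes a plain `HashMap` from surface string to word ID and tries every candidate length at every start position. It also assumes that `position` is measured from `off`. The `NaiveLookup` class and its `add` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical HashMap-based stand-in for an FST-backed lookup. */
final class NaiveLookup {
    private final Map<String, Integer> wordIds = new HashMap<>();
    private int maxLength = 0;

    /** Registers a surface form under a caller-chosen word ID. */
    void add(String surface, int wordId) {
        wordIds.put(surface, wordId);
        maxLength = Math.max(maxLength, surface.length());
    }

    /** Returns {wordId, position, length} for every match; position is relative to off. */
    int[][] lookup(char[] chars, int off, int len) {
        List<int[]> hits = new ArrayList<>();
        int end = off + len;
        for (int start = off; start < end; start++) {
            // Only try lengths that stay inside the window and could match some entry.
            int limit = Math.min(maxLength, end - start);
            for (int n = 1; n <= limit; n++) {
                Integer id = wordIds.get(new String(chars, start, n));
                if (id != null) {
                    hits.add(new int[] {id, start - off, n});
                }
            }
        }
        return hits.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        NaiveLookup dict = new NaiveLookup();
        dict.add("関西", 1);
        dict.add("関西国際空港", 2);
        char[] text = "大阪の関西国際空港".toCharArray();
        for (int[] hit : dict.lookup(text, 0, text.length)) {
            System.out.println(java.util.Arrays.toString(hit)); // prints [1, 3, 2] then [2, 3, 6]
        }
    }
}
```

A prefix structure such as an FST avoids allocating a substring for each candidate length. It advances one character at a time and stops as soon as no entry continues.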
      • toIndexArray

        private int[][] toIndexArray(java.util.Map<java.lang.Integer, int[]> input)
        Converts a map from index to wordIdAndLength into an array of {wordId, index, length}
        Returns:
        array of {wordId, index, length}
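This page gives only the input and output shapes of `toIndexArray`. The following minimal sketch assumes that each map value is laid out as `{firstWordId, length1, length2, ...}`: a word ID for the first segment, followed by the segment lengths, with consecutive IDs for later segments. Under that assumption, it expands each matched phrase into one `{wordId, index, length}` triple per segment. This page does not state that value layout, and the `IndexArraySketch` name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical expansion of phrase matches into per-segment triples. */
final class IndexArraySketch {
    static int[][] toIndexArray(Map<Integer, int[]> input) {
        List<int[]> result = new ArrayList<>();
        for (Map.Entry<Integer, int[]> entry : input.entrySet()) {
            int[] wordIdAndLength = entry.getValue();
            int firstWordId = wordIdAndLength[0];
            int index = entry.getKey(); // start of the phrase in the text
            for (int j = 1; j < wordIdAndLength.length; j++) {
                int length = wordIdAndLength[j];
                // Segment j gets ID firstWordId + (j - 1) and starts where the previous one ended.
                result.add(new int[] {firstWordId + j - 1, index, length});
                index += length;
            }
        }
        return result.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        Map<Integer, int[]> matches = new TreeMap<>();
        matches.put(3, new int[] {100, 2, 2, 2}); // a 6-char phrase at index 3, split 2+2+2
        for (int[] t : toIndexArray(matches)) {
            System.out.println(java.util.Arrays.toString(t));
        }
        // prints [100, 3, 2], [101, 5, 2], [102, 7, 2] on separate lines
    }
}
```

Passing a `TreeMap` keeps the output ordered by start index. A `HashMap` would give no ordering guarantee.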
      • lookupSegmentation

        public int[] lookupSegmentation(int phraseID)
      • getLeftId

        public int getLeftId(int wordId)
        Description copied from interface: Dictionary
        Gets the left ID of the specified word
        Specified by:
        getLeftId in interface Dictionary
        Returns:
        left ID
      • getRightId

        public int getRightId(int wordId)
        Description copied from interface: Dictionary
        Gets the right ID of the specified word
        Specified by:
        getRightId in interface Dictionary
        Returns:
        right ID
      • getWordCost

        public int getWordCost(int wordId)
        Description copied from interface: Dictionary
        Gets the word cost of the specified word
        Specified by:
        getWordCost in interface Dictionary
        Returns:
        word's cost
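This page does not explain how the left ID, right ID, and word cost are used. In MeCab-style lattice tokenizers, these three values are typically combined when scoring a candidate path. Each word adds its own cost, and each adjacent pair adds a connection cost looked up by the previous word's right ID and the next word's left ID. Lower totals are preferred. The sketch below illustrates that pattern only. The IDs, costs, and connection matrix are made up, the `PathCost` class is hypothetical, and sentence-boundary connections are ignored.

```java
/** Hypothetical path scoring from left IDs, right IDs, and word costs. */
final class PathCost {
    /** Toy per-word data: surface, left ID, right ID, word cost. */
    record Word(String surface, int leftId, int rightId, int wordCost) {}

    /** Sums word costs plus connection[prev.rightId][next.leftId] for each adjacent pair. */
    static int pathCost(Word[] path, int[][] connection) {
        int total = 0;
        for (int i = 0; i < path.length; i++) {
            total += path[i].wordCost();
            if (i > 0) {
                total += connection[path[i - 1].rightId()][path[i].leftId()];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy connection matrix indexed [rightId of previous][leftId of next].
        int[][] connection = new int[3][3];
        connection[2][2] = 200;
        Word[] whole = {new Word("関西国際空港", 1, 1, 3000)};
        Word[] split = {
            new Word("関西", 2, 2, 1500),
            new Word("国際", 2, 2, 1500),
            new Word("空港", 2, 2, 1500)
        };
        System.out.println(pathCost(whole, connection)); // 3000
        System.out.println(pathCost(split, connection)); // 4900
    }
}
```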
      • getReading

        public java.lang.String getReading(int wordId,
                                           char[] surface,
                                           int off,
                                           int len)
        Description copied from interface: Dictionary
        Gets the reading of a token
        Specified by:
        getReading in interface Dictionary
        Parameters:
        wordId - word ID of token
        Returns:
        Reading of the token
      • getPartOfSpeech

        public java.lang.String getPartOfSpeech(int wordId)
        Description copied from interface: Dictionary
        Gets the part of speech of a token
        Specified by:
        getPartOfSpeech in interface Dictionary
        Parameters:
        wordId - word ID of token
        Returns:
        Part of speech of the token
      • getBaseForm

        public java.lang.String getBaseForm(int wordId,
                                            char[] surface,
                                            int off,
                                            int len)
        Description copied from interface: Dictionary
        Gets the base form of a word
        Specified by:
        getBaseForm in interface Dictionary
        Parameters:
        wordId - word ID of token
        Returns:
        Base form (only different for inflected words, otherwise null)
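The base form is documented as `null` for words that are not inflected, so callers generally need a fallback. This minimal sketch assumes that the token's own surface text is the right fallback in that case. That is a common choice, but this page does not prescribe it. `baseFormOrSurface` is a hypothetical helper that takes the same `surface`/`off`/`len` arguments as `getBaseForm`.

```java
/** Hypothetical null-handling helper for base forms. */
final class BaseForms {
    /** Returns baseForm if it is non-null, otherwise the token's surface text. */
    static String baseFormOrSurface(String baseForm, char[] surface, int off, int len) {
        return baseForm != null ? baseForm : new String(surface, off, len);
    }

    public static void main(String[] args) {
        char[] text = "食べた".toCharArray();
        System.out.println(baseFormOrSurface("食べる", text, 0, 2)); // 食べる
        System.out.println(baseFormOrSurface(null, text, 0, 2));   // 食べ
    }
}
```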
      • getPronunciation

        public java.lang.String getPronunciation(int wordId,
                                                 char[] surface,
                                                 int off,
                                                 int len)
        Description copied from interface: Dictionary
        Gets the pronunciation of a token
        Specified by:
        getPronunciation in interface Dictionary
        Parameters:
        wordId - word ID of token
        Returns:
        Pronunciation of the token
      • getInflectionType

        public java.lang.String getInflectionType(int wordId)
        Description copied from interface: Dictionary
        Gets the inflection type of a token
        Specified by:
        getInflectionType in interface Dictionary
        Parameters:
        wordId - word ID of token
        Returns:
        inflection type, or null
      • getInflectionForm

        public java.lang.String getInflectionForm(int wordId)
        Description copied from interface: Dictionary
        Gets the inflection form of a token
        Specified by:
        getInflectionForm in interface Dictionary
        Parameters:
        wordId - word ID of token
        Returns:
        inflection form, or null
      • getAllFeaturesArray

        private java.lang.String[] getAllFeaturesArray(int wordId)
      • getFeature

        private java.lang.String getFeature(int wordId,
                                            int... fields)
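The private helpers `getAllFeaturesArray` and `getFeature`, together with the `data` field, suggest a specific storage pattern. Per-word features appear to be kept as a packed string and extracted by field index. The sketch below is a guess at that pattern, not the actual implementation. It makes three assumptions. First, word IDs are shifted by an offset, which is the role `CUSTOM_DICTIONARY_WORD_ID_OFFSET` appears to play. Second, features are joined with a separator character. Third, when several fields are requested, they are re-joined with commas. The offset value, the separator, and the `FeatureStore` name are all hypothetical.

```java
import java.util.StringJoiner;
import java.util.regex.Pattern;

/** Hypothetical packed-feature store; not the Lucene implementation. */
final class FeatureStore {
    // Hypothetical values; the real constant's value is not given on this page.
    static final int WORD_ID_OFFSET = 1000;
    static final String SEPARATOR = "\u0000";

    private final String[] data;

    FeatureStore(String[] data) {
        this.data = data;
    }

    /** Splits the packed features for wordId, or returns null if none are stored. */
    String[] getAllFeaturesArray(int wordId) {
        String packed = data[wordId - WORD_ID_OFFSET];
        return packed == null ? null : packed.split(Pattern.quote(SEPARATOR), -1);
    }

    /** Returns the requested fields joined with commas, or all fields if none are given. */
    String getFeature(int wordId, int... fields) {
        String[] all = getAllFeaturesArray(wordId);
        if (all == null) {
            return null;
        }
        if (fields.length == 0) {
            return String.join(",", all); // no escaping of commas inside values
        }
        StringJoiner joiner = new StringJoiner(",");
        for (int field : fields) {
            joiner.add(all[field]);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        FeatureStore store = new FeatureStore(new String[] {
            "カンサイ" + SEPARATOR + "カスタム名詞"
        });
        System.out.println(store.getFeature(WORD_ID_OFFSET, 0)); // カンサイ
        System.out.println(store.getFeature(WORD_ID_OFFSET, 1)); // カスタム名詞
        System.out.println(store.getFeature(WORD_ID_OFFSET));    // カンサイ,カスタム名詞
    }
}
```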