java.lang.Object
org.apache.lucene.analysis.hunspell.Dictionary

public class Dictionary extends Object
In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.
  • Field Details

    • MAX_PROLOGUE_SCAN_WINDOW

      static final int MAX_PROLOGUE_SCAN_WINDOW
    • NOFLAGS

      static final char[] NOFLAGS
    • FLAG_UNSET

      static final char FLAG_UNSET
    • DEFAULT_FLAGS

      private static final int DEFAULT_FLAGS
    • HIDDEN_FLAG

      static final char HIDDEN_FLAG
    • DEFAULT_CHARSET

      static final Charset DEFAULT_CHARSET
    • decoder

    • prefixes

      FST<IntsRef> prefixes
    • suffixes

      FST<IntsRef> suffixes
    • breaks

    • patterns

      All condition checks used by prefixes and suffixes. These are typically reused across many affix stripping rules, so they are deduplicated to save RAM.
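
      The deduplication idea can be sketched as a small intern pool: each distinct condition is stored once, and affix rules refer to it by index. This is an illustrative sketch only, not the code used by this class; ConditionPool and its methods are hypothetical names, and conditions are kept as plain strings for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical intern pool: stores each distinct condition string once. */
public class ConditionPool {
  private final Map<String, Integer> indexByCondition = new HashMap<>();
  private final List<String> conditions = new ArrayList<>();

  /** Returns the index of the condition, storing it only the first time it is seen. */
  public int intern(String condition) {
    Integer existing = indexByCondition.get(condition);
    if (existing != null) {
      return existing;
    }
    int index = conditions.size();
    conditions.add(condition);
    indexByCondition.put(condition, index);
    return index;
  }

  /** Looks up a previously interned condition by its index. */
  public String get(int index) {
    return conditions.get(index);
  }

  /** Number of distinct conditions stored. */
  public int size() {
    return conditions.size();
  }

  public static void main(String[] args) {
    ConditionPool pool = new ConditionPool();
    int a = pool.intern("[^aeiou]y");
    int b = pool.intern(".");
    int c = pool.intern("[^aeiou]y"); // same condition as the first rule
    System.out.println(a + " " + b + " " + c + " size=" + pool.size());
    // prints "0 1 0 size=2"
  }
}
```

      Many rules then carry only a small integer index, and the cost of a condition is paid once regardless of how many rules share it.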
    • words

      The entries in the .dic file, each mapped to its set of flags.
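
      As a rough illustration of what "mapping to their set of flags" means, a .dic entry such as work/DGS pairs the word work with the flags D, G and S. The sketch below splits such a line. It assumes the default single-character flag format, no escaped slashes and no trailing morphological fields. DicLine is a hypothetical helper, not part of this class.

```java
import java.util.Arrays;

/** Hypothetical holder for one parsed .dic line: a word and its sorted flags. */
public class DicLine {
  final String word;
  final char[] flags;

  DicLine(String word, char[] flags) {
    this.word = word;
    this.flags = flags;
  }

  /** Splits "word/FLAGS" at the first slash; a line without a slash has no flags. */
  static DicLine parse(String line) {
    int sep = line.indexOf('/');
    if (sep < 0) {
      return new DicLine(line, new char[0]);
    }
    char[] flags = line.substring(sep + 1).toCharArray();
    Arrays.sort(flags); // sorted flags allow binary search later
    return new DicLine(line.substring(0, sep), flags);
  }

  /** True if this entry carries the given flag. */
  boolean hasFlag(char flag) {
    return Arrays.binarySearch(flags, flag) >= 0;
  }

  public static void main(String[] args) {
    DicLine entry = DicLine.parse("work/SGD");
    System.out.println(entry.word + " " + new String(entry.flags));
    // prints "work DGS"
  }
}
```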
    • flagLookup

      final FlagEnumerator.Lookup flagLookup
      The list of unique flagsets (wordforms). Theoretically huge, but small in practice (756 for Polish); otherwise humans wouldn't be able to deal with it either.
    • stripData

      char[] stripData
    • stripOffsets

      int[] stripOffsets
    • wordChars

      String wordChars
    • affixData

      char[] affixData
    • currentAffix

      private int currentAffix
    • AFFIX_FLAG

      static final int AFFIX_FLAG
    • AFFIX_STRIP_ORD

      static final int AFFIX_STRIP_ORD
    • AFFIX_CONDITION

      private static final int AFFIX_CONDITION
    • AFFIX_APPEND

      static final int AFFIX_APPEND
    • flagParsingStrategy

      Dictionary.FlagParsingStrategy flagParsingStrategy
    • aliases

      private String[] aliases
    • aliasCount

      private int aliasCount
    • morphAliases

      private String[] morphAliases
    • morphAliasCount

      private int morphAliasCount
    • morphData

      final List<String> morphData
    • hasCustomMorphData

      boolean hasCustomMorphData
      Set during sorting, so that we know to add an extra int (the index in morphData) to the FST output.
    • ignoreCase

      boolean ignoreCase
    • checkSharpS

      boolean checkSharpS
    • complexPrefixes

      boolean complexPrefixes
    • secondStagePrefixFlags

      private char[] secondStagePrefixFlags
      All flags used in affix continuation classes. If an outer affix's flag isn't here, there's no need to do 2-level affix stripping with it.
    • secondStageSuffixFlags

      private char[] secondStageSuffixFlags
      All flags used in affix continuation classes. If an outer affix's flag isn't here, there's no need to do 2-level affix stripping with it.
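
      The shortcut described for secondStagePrefixFlags and secondStageSuffixFlags amounts to a membership test: two-level stripping is attempted only when the outer affix's flag appears in some continuation class. A minimal sketch, assuming the flags are kept in a sorted char[]; SecondStageFlags and mayNeedSecondStage are hypothetical names, and the real lookup in this class may differ.

```java
import java.util.Arrays;

/** Hypothetical membership check over the flags seen in affix continuation classes. */
public class SecondStageFlags {
  private final char[] sortedFlags;

  public SecondStageFlags(char[] continuationFlags) {
    // Copy and sort once so that every later lookup is a binary search.
    sortedFlags = continuationFlags.clone();
    Arrays.sort(sortedFlags);
  }

  /** False means two-level stripping with this outer affix can be skipped. */
  public boolean mayNeedSecondStage(char outerAffixFlag) {
    return Arrays.binarySearch(sortedFlags, outerAffixFlag) >= 0;
  }

  public static void main(String[] args) {
    SecondStageFlags flags = new SecondStageFlags(new char[] {'S', 'A', 'M'});
    System.out.println(flags.mayNeedSecondStage('M')); // prints "true"
    System.out.println(flags.mayNeedSecondStage('Z')); // prints "false"
  }
}
```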
    • circumfix

      char circumfix
    • keepcase

      char keepcase
    • forceUCase

      char forceUCase
    • needaffix

      char needaffix
    • forbiddenword

      char forbiddenword
    • onlyincompound

      char onlyincompound
    • compoundBegin

      char compoundBegin
    • compoundMiddle

      char compoundMiddle
    • compoundEnd

      char compoundEnd
    • compoundFlag

      char compoundFlag
    • compoundPermit

      char compoundPermit
    • compoundForbid

      char compoundForbid
    • checkCompoundCase

      boolean checkCompoundCase
    • checkCompoundDup

      boolean checkCompoundDup
    • checkCompoundRep

      boolean checkCompoundRep
    • checkCompoundTriple

      boolean checkCompoundTriple
    • simplifiedTriple

      boolean simplifiedTriple
    • compoundMin

      int compoundMin
    • compoundMax

      int compoundMax
    • compoundRules

      List<CompoundRule> compoundRules
    • checkCompoundPatterns

      List<CheckCompoundPattern> checkCompoundPatterns
    • ignore

      private char[] ignore
    • tryChars

      String tryChars
    • neighborKeyGroups

      String[] neighborKeyGroups
    • enableSplitSuggestions

      boolean enableSplitSuggestions
    • repTable

      List<RepEntry> repTable
    • mapTable

      List<List<String>> mapTable
    • maxDiff

      int maxDiff
    • maxNGramSuggestions

      int maxNGramSuggestions
    • onlyMaxDiff

      boolean onlyMaxDiff
    • noSuggest

      char noSuggest
    • subStandard

      char subStandard
    • iconv

      ConvTable iconv
    • oconv

      ConvTable oconv
    • fullStrip

      boolean fullStrip
    • language

      String language
    • alternateCasing

      private boolean alternateCasing
    • BOM_UTF8

      private static final byte[] BOM_UTF8
    • CHARSET_ALIASES

      static final Map<String,String> CHARSET_ALIASES
    • FLAG_SEPARATOR

      private static final char FLAG_SEPARATOR
    • MORPH_SEPARATOR

      private static final char MORPH_SEPARATOR
  • Constructor Details

    • Dictionary

      public Dictionary(Directory tempDir, String tempFileNamePrefix, InputStream affix, InputStream dictionary) throws IOException, ParseException
      Creates a new Dictionary containing the information read from the provided InputStreams for the hunspell affix and dictionary files. The constructor does not close the provided InputStreams; you must close them yourself.
      Parameters:
      tempDir - Directory to use for offline sorting
      tempFileNamePrefix - prefix to use to generate temp file names
      affix - InputStream for reading the hunspell affix file (won't be closed).
      dictionary - InputStream for reading the hunspell dictionary file (won't be closed).
      Throws:
      IOException - Can be thrown while reading from the InputStreams
      ParseException - Can be thrown if the content of the files does not meet expected formats
    • Dictionary

      public Dictionary(Directory tempDir, String tempFileNamePrefix, InputStream affix, List<InputStream> dictionaries, boolean ignoreCase) throws IOException, ParseException
      Creates a new Dictionary containing the information read from the provided InputStreams for the hunspell affix and dictionary files. The constructor does not close the provided InputStreams; you must close them yourself.
      Parameters:
      tempDir - Directory to use for offline sorting
      tempFileNamePrefix - prefix to use to generate temp file names
      affix - InputStream for reading the hunspell affix file (won't be closed).
      dictionaries - InputStreams for reading the hunspell dictionary files (won't be closed).
      ignoreCase - whether case should be ignored
      Throws:
      IOException - Can be thrown while reading from the InputStreams
      ParseException - Can be thrown if the content of the files does not meet expected formats
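
      Because the constructors leave the streams open, a try-with-resources block is a convenient way to make sure they are closed. The following is a usage sketch only, based on the four-argument constructor signature listed above. The file names en_US.aff and en_US.dic and the "hunspell" temp-file prefix are placeholder assumptions, and ByteBuffersDirectory is just one possible Directory to hand in for the offline sort.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.ParseException;

import org.apache.lucene.analysis.hunspell.Dictionary;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class LoadDictionaryExample {
  public static void main(String[] args) throws IOException, ParseException {
    // Placeholder paths: substitute the location of real hunspell files.
    Path affPath = Paths.get("en_US.aff");
    Path dicPath = Paths.get("en_US.dic");

    // The constructor does not close the streams, so try-with-resources does it here.
    try (Directory tempDir = new ByteBuffersDirectory();
        InputStream affix = Files.newInputStream(affPath);
        InputStream dictionary = Files.newInputStream(dicPath)) {
      Dictionary dict = new Dictionary(tempDir, "hunspell", affix, dictionary);
      System.out.println("Loaded dictionary: " + dict);
    }
  }
}
```

      The Dictionary holds its data in memory, so it stays usable after the try block has closed the streams and the temporary directory.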
  • Method Details