java.lang.Object
org.apache.lucene.analysis.hunspell.Hunspell

public class Hunspell extends Object
A spell checker based on Hunspell dictionaries. This class can be used in place of native Hunspell for many languages, both for spell-checking and for generating suggestions. Note that not all languages and features are supported yet. For example:
  • Hungarian (as it doesn't rely only on dictionaries, but has some logic directly in the source code)
  • Languages with Unicode characters outside of the Basic Multilingual Plane
  • PHONE affix file option for suggestions

The objects of this class are thread-safe.
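Because instances are thread-safe, a single checker can be shared by several threads. The following is a minimal sketch of that idea, not part of this API: the class name `SharedCheckerSketch` and its method are hypothetical, and the checker is represented by a plain `Predicate<String>` so the example compiles without Lucene on the classpath.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lucene.
final class SharedCheckerSketch {

  /**
   * Returns the words rejected by the checker, in input order.
   * The same predicate is invoked concurrently from several worker threads,
   * which is only safe if the underlying checker is thread-safe.
   */
  static List<String> misspelled(Predicate<String> spell, List<String> words) {
    return words.parallelStream()
        .filter(word -> !spell.test(word))
        .collect(Collectors.toList());
  }
}
```

With a real instance, the predicate could presumably be supplied as `hunspell::spell`.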

  • Constructor Details

    • Hunspell

      public Hunspell(Dictionary dictionary)
    • Hunspell

      public Hunspell(Dictionary dictionary, TimeoutPolicy policy, Runnable checkCanceled)
      Parameters:
      policy - a strategy determining what to do when API calls take too much time
      checkCanceled - an object that is called periodically, allowing the caller to interrupt spell-checking or suggestion generation by throwing an exception
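One possible shape for a `checkCanceled` callback is a `Runnable` that throws once a shared flag has been set. This is a sketch under assumptions: the class `CancelHookSketch` is hypothetical, and the choice of `java.util.concurrent.CancellationException` is an illustration, since the documentation above only says that the callback interrupts work "by throwing an exception".

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of Lucene.
final class CancelHookSketch {

  /**
   * Builds a callback that does nothing while the flag is false and
   * throws an unchecked exception as soon as the flag has been set.
   */
  static Runnable cancelHook(AtomicBoolean cancelled) {
    return () -> {
      if (cancelled.get()) {
        throw new CancellationException("spell-checking cancelled");
      }
    };
  }
}
```

Such a callback could then be passed as the third constructor argument, for example `new Hunspell(dictionary, TimeoutPolicy.THROW_EXCEPTION, CancelHookSketch.cancelHook(flag))`, with another thread calling `flag.set(true)` to request cancellation.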
  • Method Details

    • spell

      public boolean spell(String word)
      Returns:
      whether the given word's spelling is considered correct according to Hunspell rules
    • spellClean

      private boolean spellClean(String word)
    • spellWithTrailingDots

      private boolean spellWithTrailingDots(String word)
    • checkWord

      boolean checkWord(String word)
    • checkSimpleWord

      Boolean checkSimpleWord(char[] wordChars, int length, WordCase originalCase)
    • checkWord

      private boolean checkWord(char[] wordChars, int length, WordCase originalCase)
    • checkCompounds

      private boolean checkCompounds(char[] wordChars, int length, WordCase originalCase)
    • findStem

      Root<CharsRef> findStem(char[] wordChars, int offset, int length, WordCase originalCase, WordContext context)
    • acceptCase

      private boolean acceptCase(WordCase originalCase, int entryId, CharsRef root)
    • containsSharpS

      private boolean containsSharpS(char[] word, int offset, int length)
    • acceptsStem

      boolean acceptsStem(int formID)
    • checkCompounds

      private boolean checkCompounds(CharsRef word, WordCase originalCase, Hunspell.CompoundPart prev)
    • checkCompoundPatternReplacements

      private boolean checkCompoundPatternReplacements(CharsRef word, int pos, WordCase originalCase, Hunspell.CompoundPart prev)
    • checkCompoundsAfter

      private boolean checkCompoundsAfter(WordCase originalCase, Hunspell.CompoundPart prev)
    • hasForceUCaseProblem

      private boolean hasForceUCaseProblem(Root<?> root, WordCase originalCase, char[] wordChars)
    • getRoots

      public List<String> getRoots(String word)
      Find all roots that could result in the given word after case conversion and adding affixes. This corresponds to the original hunspell -s (stemming) functionality.

      Some affix rules are relaxed in this stemming process: e.g. explicitly forbidden words are still returned. Some of the returned roots may be synthetic and not directly occur in the *.dic file (but differ from some existing entries in case). No roots are returned for compound words.

      The returned roots may be used to retrieve morphological data via Dictionary.lookupEntries(java.lang.String).
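To make the idea of "roots that could result in the given word" concrete, here is a deliberately tiny toy model. It is not the algorithm used by this class: it handles suffixes only, ignores case conversion, prefixes and affix conditions, and every name in it (`ToyStemmerSketch`, `roots`) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy illustration only; real Hunspell stemming is far more involved.
final class ToyStemmerSketch {

  /** Returns dictionary roots that yield the word directly or by appending one suffix. */
  static List<String> roots(String word, Set<String> dictionary, List<String> suffixes) {
    List<String> result = new ArrayList<>();
    if (dictionary.contains(word)) {
      result.add(word); // the word is itself a dictionary entry
    }
    for (String suffix : suffixes) {
      if (word.length() > suffix.length() && word.endsWith(suffix)) {
        // Strip the suffix and see whether what remains is a known root.
        String candidate = word.substring(0, word.length() - suffix.length());
        if (dictionary.contains(candidate) && !result.contains(candidate)) {
          result.add(candidate);
        }
      }
    }
    return result;
  }
}
```

In this toy model, a dictionary containing `walk` and the suffix `ed` would map `walked` back to the root `walk`.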

    • analyzeSimpleWord

      public List<AffixedWord> analyzeSimpleWord(String word)
      Returns:
      all possible analyses of the given word with stems, prefixes, suffixes and morphological data. Note that the order of the returned objects might not correspond to the *.dic file order!
    • getAllWordForms

      public List<AffixedWord> getAllWordForms(String root)
      Generate all word forms for all dictionary entries with the given root word. The result order is stable but not specified. This is equivalent to "unmunch" from the "hunspell-tools" package.
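The "unmunch" direction can be pictured with a toy expansion of a `root/FLAGS` dictionary entry. This is a simplified sketch, not the implementation behind this method: it assumes single-character flags that each append one fixed suffix, and the class `ToyUnmunchSketch` and the flag-to-suffix map are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy illustration only; real affix rules also have conditions, stripping and prefixes.
final class ToyUnmunchSketch {

  /** Expands an entry such as "walk/DG" into the root plus one form per known flag. */
  static List<String> expand(String entry, Map<Character, String> suffixByFlag) {
    int slash = entry.indexOf('/');
    String root = slash < 0 ? entry : entry.substring(0, slash);
    String flags = slash < 0 ? "" : entry.substring(slash + 1);

    List<String> forms = new ArrayList<>();
    forms.add(root); // the bare root is always a valid form in this toy model
    for (char flag : flags.toCharArray()) {
      String suffix = suffixByFlag.get(flag);
      if (suffix != null) {
        forms.add(root + suffix);
      }
    }
    return forms;
  }
}
```

With the hypothetical rules `D` for `ed` and `G` for `ing`, the entry `walk/DG` expands to `walk`, `walked` and `walking`.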
    • compress

      public EntrySuggestion compress(List<String> words)
      Given a list of words, try to produce a smaller set of dictionary entries (with some flags) that would generate these words. This is equivalent to "munch" from the "hunspell-tools" package.
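The "munch" direction can be sketched as a search for one root whose flags cover every other word in the list. This toy version is an assumption-laden illustration, not the implementation behind this method: it produces at most one entry, supports only fixed suffixes with single-character flags, and the names `ToyMunchSketch` and `flagFor` are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeSet;

// Toy illustration only; the real method returns an EntrySuggestion and handles far more cases.
final class ToyMunchSketch {

  /** Tries to find a single "root/FLAGS" entry that generates every word in the list. */
  static Optional<String> compress(List<String> words, Map<Character, String> suffixByFlag) {
    for (String root : words) {
      TreeSet<Character> flags = new TreeSet<>(); // sorted, so the output is deterministic
      boolean coversAll = true;
      for (String word : words) {
        if (word.equals(root)) {
          continue;
        }
        Character flag = flagFor(root, word, suffixByFlag);
        if (flag == null) {
          coversAll = false; // this root cannot produce the word
          break;
        }
        flags.add(flag);
      }
      if (coversAll) {
        StringBuilder entry = new StringBuilder(root);
        if (!flags.isEmpty()) {
          entry.append('/');
        }
        for (char flag : flags) {
          entry.append(flag);
        }
        return Optional.of(entry.toString());
      }
    }
    return Optional.empty();
  }

  /** Returns the flag whose suffix turns the root into the word, or null if none does. */
  private static Character flagFor(String root, String word, Map<Character, String> suffixByFlag) {
    for (Map.Entry<Character, String> rule : suffixByFlag.entrySet()) {
      if (word.equals(root + rule.getValue())) {
        return rule.getKey();
      }
    }
    return null;
  }
}
```

With the hypothetical rules `D` for `ed` and `G` for `ing`, the words `walked`, `walk` and `walking` collapse into the single entry `walk/DG`.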
    • mayBreakIntoCompounds

      private boolean mayBreakIntoCompounds(char[] chars, int offset, int length, int breakPos)
    • checkCompoundRules

      private boolean checkCompoundRules(char[] wordChars, int offset, int length, List<IntsRef> words)
    • checkLastCompoundPart

      private boolean checkLastCompoundPart(char[] wordChars, int start, int length, List<IntsRef> words)
    • isNumber

      private static boolean isNumber(String s)
    • isDigit

      private static boolean isDigit(char c)
    • tryBreaks

      private boolean tryBreaks(String word)
    • hasTooManyBreakOccurrences

      private boolean hasTooManyBreakOccurrences(String word)
    • canBeBrokenAt

      private boolean canBeBrokenAt(String word, String breakStr, int breakPos)
    • suggest

      public List<String> suggest(String word) throws SuggestionTimeoutException
      Returns:
      suggestions for the given misspelled word
      Throws:
      SuggestionTimeoutException - if the computation takes too long and TimeoutPolicy.THROW_EXCEPTION was specified in the constructor
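A common way to combine `spell` and `suggest` is to request suggestions only for words that fail the check. The sketch below shows that flow with functional stand-ins so it compiles without Lucene; the class `CorrectionsSketch` and its method are hypothetical and not part of this API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical helper, not part of Lucene.
final class CorrectionsSketch {

  /**
   * Maps each misspelled word to its suggestions, in order of first occurrence.
   * Correctly spelled words are skipped, and repeated misspellings are looked up once.
   */
  static Map<String, List<String>> corrections(
      Predicate<String> spell,
      Function<String, List<String>> suggest,
      List<String> words) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (String word : words) {
      if (!spell.test(word)) {
        result.computeIfAbsent(word, suggest);
      }
    }
    return result;
  }
}
```

With a real instance, the two stand-ins could presumably be `hunspell::spell` and `word -> hunspell.suggest(word)`. A caller that constructed the checker with TimeoutPolicy.THROW_EXCEPTION would also need to decide where to handle SuggestionTimeoutException.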
    • suggest

      public List<String> suggest(String word, long timeLimitMs) throws SuggestionTimeoutException
      Parameters:
      word - the misspelled word to calculate suggestions for
      timeLimitMs - the duration limit in milliseconds, after which the associated TimeoutPolicy's effects (exception or partial result) may take effect
      Throws:
      SuggestionTimeoutException - if the computation takes too long and TimeoutPolicy.THROW_EXCEPTION was specified in the constructor
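The interplay between a time limit and the two outcomes named above (an exception or a partial result) can be illustrated with a small stand-alone model. This is not how the class is implemented: `TimeBudgetSketch`, its `Policy` enum and `BudgetExceeded` are hypothetical stand-ins for TimeoutPolicy and SuggestionTimeoutException, and the clock is injected so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical model of a time-limited computation; not Lucene code.
final class TimeBudgetSketch {

  /** Hypothetical stand-in for the two timeout outcomes described above. */
  enum Policy { THROW, RETURN_PARTIAL }

  /** Hypothetical exception that carries whatever was collected before the deadline. */
  static final class BudgetExceeded extends RuntimeException {
    final List<String> partial;

    BudgetExceeded(List<String> partial) {
      super("time limit exceeded");
      this.partial = partial;
    }
  }

  /** Collects candidates until they run out or the deadline passes. */
  static List<String> collect(
      Iterable<String> candidates, long limitMs, Policy policy, LongSupplier clockMs) {
    long deadline = clockMs.getAsLong() + limitMs;
    List<String> result = new ArrayList<>();
    for (String candidate : candidates) {
      if (clockMs.getAsLong() > deadline) {
        if (policy == Policy.THROW) {
          throw new BudgetExceeded(List.copyOf(result));
        }
        return result; // partial result: everything gathered so far
      }
      result.add(candidate);
    }
    return result;
  }
}
```

In the real class the policy is chosen through the TimeoutPolicy passed to the constructor, while this method only supplies the limit for a single call.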