java.lang.Object
org.apache.lucene.util.automaton.LevenshteinAutomata
Class to construct DFAs that match a word within some edit distance.
Implements the algorithm described in: Schulz and Mihov: Fast String Correction with Levenshtein Automata
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescription(package private) static class
A ParametricDescription describes the structure of a Levenshtein DFA for some degree n. -
Field Summary
FieldsModifier and TypeFieldDescription(package private) final int[]
(package private) final int
(package private) LevenshteinAutomata.ParametricDescription[]
static final int
Maximum edit distance this class can generate an automaton for.(package private) int
(package private) final int[]
(package private) final int[]
(package private) final int[]
-
Constructor Summary
ConstructorsConstructorDescriptionLevenshteinAutomata
(int[] word, int alphaMax, boolean withTranspositions) Expert: specify a custom maximum possible symbol (alphaMax); default is Character.MAX_CODE_POINT.LevenshteinAutomata
(String input, boolean withTranspositions) Create a new LevenshteinAutomata for some input String. -
Method Summary
Modifier and TypeMethodDescriptionprivate static int[]
codePoints
(String input) (package private) int
getVector
(int x, int pos, int end) Get the characteristic vectorX(x, V)
where V issubstring(pos, end)
toAutomaton
(int n) Compute a DFA that accepts all strings within an edit distance ofn
.toAutomaton
(int n, String prefix) Compute a DFA that accepts all strings within an edit distance ofn
, matching the specified exact prefix.
-
Field Details
-
MAXIMUM_SUPPORTED_DISTANCE
public static final int MAXIMUM_SUPPORTED_DISTANCEMaximum edit distance this class can generate an automaton for.- See Also:
-
word
final int[] word -
alphabet
final int[] alphabet -
alphaMax
final int alphaMax -
rangeLower
final int[] rangeLower -
rangeUpper
final int[] rangeUpper -
numRanges
int numRanges -
descriptions
LevenshteinAutomata.ParametricDescription[] descriptions
-
-
Constructor Details
-
LevenshteinAutomata
Create a new LevenshteinAutomata for some input String. Optionally count transpositions as a primitive edit. -
LevenshteinAutomata
public LevenshteinAutomata(int[] word, int alphaMax, boolean withTranspositions) Expert: specify a custom maximum possible symbol (alphaMax); default is Character.MAX_CODE_POINT.
-
-
Method Details
-
codePoints
-
toAutomaton
Compute a DFA that accepts all strings within an edit distance ofn
.All automata have the following properties:
- They are deterministic (DFA).
- There are no transitions to dead states.
- They are not minimal (some transitions could be combined).
-
toAutomaton
Compute a DFA that accepts all strings within an edit distance ofn
, matching the specified exact prefix.All automata have the following properties:
- They are deterministic (DFA).
- There are no transitions to dead states.
- They are not minimal (some transitions could be combined).
-
getVector
int getVector(int x, int pos, int end) Get the characteristic vectorX(x, V)
where V issubstring(pos, end)
-