Package org.apache.lucene.util.automaton
Class LevenshteinAutomata
- java.lang.Object
-
- org.apache.lucene.util.automaton.LevenshteinAutomata
-
public class LevenshteinAutomata extends java.lang.Object
Class to construct DFAs that match a word within some edit distance.Implements the algorithm described in: Schulz and Mihov: Fast String Correction with Levenshtein Automata
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description (package private) static class
LevenshteinAutomata.ParametricDescription
A ParametricDescription describes the structure of a Levenshtein DFA for some degree n.
-
Field Summary
Fields Modifier and Type Field Description (package private) int[]
alphabet
(package private) int
alphaMax
(package private) LevenshteinAutomata.ParametricDescription[]
descriptions
static int
MAXIMUM_SUPPORTED_DISTANCE
Maximum edit distance this class can generate an automaton for.(package private) int
numRanges
(package private) int[]
rangeLower
(package private) int[]
rangeUpper
(package private) int[]
word
-
Constructor Summary
Constructors Constructor Description LevenshteinAutomata(int[] word, int alphaMax, boolean withTranspositions)
Expert: specify a custom maximum possible symbol (alphaMax); default is Character.MAX_CODE_POINT.LevenshteinAutomata(java.lang.String input, boolean withTranspositions)
Create a new LevenshteinAutomata for some input String.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private static int[]
codePoints(java.lang.String input)
(package private) int
getVector(int x, int pos, int end)
Get the characteristic vectorX(x, V)
where V issubstring(pos, end)
Automaton
toAutomaton(int n)
Compute a DFA that accepts all strings within an edit distance ofn
.Automaton
toAutomaton(int n, java.lang.String prefix)
Compute a DFA that accepts all strings within an edit distance ofn
, matching the specified exact prefix.
-
-
-
Field Detail
-
MAXIMUM_SUPPORTED_DISTANCE
public static final int MAXIMUM_SUPPORTED_DISTANCE
Maximum edit distance this class can generate an automaton for.- See Also:
- Constant Field Values
-
word
final int[] word
-
alphabet
final int[] alphabet
-
alphaMax
final int alphaMax
-
rangeLower
final int[] rangeLower
-
rangeUpper
final int[] rangeUpper
-
numRanges
int numRanges
-
descriptions
LevenshteinAutomata.ParametricDescription[] descriptions
-
-
Constructor Detail
-
LevenshteinAutomata
public LevenshteinAutomata(java.lang.String input, boolean withTranspositions)
Create a new LevenshteinAutomata for some input String. Optionally count transpositions as a primitive edit.
-
LevenshteinAutomata
public LevenshteinAutomata(int[] word, int alphaMax, boolean withTranspositions)
Expert: specify a custom maximum possible symbol (alphaMax); default is Character.MAX_CODE_POINT.
-
-
Method Detail
-
codePoints
private static int[] codePoints(java.lang.String input)
-
toAutomaton
public Automaton toAutomaton(int n)
Compute a DFA that accepts all strings within an edit distance ofn
.All automata have the following properties:
- They are deterministic (DFA).
- There are no transitions to dead states.
- They are not minimal (some transitions could be combined).
-
toAutomaton
public Automaton toAutomaton(int n, java.lang.String prefix)
Compute a DFA that accepts all strings within an edit distance ofn
, matching the specified exact prefix.All automata have the following properties:
- They are deterministic (DFA).
- There are no transitions to dead states.
- They are not minimal (some transitions could be combined).
-
getVector
int getVector(int x, int pos, int end)
Get the characteristic vectorX(x, V)
where V issubstring(pos, end)
-
-