Package org.apache.lucene.util.fst
Class Util
- java.lang.Object
-
- org.apache.lucene.util.fst.Util
-
public final class Util extends java.lang.Object
Static helper methods.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
Util.FSTPath<T>
Represents a path in TopNSearcher.static class
Util.Result<T>
Holds a single input (IntsRef) + output, returned byshortestPaths()
.private static class
Util.TieBreakByInputComparator<T>
Compares first by the provided comparator, and then tie breaks by path.input.static class
Util.TopNSearcher<T>
Utility class to find top N shortest paths from start point(s).static class
Util.TopResults<T>
Holds the results for a top N search usingUtil.TopNSearcher
-
Constructor Summary
Constructors Modifier Constructor Description private
Util()
-
Method Summary
All Methods Static Methods Concrete Methods Deprecated Methods Modifier and Type Method Description (package private) static <T> int
binarySearch(FST<T> fst, FST.Arc<T> arc, int targetLabel)
Perform a binary search of Arcs encoded as a packed arrayprivate static void
emitDotState(java.io.Writer out, java.lang.String name, java.lang.String shape, java.lang.String color, java.lang.String label)
Emit a single state in thedot
language.static <T> T
get(FST<T> fst, BytesRef input)
Looks up the output for this input, or null if the input is not acceptedstatic <T> T
get(FST<T> fst, IntsRef input)
Looks up the output for this input, or null if the input is not accepted.static IntsRef
getByOutput(FST<java.lang.Long> fst, long targetOutput)
Deprecated.static IntsRef
getByOutput(FST<java.lang.Long> fst, long targetOutput, FST.BytesReader in, FST.Arc<java.lang.Long> arc, FST.Arc<java.lang.Long> scratchArc, IntsRefBuilder result)
Deprecated.private static java.lang.String
printableLabel(int label)
Ensures an arc's label is indeed printable (dot uses US-ASCII).static <T> FST.Arc<T>
readCeilArc(int label, FST<T> fst, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
Reads the first arc greater or equal than the given label into the provided arc in place and returns it iff found, otherwise returnnull
.static <T> Util.TopResults<T>
shortestPaths(FST<T> fst, FST.Arc<T> fromNode, T startOutput, java.util.Comparator<T> comparator, int topN, boolean allowEmptyString)
Starting from node, find the top N min cost completions to a final node.static BytesRef
toBytesRef(IntsRef input, BytesRefBuilder scratch)
Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.static <T> void
toDot(FST<T> fst, java.io.Writer out, boolean sameRank, boolean labelStates)
Dumps anFST
to a GraphViz'sdot
language description for visualization.static IntsRef
toIntsRef(BytesRef input, IntsRefBuilder scratch)
Just takes unsigned byte values from the BytesRef and converts into an IntsRef.static IntsRef
toUTF16(java.lang.CharSequence s, IntsRefBuilder scratch)
Just maps each UTF16 unit (char) to the ints in an IntsRef.static IntsRef
toUTF32(char[] s, int offset, int length, IntsRefBuilder scratch)
Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.static IntsRef
toUTF32(java.lang.CharSequence s, IntsRefBuilder scratch)
Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
-
-
-
Method Detail
-
get
public static <T> T get(FST<T> fst, IntsRef input) throws java.io.IOException
Looks up the output for this input, or null if the input is not accepted.- Throws:
java.io.IOException
-
get
public static <T> T get(FST<T> fst, BytesRef input) throws java.io.IOException
Looks up the output for this input, or null if the input is not accepted- Throws:
java.io.IOException
-
getByOutput
@Deprecated public static IntsRef getByOutput(FST<java.lang.Long> fst, long targetOutput) throws java.io.IOException
Deprecated.Reverse lookup (lookup by output instead of by input), in the special case when your FSTs outputs are strictly ascending. This locates the input/output pair where the output is equal to the target, and will return null if that output does not exist.NOTE: this only works with
FST<Long>
, only works when the outputs are ascending in order with the inputs. For example, simple ordinals (0, 1, 2, ...), or file offsets (when appending to a file) fit this.- Throws:
java.io.IOException
-
getByOutput
@Deprecated public static IntsRef getByOutput(FST<java.lang.Long> fst, long targetOutput, FST.BytesReader in, FST.Arc<java.lang.Long> arc, FST.Arc<java.lang.Long> scratchArc, IntsRefBuilder result) throws java.io.IOException
Deprecated.Expert: likegetByOutput(FST, long)
except reusing BytesReader, initial and scratch Arc, and result.- Throws:
java.io.IOException
-
shortestPaths
public static <T> Util.TopResults<T> shortestPaths(FST<T> fst, FST.Arc<T> fromNode, T startOutput, java.util.Comparator<T> comparator, int topN, boolean allowEmptyString) throws java.io.IOException
Starting from node, find the top N min cost completions to a final node.- Throws:
java.io.IOException
-
toDot
public static <T> void toDot(FST<T> fst, java.io.Writer out, boolean sameRank, boolean labelStates) throws java.io.IOException
Dumps anFST
to a GraphViz'sdot
language description for visualization. Example of use:PrintWriter pw = new PrintWriter("out.dot"); Util.toDot(fst, pw, true, true); pw.close();
and then, from command line:dot -Tpng -o out.png out.dot
Note: larger FSTs (a few thousand nodes) won't even render, don't bother.
- Parameters:
sameRank
- Iftrue
, the resultingdot
file will try to order states in layers of breadth-first traversal. This may mess up arcs, but makes the output FST's structure a bit clearer.labelStates
- Iftrue
states will have labels equal to their offsets in their binary format. Expands the graph considerably.- Throws:
java.io.IOException
- See Also:
- graphviz project
-
emitDotState
private static void emitDotState(java.io.Writer out, java.lang.String name, java.lang.String shape, java.lang.String color, java.lang.String label) throws java.io.IOException
Emit a single state in thedot
language.- Throws:
java.io.IOException
-
printableLabel
private static java.lang.String printableLabel(int label)
Ensures an arc's label is indeed printable (dot uses US-ASCII).
-
toUTF16
public static IntsRef toUTF16(java.lang.CharSequence s, IntsRefBuilder scratch)
Just maps each UTF16 unit (char) to the ints in an IntsRef.
-
toUTF32
public static IntsRef toUTF32(java.lang.CharSequence s, IntsRefBuilder scratch)
Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
-
toUTF32
public static IntsRef toUTF32(char[] s, int offset, int length, IntsRefBuilder scratch)
Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.
-
toIntsRef
public static IntsRef toIntsRef(BytesRef input, IntsRefBuilder scratch)
Just takes unsigned byte values from the BytesRef and converts into an IntsRef.
-
toBytesRef
public static BytesRef toBytesRef(IntsRef input, BytesRefBuilder scratch)
Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.
-
readCeilArc
public static <T> FST.Arc<T> readCeilArc(int label, FST<T> fst, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Reads the first arc greater or equal than the given label into the provided arc in place and returns it iff found, otherwise returnnull
.- Parameters:
label
- the label to ceil onfst
- the fst to operate onfollow
- the arc to follow reading the label fromarc
- the arc to read into in placein
- the fst'sFST.BytesReader
- Throws:
java.io.IOException
-
binarySearch
static <T> int binarySearch(FST<T> fst, FST.Arc<T> arc, int targetLabel) throws java.io.IOException
Perform a binary search of Arcs encoded as a packed array- Type Parameters:
T
- the output type of the FST- Parameters:
fst
- the FST from which to readarc
- the starting arc; sibling arcs greater than this will be searched. Usually the first arc in the array.targetLabel
- the label to search for- Returns:
- the index of the Arc having the target label, or if no Arc has the matching label,
-1 - idx)
, whereidx
is the index of the Arc with the next highest label, or the total number of arcs if the target label exceeds the maximum. - Throws:
java.io.IOException
- when the FST reader does
-
-