Class NRTSuggester
- java.lang.Object
  - org.apache.lucene.search.suggest.document.NRTSuggester
- All Implemented Interfaces:
Accountable
public final class NRTSuggester extends java.lang.Object implements Accountable
NRTSuggester executes a Top N search on a weighted FST specified by a CompletionScorer.
See lookup(CompletionScorer, Bits, TopSuggestDocsCollector) for more implementation details.

FST Format:
- Input: analyzed forms of input terms
- Output: Pair<Long, BytesRef>, where the Long is the weight and the BytesRef holds the surface form and docID
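To make the output layout concrete, here is a minimal, dependency-free sketch of a payload built as surface + separator + docID, mirroring the "surface + PAYLOAD_SEP + docID" description of NRTSuggester.PayLoadProcessor. The separator value 0x1F, the variable-length (VInt) encoding of the docID, and the helper names make/parseSurface/parseDocID are assumptions for illustration, not Lucene's code. The sketch also assumes the surface form never contains the separator byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PayloadSketch {
  // Hypothetical separator value; the real one is supplied as payloadSep.
  static final int PAYLOAD_SEP = 0x1F;

  /** Builds surface + separator + docID, with the docID written as a VInt. */
  static byte[] make(byte[] surface, int docID, int sep) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(surface, 0, surface.length);
    out.write(sep);
    // VInt: 7 bits per byte, high bit set on every byte except the last.
    while ((docID & ~0x7F) != 0) {
      out.write((docID & 0x7F) | 0x80);
      docID >>>= 7;
    }
    out.write(docID);
    return out.toByteArray();
  }

  /** Position of the first separator byte (assumes the surface form never contains it). */
  static int sepIndex(byte[] payload, int sep) {
    for (int i = 0; i < payload.length; i++) {
      if ((payload[i] & 0xFF) == sep) {
        return i;
      }
    }
    throw new IllegalArgumentException("payload has no separator");
  }

  static byte[] parseSurface(byte[] payload, int sep) {
    return Arrays.copyOfRange(payload, 0, sepIndex(payload, sep));
  }

  static int parseDocID(byte[] payload, int sep) {
    int pos = sepIndex(payload, sep) + 1;
    int value = 0;
    int shift = 0;
    while (true) {
      int b = payload[pos++] & 0xFF;
      value |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return value;
      }
      shift += 7;
    }
  }

  public static void main(String[] args) {
    byte[] surface = "lucene".getBytes(StandardCharsets.UTF_8);
    byte[] payload = make(surface, 300, PAYLOAD_SEP);
    System.out.println(new String(parseSurface(payload, PAYLOAD_SEP), StandardCharsets.UTF_8)); // lucene
    System.out.println(parseDocID(payload, PAYLOAD_SEP)); // 300
  }
}
```

Scanning for the first separator is only safe under the stated assumption; a surface form that contained the separator byte would be split in the wrong place.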
NOTE:
- Having too many deletions or using a very restrictive filter can make the search inadmissible due to over-pruning of potential paths. See CompletionScorer.accept(int, Bits).
- When matched documents are arbitrarily filtered (CompletionScorer.filtered set to true), it is assumed that the filter will roughly filter out half the number of documents that match the provided automaton; lookup performance will degrade as more accepted completions lead to filtered-out documents.
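Over-pruning can be illustrated with a deliberately simplified model, assuming the searcher keeps only the best queueSize candidates and rejects deleted or filtered documents afterwards. The real TopNSearcher prunes paths during the search rather than after a full sort, and Candidate/collect are hypothetical names; the point is only that a queue sized exactly to topN can return fewer than topN results once the best entries are rejected.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

class OverPruningSketch {
  /** Hypothetical completion candidate: a document and its index weight. */
  record Candidate(int docID, long weight) {}

  /**
   * Simplified model: only the best queueSize candidates survive pruning,
   * then rejected (deleted or filtered) docs are skipped and at most topN are collected.
   */
  static List<Integer> collect(List<Candidate> candidates, Set<Integer> rejected, int topN, int queueSize) {
    List<Candidate> sorted = new ArrayList<>(candidates);
    sorted.sort(Comparator.comparingLong(Candidate::weight).reversed());
    List<Integer> hits = new ArrayList<>();
    for (Candidate c : sorted.subList(0, Math.min(queueSize, sorted.size()))) {
      if (!rejected.contains(c.docID())) {
        hits.add(c.docID());
        if (hits.size() == topN) {
          break;
        }
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<Candidate> candidates = List.of(
        new Candidate(0, 50), new Candidate(1, 40), new Candidate(2, 30),
        new Candidate(3, 20), new Candidate(4, 10));
    Set<Integer> deleted = Set.of(0, 1); // the two best entries belong to deleted docs
    System.out.println(collect(candidates, deleted, 2, 2)); // [] (over-pruned)
    System.out.println(collect(candidates, deleted, 2, 4)); // [2, 3]
  }
}
```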
Nested Class Summary
Nested Classes (Modifier and Type / Class / Description):
- (package private) static class NRTSuggester.PayLoadProcessor
  Helper to encode/decode the payload (surface + PAYLOAD_SEP + docID) output.
- private static class NRTSuggester.ScoringPathComparator
  Compares partial completion paths using CompletionScorer.score(float, float); breaks ties by comparing path inputs.
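The ordering ScoringPathComparator is described as using (higher score first, ties broken by comparing path inputs) can be sketched as a plain Comparator. Path and score are hypothetical stand-ins: the real comparator works on FST paths and delegates to CompletionScorer.score(float, float), whereas the plain product weight * boost here is an assumption for the example, and path inputs are modeled as Strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ScoringPathSketch {
  /** Hypothetical partial path: its input, index weight and query boost. */
  record Path(String input, float weight, float boost) {}

  /** Hypothetical stand-in for CompletionScorer.score(weight, boost). */
  static float score(float weight, float boost) {
    return weight * boost;
  }

  /** Higher score first; equal scores fall back to ascending input order. */
  static final Comparator<Path> BY_SCORE = (first, second) -> {
    int cmp = Float.compare(score(second.weight(), second.boost()), score(first.weight(), first.boost()));
    return cmp != 0 ? cmp : first.input().compareTo(second.input());
  };

  /** Drains a priority queue ordered by BY_SCORE and returns the inputs in that order. */
  static List<String> order(List<Path> paths) {
    PriorityQueue<Path> queue = new PriorityQueue<>(BY_SCORE);
    queue.addAll(paths);
    List<String> out = new ArrayList<>();
    while (!queue.isEmpty()) {
      out.add(queue.poll().input());
    }
    return out;
  }

  public static void main(String[] args) {
    List<Path> paths = List.of(
        new Path("star", 2f, 1f),
        new Path("stack", 1f, 2f),
        new Path("stone", 3f, 1f));
    System.out.println(order(paths)); // [stone, stack, star]
  }
}
```

"star" and "stack" both score 2, so the input tie-break decides their order; without it, the queue would be free to return them either way.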
Field Summary
Fields (Modifier and Type / Field / Description):
- private FST<PairOutputs.Pair<java.lang.Long,BytesRef>> fst
  FST: input is the analyzed form, with a null byte between terms and a NRTSuggesterBuilder.END_BYTE to denote the end of the input; weight is a long; surface is the original, unanalyzed form followed by the docID.
- private static long MAX_TOP_N_QUEUE_SIZE
  Maximum queue depth for TopNSearcher. NOTE: value should be <= Integer.MAX_VALUE.
- private int maxAnalyzedPathsPerOutput
  Highest number of analyzed paths we saw for any single input surface form.
- private int payloadSep
  Separator used between the surface form and its docID in the FST output.
Constructor Summary
Constructors (Modifier / Constructor / Description):
- private NRTSuggester(FST<PairOutputs.Pair<java.lang.Long,BytesRef>> fst, int maxAnalyzedPathsPerOutput, int payloadSep)
Method Summary
Methods (Modifier and Type / Method / Description):
- private static double calculateLiveDocRatio(int numDocs, int maxDocs)
- (package private) static long decode(long output)
- (package private) static long encode(long input)
- java.util.Collection<Accountable> getChildResources()
  Returns nested resources of this class.
- private static java.util.Comparator<PairOutputs.Pair<java.lang.Long,BytesRef>> getComparator()
- private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio, boolean filterEnabled)
  Simple heuristics to try to avoid over-pruning potential suggestions by the TopNSearcher.
- static NRTSuggester load(IndexInput input, CompletionPostingsFormat.FSTLoadMode fstLoadMode)
- void lookup(CompletionScorer scorer, Bits acceptDocs, TopSuggestDocsCollector collector)
  Collects at most TopSuggestDocsCollector.getCountToCollect() completions that match the provided CompletionScorer.
- long ramBytesUsed()
  Return the memory usage of this object in bytes.
- private static boolean shouldLoadFSTOffHeap(IndexInput input, CompletionPostingsFormat.FSTLoadMode fstLoadMode)
Field Detail
-
fst
private final FST<PairOutputs.Pair<java.lang.Long,BytesRef>> fst
FST: input is the analyzed form, with a null byte between terms and a NRTSuggesterBuilder.END_BYTE to denote the end of the input; weight is a long; surface is the original, unanalyzed form followed by the docID.
-
maxAnalyzedPathsPerOutput
private final int maxAnalyzedPathsPerOutput
Highest number of analyzed paths we saw for any single input surface form. This can be > 1 when the index analyzer creates graphs or when multiple surface forms yield the same analyzed form.
-
payloadSep
private final int payloadSep
Separator used between surface form and its docID in the FST output
-
MAX_TOP_N_QUEUE_SIZE
private static final long MAX_TOP_N_QUEUE_SIZE
Maximum queue depth for TopNSearcher. NOTE: value should be <= Integer.MAX_VALUE.
- See Also:
Constant Field Values
Constructor Detail
-
NRTSuggester
private NRTSuggester(FST<PairOutputs.Pair<java.lang.Long,BytesRef>> fst, int maxAnalyzedPathsPerOutput, int payloadSep)
Method Detail
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
- Specified by:
ramBytesUsed in interface Accountable
-
getChildResources
public java.util.Collection<Accountable> getChildResources()
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
- Specified by:
getChildResources in interface Accountable
- See Also:
Accountables
-
lookup
public void lookup(CompletionScorer scorer, Bits acceptDocs, TopSuggestDocsCollector collector) throws java.io.IOException
Collects at most TopSuggestDocsCollector.getCountToCollect() completions that match the provided CompletionScorer.
The CompletionScorer.automaton is intersected with the fst. CompletionScorer.weight is used to compute boosts and/or extract context for each matched partial path. A top N search is executed on the fst, seeded with the matched partial paths. Upon reaching a completed path, CompletionScorer.accept(int, Bits) and CompletionScorer.score(float, float) are applied to the document id, index weight and query boost to filter and score the entry, before it is collected via TopSuggestDocsCollector.collect(int, CharSequence, CharSequence, float).
- Throws:
java.io.IOException
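The contract described above (match, score, accept, collect up to a count) can be illustrated with a brute-force stand-in. This is not a top N FST search: a TreeMap prefix scan replaces the automaton intersection, every match is scored up front instead of being explored best-first, a BitSet of live docs replaces CompletionScorer.accept(int, Bits), and weight * boost replaces CompletionScorer.score(float, float). Entry, Hit, add and lookup are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

class LookupSketch {
  /** Hypothetical stand-in for one FST entry: analyzed form, index weight, surface form, docID. */
  record Entry(String analyzed, long weight, String surface, int docID) {}

  /** Hypothetical stand-in for what would be passed to the collector. */
  record Hit(String surface, int docID, float score) {}

  static void add(TreeMap<String, List<Entry>> index, Entry entry) {
    index.computeIfAbsent(entry.analyzed(), k -> new ArrayList<>()).add(entry);
  }

  static List<Hit> lookup(TreeMap<String, List<Entry>> index, String prefix, float boost,
                          BitSet liveDocs, int count) {
    // Highest score first.
    PriorityQueue<Hit> queue = new PriorityQueue<>((a, b) -> Float.compare(b.score(), a.score()));
    // Stand-in for the automaton intersection: keys sharing the prefix are contiguous
    // in sorted order, so walk the tail map until the prefix stops matching.
    for (Map.Entry<String, List<Entry>> e : index.tailMap(prefix, true).entrySet()) {
      if (!e.getKey().startsWith(prefix)) {
        break;
      }
      for (Entry entry : e.getValue()) {
        // Stand-in for CompletionScorer.score(weight, boost): a plain product.
        queue.add(new Hit(entry.surface(), entry.docID(), entry.weight() * boost));
      }
    }
    List<Hit> collected = new ArrayList<>();
    while (!queue.isEmpty() && collected.size() < count) {
      Hit hit = queue.poll();
      // Stand-in for CompletionScorer.accept(docID, liveDocs): skip deleted documents.
      if (liveDocs.get(hit.docID())) {
        collected.add(hit);
      }
    }
    return collected;
  }

  public static void main(String[] args) {
    TreeMap<String, List<Entry>> index = new TreeMap<>();
    add(index, new Entry("star", 5, "Star", 0));
    add(index, new Entry("start", 9, "Start", 1));
    add(index, new Entry("stop", 7, "Stop", 2));
    add(index, new Entry("tree", 8, "Tree", 3));
    BitSet live = new BitSet();
    live.set(0, 4);
    live.clear(1); // doc 1 ("Start") is deleted
    System.out.println(lookup(index, "st", 1f, live, 2).stream().map(Hit::surface).toList()); // [Stop, Star]
  }
}
```

The highest-weighted match, "Start", is skipped because its document is deleted, so the next two accepted completions fill the requested count.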
-
getComparator
private static java.util.Comparator<PairOutputs.Pair<java.lang.Long,BytesRef>> getComparator()
-
getMaxTopNSearcherQueueSize
private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio, boolean filterEnabled)
Simple heuristics to try to avoid over-pruning potential suggestions by the TopNSearcher. Since suggestion entries can be rejected if they belong to a deleted document, the length of the TopNSearcher queue has to be increased by some factor to account for the filtered-out suggestions. This heuristic tries to make the searcher admissible, but the search can still lead to over-pruning.
If a filter is applied, the queue size is increased by half the number of live documents.
The maximum queue size is MAX_TOP_N_QUEUE_SIZE.
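A sketch of how such a heuristic could be computed, under stated assumptions. Only two parts come from the description above: half the live documents are added when a filter is enabled, and the result is capped at MAX_TOP_N_QUEUE_SIZE. The base size topN * maxAnalyzedPathsPerOutput, the division by the live-doc ratio to compensate for deletions, the cap value 5000 and the -1 return of liveDocRatio for an empty segment are assumptions. maxAnalyzedPathsPerOutput is passed as a parameter because the sketch has no instance field.

```java
class QueueSizeSketch {
  // Assumed cap value; stands in for MAX_TOP_N_QUEUE_SIZE.
  static final long MAX_TOP_N_QUEUE_SIZE = 5000;

  /** Fraction of live documents, or -1 when no document is live (assumed convention). */
  static double liveDocRatio(int numDocs, int maxDocs) {
    return numDocs > 0 ? (double) numDocs / maxDocs : -1;
  }

  static int maxQueueSize(int topN, int maxAnalyzedPathsPerOutput, int numDocs,
                          double liveDocsRatio, boolean filterEnabled) {
    if (liveDocsRatio <= 0) {
      return 0; // no live documents: nothing worth searching
    }
    long size = (long) topN * maxAnalyzedPathsPerOutput;
    // Fewer live docs means more rejected entries, so grow the queue proportionally.
    size = (long) (size / liveDocsRatio);
    if (filterEnabled) {
      // Add half the live documents, matching the assumption that a filter
      // rejects roughly half of the matches.
      size += numDocs / 2;
    }
    return (int) Math.min(MAX_TOP_N_QUEUE_SIZE, size);
  }

  public static void main(String[] args) {
    double ratio = liveDocRatio(80, 100);
    System.out.println(maxQueueSize(5, 1, 80, ratio, false));    // 6
    System.out.println(maxQueueSize(5, 1, 80, ratio, true));     // 46
    System.out.println(maxQueueSize(1000, 10, 100, 1.0, false)); // 5000
  }
}
```

For example, with topN = 5, one analyzed path per output and 80 of 100 documents live, the sketch grows the queue from 5 to 6 for deletions alone and to 46 once a filter is enabled; large requests are clamped to the cap.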
-
calculateLiveDocRatio
private static double calculateLiveDocRatio(int numDocs, int maxDocs)
-
shouldLoadFSTOffHeap
private static boolean shouldLoadFSTOffHeap(IndexInput input, CompletionPostingsFormat.FSTLoadMode fstLoadMode)
-
load
public static NRTSuggester load(IndexInput input, CompletionPostingsFormat.FSTLoadMode fstLoadMode) throws java.io.IOException
- Throws:
java.io.IOException
-
encode
static long encode(long input)
-
decode
static long decode(long output)