Uses of Class
org.apache.lucene.util.AttributeSource
Packages that use AttributeSource
Package    Description
org.apache.lucene.analysis    Text analysis.
org.apache.lucene.analysis.ar    Analyzer for Arabic.
org.apache.lucene.analysis.bg    Analyzer for Bulgarian.
org.apache.lucene.analysis.bn    Analyzer for Bengali Language.
org.apache.lucene.analysis.boost    Provides various convenience classes for creating boosts on Tokens.
org.apache.lucene.analysis.br    Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.cjk    Analyzer for Chinese, Japanese, and Korean, which indexes bigrams.
org.apache.lucene.analysis.ckb    Analyzer for Sorani Kurdish.
org.apache.lucene.analysis.cn.smart    Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.commongrams    Construct n-grams for frequently occurring terms and phrases.
org.apache.lucene.analysis.compound    A filter that decomposes compound words you find in many Germanic languages into the word parts.
org.apache.lucene.analysis.core    Basic, general-purpose analysis components.
org.apache.lucene.analysis.cz    Analyzer for Czech.
org.apache.lucene.analysis.de    Analyzer for German.
org.apache.lucene.analysis.el    Analyzer for Greek.
org.apache.lucene.analysis.en    Analyzer for English.
org.apache.lucene.analysis.es    Analyzer for Spanish.
org.apache.lucene.analysis.fa    Analyzer for Persian.
org.apache.lucene.analysis.fi    Analyzer for Finnish.
org.apache.lucene.analysis.fr    Analyzer for French.
org.apache.lucene.analysis.ga    Analyzer for Irish.
org.apache.lucene.analysis.gl    Analyzer for Galician.
org.apache.lucene.analysis.hi    Analyzer for Hindi.
org.apache.lucene.analysis.hu    Analyzer for Hungarian.
org.apache.lucene.analysis.hunspell    Stemming TokenFilter using a Java implementation of the Hunspell stemming algorithm.
org.apache.lucene.analysis.icu    Analysis components based on ICU.
org.apache.lucene.analysis.icu.segmentation    Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
org.apache.lucene.analysis.id    Analyzer for Indonesian.
org.apache.lucene.analysis.in    Analyzer for Indian languages.
org.apache.lucene.analysis.it    Analyzer for Italian.
org.apache.lucene.analysis.ja    Analyzer for Japanese.
org.apache.lucene.analysis.ko    Analyzer for Korean.
org.apache.lucene.analysis.lv    Analyzer for Latvian.
org.apache.lucene.analysis.minhash    MinHash filtering (for LSH).
org.apache.lucene.analysis.miscellaneous    Miscellaneous Tokenstreams.
org.apache.lucene.analysis.ngram    Character n-gram tokenizers and filters.
org.apache.lucene.analysis.no    Analyzer for Norwegian.
org.apache.lucene.analysis.path    Analysis components for path-like strings such as filenames.
org.apache.lucene.analysis.pattern    Set of components for pattern-based (regex) analysis.
org.apache.lucene.analysis.payloads    Provides various convenience classes for creating payloads on Tokens.
org.apache.lucene.analysis.phonetic    Analysis components for phonetic search.
org.apache.lucene.analysis.pt    Analyzer for Portuguese.
org.apache.lucene.analysis.reverse    Filter to reverse token text.
org.apache.lucene.analysis.ru    Analyzer for Russian.
org.apache.lucene.analysis.shingle    Word n-gram filters.
org.apache.lucene.analysis.sinks
org.apache.lucene.analysis.snowball    TokenFilter and Analyzer implementations that use Snowball stemmers.
org.apache.lucene.analysis.sr    Analyzer for Serbian.
org.apache.lucene.analysis.standard    Fast, general-purpose grammar-based tokenizer. StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
org.apache.lucene.analysis.stempel    Stempel: Algorithmic Stemmer.
org.apache.lucene.analysis.sv    Analyzer for Swedish.
org.apache.lucene.analysis.synonym    Analysis components for Synonyms.
org.apache.lucene.analysis.th    Analyzer for Thai.
org.apache.lucene.analysis.tr    Analyzer for Turkish.
org.apache.lucene.analysis.util    Utility functions for text analysis.
org.apache.lucene.analysis.wikipedia    Tokenizer that is aware of Wikipedia syntax.
org.apache.lucene.codecs    Codecs API: API for customization of the encoding and structure of the index.
org.apache.lucene.codecs.uniformsplit.sharedterms    Pluggable term index / block terms dictionary implementations.
org.apache.lucene.document    The logical representation of a Document for indexing and searching.
org.apache.lucene.index    Code to maintain and access indices.
org.apache.lucene.monitor    Monitoring framework.
org.apache.lucene.search    Code to search indices.
org.apache.lucene.search.highlight    Highlighting search terms.
org.apache.lucene.search.join    Support for index-time and query-time joins.
org.apache.lucene.search.suggest.analyzing    Analyzer based autosuggest.
org.apache.lucene.search.suggest.document    Support for document suggestion.
org.apache.lucene.search.uhighlight    The UnifiedHighlighter -- a flexible highlighter that can get offsets from postings, term vectors, or analysis.
org.apache.lucene.util    Some utility classes.
org.apache.lucene.util.graph    Utility classes for working with token streams as graphs.
Uses of AttributeSource in org.apache.lucene.analysis
Subclasses of AttributeSource in org.apache.lucene.analysis
Modifier and Type    Class    Description
private static class    Analyzer.StringTokenStream
class    CachingTokenFilter    This class can be used if the token attributes of a TokenStream are intended to be consumed more than once.
class    FilteringTokenFilter    Abstract base class for TokenFilters that may remove tokens.
class    GraphTokenFilter    An abstract TokenFilter that exposes its input stream as a graph. Call GraphTokenFilter.incrementBaseToken() to move the root of the graph to the next position in the TokenStream, GraphTokenFilter.incrementGraphToken() to move along the current graph, and GraphTokenFilter.incrementGraph() to reset to the next graph based at the current root.
class    LowerCaseFilter    Normalizes token text to lower case.
class    StopFilter    Removes stop words from a token stream.
class    TokenFilter    A TokenFilter is a TokenStream whose input is another TokenStream.
class    Tokenizer    A Tokenizer is a TokenStream whose input is a Reader.
class    TokenStream
Fields in org.apache.lucene.analysis declared as AttributeSource
Modifier and Type    Field    Description
(package private) AttributeSource    GraphTokenFilter.Token.attSource
Methods in org.apache.lucene.analysis with parameters of type AttributeSource
Modifier and Type    Method    Description
(package private) void    GraphTokenFilter.Token.reset(AttributeSource attSource)
Constructors in org.apache.lucene.analysis with parameters of type AttributeSource
Constructor    Description
Token(AttributeSource attSource)
TokenStream(AttributeSource input)    A TokenStream that uses the same attributes as the supplied one.
Uses of AttributeSource in org.apache.lucene.analysis.ar
Subclasses of AttributeSource in org.apache.lucene.analysis.ar
Modifier and Type    Class    Description
class    ArabicNormalizationFilter    A TokenFilter that applies ArabicNormalizer to normalize the orthography.
class    ArabicStemFilter    A TokenFilter that applies ArabicStemmer to stem Arabic words.
Uses of AttributeSource in org.apache.lucene.analysis.bg
Subclasses of AttributeSource in org.apache.lucene.analysis.bg
Modifier and Type    Class    Description
class    BulgarianStemFilter    A TokenFilter that applies BulgarianStemmer to stem Bulgarian words.
Uses of AttributeSource in org.apache.lucene.analysis.bn
Subclasses of AttributeSource in org.apache.lucene.analysis.bn
Modifier and Type    Class    Description
class    BengaliNormalizationFilter    A TokenFilter that applies BengaliNormalizer to normalize the orthography.
class    BengaliStemFilter    A TokenFilter that applies BengaliStemmer to stem Bengali words.
Uses of AttributeSource in org.apache.lucene.analysis.boost
Subclasses of AttributeSource in org.apache.lucene.analysis.boost
Modifier and Type    Class    Description
class    DelimitedBoostTokenFilter    Characters before the delimiter are the "token", those after are the boost.
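The delimiter convention described for DelimitedBoostTokenFilter can be illustrated with a small JDK-only sketch. This is not Lucene's implementation: the class `DelimitedBoostSketch`, the holder `BoostedToken`, splitting at the first delimiter, and the default boost of 1.0 for text without a delimiter are all assumptions made for illustration.

```java
final class DelimitedBoostSketch {

    /** Hypothetical holder for the two halves of a delimited term. */
    static final class BoostedToken {
        final String text;
        final float boost;

        BoostedToken(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }
    }

    /**
     * Splits raw at the first delimiter: characters before it are the token,
     * characters after it are parsed as the boost.
     * Assumption: text without a delimiter gets a boost of 1.0.
     */
    static BoostedToken split(String raw, char delimiter) {
        int idx = raw.indexOf(delimiter);
        if (idx < 0) {
            return new BoostedToken(raw, 1.0f);
        }
        String text = raw.substring(0, idx);
        float boost = Float.parseFloat(raw.substring(idx + 1));
        return new BoostedToken(text, boost);
    }

    public static void main(String[] args) {
        BoostedToken t = split("lucene|2.5", '|');
        System.out.println(t.text + " " + t.boost);
    }
}
```

The same split-at-delimiter idea applies to the payload and term-frequency variants listed elsewhere on this page; only the interpretation of the suffix differs.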
Uses of AttributeSource in org.apache.lucene.analysis.br
Subclasses of AttributeSource in org.apache.lucene.analysis.br
Modifier and Type    Class    Description
class    BrazilianStemFilter    A TokenFilter that applies BrazilianStemmer.
Uses of AttributeSource in org.apache.lucene.analysis.cjk
Subclasses of AttributeSource in org.apache.lucene.analysis.cjk
Modifier and Type    Class    Description
class    CJKBigramFilter    Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.
class    CJKWidthFilter    A TokenFilter that normalizes CJK width differences: folds fullwidth ASCII variants into the equivalent basic latin; folds halfwidth Katakana variants into the equivalent kana.
Uses of AttributeSource in org.apache.lucene.analysis.ckb
Subclasses of AttributeSource in org.apache.lucene.analysis.ckb
Modifier and Type    Class    Description
class    SoraniNormalizationFilter    A TokenFilter that applies SoraniNormalizer to normalize the orthography.
class    SoraniStemFilter    A TokenFilter that applies SoraniStemmer to stem Sorani words.
Uses of AttributeSource in org.apache.lucene.analysis.cn.smart
Subclasses of AttributeSource in org.apache.lucene.analysis.cn.smart
Modifier and Type    Class    Description
class    HMMChineseTokenizer    Tokenizer for Chinese or mixed Chinese-English text.
Uses of AttributeSource in org.apache.lucene.analysis.commongrams
Subclasses of AttributeSource in org.apache.lucene.analysis.commongrams
Modifier and Type    Class    Description
class    CommonGramsFilter    Construct bigrams for frequently occurring terms while indexing.
class    CommonGramsQueryFilter    Wrap a CommonGramsFilter optimizing phrase queries by only returning single words when they are not a member of a bigram.
Uses of AttributeSource in org.apache.lucene.analysis.compound
Subclasses of AttributeSource in org.apache.lucene.analysis.compound
Modifier and Type    Class    Description
class    CompoundWordTokenFilterBase    Base class for decomposition token filters.
class    DictionaryCompoundWordTokenFilter    A TokenFilter that decomposes compound words found in many Germanic languages.
class    HyphenationCompoundWordTokenFilter    A TokenFilter that decomposes compound words found in many Germanic languages.
Uses of AttributeSource in org.apache.lucene.analysis.core
Subclasses of AttributeSource in org.apache.lucene.analysis.core
Modifier and Type    Class    Description
class    DecimalDigitFilter    Folds all Unicode digits in [:General_Category=Decimal_Number:] to Basic Latin digits (0-9).
class    FlattenGraphFilter    Converts an incoming graph token stream, such as one from SynonymGraphFilter, into a flat form so that all nodes form a single linear chain with no side paths.
class    KeywordTokenizer    Emits the entire input as a single token.
class    LetterTokenizer    A LetterTokenizer is a tokenizer that divides text at non-letters.
class    LowerCaseFilter    Normalizes token text to lower case.
class    StopFilter    Removes stop words from a token stream.
class    TypeTokenFilter    Removes tokens whose types appear in a set of blocked types from a token stream.
class    UnicodeWhitespaceTokenizer    A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace.
class    UpperCaseFilter    Normalizes token text to UPPER CASE.
class    WhitespaceTokenizer    A tokenizer that divides text at whitespace characters as defined by Character.isWhitespace(int).
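The digit folding described for DecimalDigitFilter can be approximated with the JDK's own character tables. The sketch below is an illustration under stated assumptions, not Lucene's code: `DecimalDigitFoldSketch` and `foldDigits` are hypothetical names, and the JDK's `Character.DECIMAL_DIGIT_NUMBER` category is assumed to be an adequate stand-in for `General_Category=Decimal_Number`.

```java
final class DecimalDigitFoldSketch {

    /**
     * Replaces every non-ASCII code point in the Decimal_Number category
     * with the Basic Latin digit of the same numeric value.
     */
    static String foldDigits(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp > 0x7F && Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER) {
                // Character.digit yields 0..9 for any Unicode decimal digit.
                out.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Arabic-Indic digits one, two, three.
        System.out.println(foldDigits("\u0661\u0662\u0663"));
    }
}
```

Iterating over code points rather than chars keeps digits outside the Basic Multilingual Plane intact as single units.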
Uses of AttributeSource in org.apache.lucene.analysis.cz
Subclasses of AttributeSource in org.apache.lucene.analysis.cz
Modifier and Type    Class    Description
class    CzechStemFilter    A TokenFilter that applies CzechStemmer to stem Czech words.
Uses of AttributeSource in org.apache.lucene.analysis.de
Subclasses of AttributeSource in org.apache.lucene.analysis.de
Modifier and Type    Class    Description
class    GermanLightStemFilter    A TokenFilter that applies GermanLightStemmer to stem German words.
class    GermanMinimalStemFilter    A TokenFilter that applies GermanMinimalStemmer to stem German words.
class    GermanNormalizationFilter    Normalizes German characters according to the heuristics of the German2 snowball algorithm.
class    GermanStemFilter    A TokenFilter that stems German words.
Uses of AttributeSource in org.apache.lucene.analysis.el
Subclasses of AttributeSource in org.apache.lucene.analysis.el
Modifier and Type    Class    Description
class    GreekLowerCaseFilter    Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma to sigma.
class    GreekStemFilter    A TokenFilter that applies GreekStemmer to stem Greek words.
Uses of AttributeSource in org.apache.lucene.analysis.en
Subclasses of AttributeSource in org.apache.lucene.analysis.en
Modifier and Type    Class    Description
class    EnglishMinimalStemFilter    A TokenFilter that applies EnglishMinimalStemmer to stem English words.
class    EnglishPossessiveFilter    TokenFilter that removes possessives (trailing 's) from words.
class    KStemFilter    A high-performance kstem filter for English.
class    PorterStemFilter    Transforms the token stream as per the Porter stemming algorithm.
Uses of AttributeSource in org.apache.lucene.analysis.es
Subclasses of AttributeSource in org.apache.lucene.analysis.es
Modifier and Type    Class    Description
class    SpanishLightStemFilter    A TokenFilter that applies SpanishLightStemmer to stem Spanish words.
class    SpanishMinimalStemFilter    A TokenFilter that applies SpanishMinimalStemmer to stem Spanish words.
Uses of AttributeSource in org.apache.lucene.analysis.fa
Subclasses of AttributeSource in org.apache.lucene.analysis.fa
Modifier and Type    Class    Description
class    PersianNormalizationFilter    A TokenFilter that applies PersianNormalizer to normalize the orthography.
Uses of AttributeSource in org.apache.lucene.analysis.fi
Subclasses of AttributeSource in org.apache.lucene.analysis.fi
Modifier and Type    Class    Description
class    FinnishLightStemFilter    A TokenFilter that applies FinnishLightStemmer to stem Finnish words.
Uses of AttributeSource in org.apache.lucene.analysis.fr
Subclasses of AttributeSource in org.apache.lucene.analysis.fr
Modifier and Type    Class    Description
class    FrenchLightStemFilter    A TokenFilter that applies FrenchLightStemmer to stem French words.
class    FrenchMinimalStemFilter    A TokenFilter that applies FrenchMinimalStemmer to stem French words.
Uses of AttributeSource in org.apache.lucene.analysis.ga
Subclasses of AttributeSource in org.apache.lucene.analysis.ga
Modifier and Type    Class    Description
class    IrishLowerCaseFilter    Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., that 'nAthair' should become 'n-athair').
Uses of AttributeSource in org.apache.lucene.analysis.gl
Subclasses of AttributeSource in org.apache.lucene.analysis.gl
Modifier and Type    Class    Description
class    GalicianMinimalStemFilter    A TokenFilter that applies GalicianMinimalStemmer to stem Galician words.
class    GalicianStemFilter    A TokenFilter that applies GalicianStemmer to stem Galician words.
Uses of AttributeSource in org.apache.lucene.analysis.hi
Subclasses of AttributeSource in org.apache.lucene.analysis.hi
Modifier and Type    Class    Description
class    HindiNormalizationFilter    A TokenFilter that applies HindiNormalizer to normalize the orthography.
class    HindiStemFilter    A TokenFilter that applies HindiStemmer to stem Hindi words.
Uses of AttributeSource in org.apache.lucene.analysis.hu
Subclasses of AttributeSource in org.apache.lucene.analysis.hu
Modifier and Type    Class    Description
class    HungarianLightStemFilter    A TokenFilter that applies HungarianLightStemmer to stem Hungarian words.
Uses of AttributeSource in org.apache.lucene.analysis.hunspell
Subclasses of AttributeSource in org.apache.lucene.analysis.hunspell
Modifier and Type    Class    Description
class    HunspellStemFilter    TokenFilter that uses hunspell affix rules and words to stem tokens.
Uses of AttributeSource in org.apache.lucene.analysis.icu
Subclasses of AttributeSource in org.apache.lucene.analysis.icu
Modifier and Type    Class    Description
class    ICUFoldingFilter    A TokenFilter that applies search term folding to Unicode text, applying foldings from UTR#30 Character Foldings.
class    ICUNormalizer2Filter    Normalize token text with ICU's Normalizer2.
class    ICUTransformFilter    A TokenFilter that transforms text with ICU.
Uses of AttributeSource in org.apache.lucene.analysis.icu.segmentation
Subclasses of AttributeSource in org.apache.lucene.analysis.icu.segmentation
Modifier and Type    Class    Description
class    ICUTokenizer    Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/)
Uses of AttributeSource in org.apache.lucene.analysis.id
Subclasses of AttributeSource in org.apache.lucene.analysis.id
Modifier and Type    Class    Description
class    IndonesianStemFilter    A TokenFilter that applies IndonesianStemmer to stem Indonesian words.
Uses of AttributeSource in org.apache.lucene.analysis.in
Subclasses of AttributeSource in org.apache.lucene.analysis.in
Modifier and Type    Class    Description
class    IndicNormalizationFilter    A TokenFilter that applies IndicNormalizer to normalize text in Indian Languages.
Uses of AttributeSource in org.apache.lucene.analysis.it
Subclasses of AttributeSource in org.apache.lucene.analysis.it
Modifier and Type    Class    Description
class    ItalianLightStemFilter    A TokenFilter that applies ItalianLightStemmer to stem Italian words.
Uses of AttributeSource in org.apache.lucene.analysis.ja
Subclasses of AttributeSource in org.apache.lucene.analysis.ja
Modifier and Type    Class    Description
class    JapaneseBaseFormFilter    Replaces term text with the BaseFormAttribute.
class    JapaneseKatakanaStemFilter    A TokenFilter that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC).
class    JapaneseNumberFilter    A TokenFilter that normalizes Japanese numbers (kansūji) to regular Arabic decimal numbers in half-width characters.
class    JapanesePartOfSpeechStopFilter    Removes tokens that match a set of part-of-speech tags.
class    JapaneseReadingFormFilter    A TokenFilter that replaces the term attribute with the reading of a token in either katakana or romaji form.
class    JapaneseTokenizer    Tokenizer for Japanese that uses morphological analysis.
Uses of AttributeSource in org.apache.lucene.analysis.ko
Subclasses of AttributeSource in org.apache.lucene.analysis.ko
Modifier and Type    Class    Description
class    KoreanNumberFilter    A TokenFilter that normalizes Korean numbers to regular Arabic decimal numbers in half-width characters.
class    KoreanPartOfSpeechStopFilter    Removes tokens that match a set of part-of-speech tags.
class    KoreanReadingFormFilter    Replaces term text with the ReadingAttribute which is the Hangul transcription of Hanja characters.
class    KoreanTokenizer    Tokenizer for Korean that uses morphological analysis.
Uses of AttributeSource in org.apache.lucene.analysis.lv
Subclasses of AttributeSource in org.apache.lucene.analysis.lv
Modifier and Type    Class    Description
class    LatvianStemFilter    A TokenFilter that applies LatvianStemmer to stem Latvian words.
Uses of AttributeSource in org.apache.lucene.analysis.minhash
Subclasses of AttributeSource in org.apache.lucene.analysis.minhash
Modifier and Type    Class    Description
class    MinHashFilter    Generate min hash tokens from an incoming stream of tokens.
Uses of AttributeSource in org.apache.lucene.analysis.miscellaneous
Subclasses of AttributeSource in org.apache.lucene.analysis.miscellaneous
Modifier and Type    Class    Description
class    ASCIIFoldingFilter    This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.
class    CapitalizationFilter    A filter to apply normal capitalization rules to Tokens.
class    CodepointCountFilter    Removes words that are too long or too short from the stream.
class    ConcatenateGraphFilter    Concatenates/Joins every incoming token with a separator into one output token for every path through the token stream (which is a graph).
class    ConcatenatingTokenStream    A TokenStream that takes an array of input TokenStreams as sources, and concatenates them together.
class    ConditionalTokenFilter    Allows skipping TokenFilters based on the current set of attributes.
private class    ConditionalTokenFilter.OneTimeWrapper
class    DateRecognizerFilter    Filters all tokens that cannot be parsed to a date, using the provided DateFormat.
class    DelimitedTermFrequencyTokenFilter    Characters before the delimiter are the "token", the textual integer after is the term frequency.
class    EmptyTokenStream    An always exhausted token stream.
class    FingerprintFilter    Filter outputs a single token which is a concatenation of the sorted and de-duplicated set of input tokens.
class    FixBrokenOffsetsFilter    Deprecated. Fix the token filters that create broken offsets in the first place.
class    HyphenatedWordsFilter    When the plain text is extracted from documents, we will often have many words hyphenated and broken into two lines.
class    KeepWordFilter    A TokenFilter that only keeps tokens with text contained in the required words.
class    KeywordMarkerFilter    Marks terms as keywords via the KeywordAttribute.
class    KeywordRepeatFilter    This TokenFilter emits each incoming token twice, once as keyword and once non-keyword, in other words once with KeywordAttribute.setKeyword(boolean) set to true and once set to false.
class    LengthFilter    Removes words that are too long or too short from the stream.
class    LimitTokenCountFilter    This TokenFilter limits the number of tokens while indexing.
class    LimitTokenOffsetFilter    Lets all tokens pass through until it sees one with a start offset greater than a configured limit, which won't pass and ends the stream.
class    LimitTokenPositionFilter    This TokenFilter limits its emitted tokens to those with positions that are not greater than the configured limit.
class    PatternKeywordMarkerFilter    Marks terms as keywords via the KeywordAttribute.
class    ProtectedTermFilter    A ConditionalTokenFilter that only applies its wrapped filters to tokens that are not contained in a protected set.
class    RemoveDuplicatesTokenFilter    A TokenFilter which filters out Tokens at the same position and Term text as the previous token in the stream.
class    ScandinavianFoldingFilter    This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
class    ScandinavianNormalizationFilter    This filter normalizes use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
class    SetKeywordMarkerFilter    Marks terms as keywords via the KeywordAttribute.
class    StemmerOverrideFilter    Provides the ability to override any KeywordAttribute aware stemmer with custom dictionary-based stemming.
class    TrimFilter    Trims leading and trailing whitespace from Tokens in the stream.
class    TruncateTokenFilter    A token filter for truncating the terms into a specific length.
class    TypeAsSynonymFilter    Adds the TypeAttribute.type() as a synonym, i.e.
class    WordDelimiterFilter    Deprecated. Use WordDelimiterGraphFilter instead: it produces a correct token graph so that e.g.
class    WordDelimiterGraphFilter    Splits words into subwords and performs optional transformations on subword groups, producing a correct token graph so that e.g.
Methods in org.apache.lucene.analysis.miscellaneous that return AttributeSource
Modifier and Type    Method    Description
private static AttributeSource    ConcatenatingTokenStream.combineSources(TokenStream... sources)
Constructors in org.apache.lucene.analysis.miscellaneous with parameters of type AttributeSource
Constructor    Description
OneTimeWrapper(AttributeSource attributeSource)
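The FingerprintFilter description (one output token built from the sorted, de-duplicated input tokens) maps onto a sorted set plus a join. The following JDK-only sketch shows that idea; `FingerprintSketch`, `fingerprint`, and the caller-supplied separator are assumptions for illustration and do not reproduce Lucene's attribute handling or size limits.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class FingerprintSketch {

    /**
     * Sorts the tokens, drops duplicates, and joins what remains into one string.
     * TreeSet gives both the ordering and the de-duplication.
     */
    static String fingerprint(List<String> tokens, String separator) {
        return String.join(separator, new TreeSet<>(tokens));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "the", "brown");
        System.out.println(fingerprint(tokens, " "));
    }
}
```

Because the result ignores token order and repetition, two inputs with the same vocabulary produce the same fingerprint, which is what makes the technique useful for near-duplicate detection.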
Uses of AttributeSource in org.apache.lucene.analysis.ngram
Subclasses of AttributeSource in org.apache.lucene.analysis.ngram
Modifier and Type    Class    Description
class    EdgeNGramTokenFilter    Tokenizes the given token into n-grams of given size(s).
class    EdgeNGramTokenizer    Tokenizes the input from an edge into n-grams of given size(s).
class    NGramTokenFilter    Tokenizes the input into n-grams of the given size(s).
class    NGramTokenizer    Tokenizes the input into n-grams of the given size(s).
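To make "n-grams of the given size(s)" concrete, here is a minimal JDK-only sketch of character n-grams and edge n-grams. The names `NGramSketch`, `nGrams`, and `edgeNGrams` are hypothetical, the emission order (by start position, then by length) is an assumption, and the code works on Java chars rather than code points, so it is not a substitute for the Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

final class NGramSketch {

    /** All substrings with length between minGram and maxGram, at every start position. */
    static List<String> nGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int n = minGram; n <= maxGram && start + n <= text.length(); n++) {
                grams.add(text.substring(start, start + n));
            }
        }
        return grams;
    }

    /** Only the n-grams anchored at the front edge of the text. */
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram && n <= text.length(); n++) {
            grams.add(text.substring(0, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(nGrams("abc", 1, 2));
        System.out.println(edgeNGrams("abc", 1, 2));
    }
}
```

Edge n-grams are the usual choice for prefix-style matching, while full n-grams support matching anywhere inside a term at the cost of many more tokens.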
Uses of AttributeSource in org.apache.lucene.analysis.no
Subclasses of AttributeSource in org.apache.lucene.analysis.no
Modifier and Type    Class    Description
class    NorwegianLightStemFilter    A TokenFilter that applies NorwegianLightStemmer to stem Norwegian words.
class    NorwegianMinimalStemFilter    A TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian words.
Uses of AttributeSource in org.apache.lucene.analysis.path
Subclasses of AttributeSource in org.apache.lucene.analysis.path
Modifier and Type    Class    Description
class    PathHierarchyTokenizer    Tokenizer for path-like hierarchies.
class    ReversePathHierarchyTokenizer    Tokenizer for domain-like hierarchies.
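One common way to tokenize a path-like hierarchy is to emit every ancestor prefix of the path. The sketch below illustrates that approach with the JDK only. It is an assumption about the general technique, not a statement of how PathHierarchyTokenizer is implemented; `PathHierarchySketch` and `prefixes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class PathHierarchySketch {

    /**
     * Emits one entry per hierarchy level: each prefix that ends just before
     * a delimiter, followed by the full path itself.
     */
    static List<String> prefixes(String path, char delimiter) {
        List<String> out = new ArrayList<>();
        // Start at 1 so a leading delimiter does not yield an empty prefix.
        for (int i = 1; i < path.length(); i++) {
            if (path.charAt(i) == delimiter) {
                out.add(path.substring(0, i));
            }
        }
        if (!path.isEmpty()) {
            out.add(path);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(prefixes("/usr/local/lib", '/'));
    }
}
```

Indexing every prefix lets a query for a directory match all documents stored beneath it.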
Uses of AttributeSource in org.apache.lucene.analysis.pattern
Subclasses of AttributeSource in org.apache.lucene.analysis.pattern
Modifier and Type    Class    Description
class    PatternCaptureGroupTokenFilter    CaptureGroup uses Java regexes to emit multiple tokens - one for each capture group in one or more patterns.
class    PatternReplaceFilter    A TokenFilter which applies a Pattern to each token in the stream, replacing match occurrences with the specified replacement string.
class    PatternTokenizer    This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.
class    SimplePatternSplitTokenizer
class    SimplePatternTokenizer
Uses of AttributeSource in org.apache.lucene.analysis.payloads
Subclasses of AttributeSource in org.apache.lucene.analysis.payloads
Modifier and Type    Class    Description
class    DelimitedPayloadTokenFilter    Characters before the delimiter are the "token", those after are the payload.
class    NumericPayloadTokenFilter    Assigns a payload to a token based on the TypeAttribute
class    TokenOffsetPayloadTokenFilter    Adds the OffsetAttribute.startOffset() and OffsetAttribute.endOffset() First 4 bytes are the start
class    TypeAsPayloadTokenFilter    Makes the TypeAttribute a payload.
Uses of AttributeSource in org.apache.lucene.analysis.phonetic
Subclasses of AttributeSource in org.apache.lucene.analysis.phonetic
Modifier and Type    Class    Description
class    BeiderMorseFilter    TokenFilter for Beider-Morse phonetic encoding.
class    DaitchMokotoffSoundexFilter    Create tokens for phonetic matches based on Daitch–Mokotoff Soundex.
class    DoubleMetaphoneFilter    Filter for DoubleMetaphone (supporting secondary codes)
class    PhoneticFilter    Create tokens for phonetic matches.
Uses of AttributeSource in org.apache.lucene.analysis.pt
Subclasses of AttributeSource in org.apache.lucene.analysis.pt
Modifier and Type    Class    Description
class    PortugueseLightStemFilter    A TokenFilter that applies PortugueseLightStemmer to stem Portuguese words.
class    PortugueseMinimalStemFilter    A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words.
class    PortugueseStemFilter    A TokenFilter that applies PortugueseStemmer to stem Portuguese words.
Uses of AttributeSource in org.apache.lucene.analysis.reverse
Subclasses of AttributeSource in org.apache.lucene.analysis.reverse
Modifier and Type    Class    Description
class    ReverseStringFilter    Reverse token string, for example "country" => "yrtnuoc".
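The "country" => "yrtnuoc" example can be reproduced with the JDK alone. This is only a sketch of the string operation, with the hypothetical names `ReverseSketch` and `reverse`; it does not model the token stream or any marker characters a real filter might add.

```java
final class ReverseSketch {

    /** Reverses the text; StringBuilder.reverse keeps surrogate pairs in the right order. */
    static String reverse(String text) {
        return new StringBuilder(text).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("country"));
    }
}
```

Reversed terms are typically indexed so that a leading-wildcard query such as `*try` can be rewritten as a cheaper trailing-wildcard query on the reversed field.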
Uses of AttributeSource in org.apache.lucene.analysis.ru
Subclasses of AttributeSource in org.apache.lucene.analysis.ru
Modifier and Type    Class    Description
class    RussianLightStemFilter    A TokenFilter that applies RussianLightStemmer to stem Russian words.
Uses of AttributeSource in org.apache.lucene.analysis.shingle
Subclasses of AttributeSource in org.apache.lucene.analysis.shingle
Modifier and Type    Class    Description
class    FixedShingleFilter    A FixedShingleFilter constructs shingles (token n-grams) from a token stream.
class    ShingleFilter    A ShingleFilter constructs shingles (token n-grams) from a token stream.
Fields in org.apache.lucene.analysis.shingle declared as AttributeSource
Modifier and Type    Field    Description
(package private) AttributeSource    ShingleFilter.InputWindowToken.attSource
private AttributeSource    ShingleFilter.nextInputStreamToken    When the next input stream token has a position increment greater than one, it is stored in this field until sufficient filler tokens have been inserted to account for the position increment.
Constructors in org.apache.lucene.analysis.shingle with parameters of type AttributeSource
Constructor    Description
InputWindowToken(AttributeSource attSource)
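Shingles are n-grams over tokens rather than characters. The JDK-only sketch below builds fixed-size shingles from a token list. `ShingleSketch` and `shingles` are hypothetical names, the separator is a caller-supplied assumption, and position increments and filler tokens (mentioned for `nextInputStreamToken` above) are deliberately not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ShingleSketch {

    /** Joins every run of `size` consecutive tokens into one shingle. */
    static List<String> shingles(List<String> tokens, int size, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(separator, tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        System.out.println(shingles(tokens, 2, " "));
    }
}
```

Indexing shingles alongside single terms can speed up phrase-like matching because adjacent words are already stored as one term.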
Uses of AttributeSource in org.apache.lucene.analysis.sinks
Subclasses of AttributeSource in org.apache.lucene.analysis.sinks
Modifier and Type    Class    Description
class    TeeSinkTokenFilter    This TokenFilter provides the ability to set aside attribute states that have already been analyzed.
static class    TeeSinkTokenFilter.SinkTokenStream    TokenStream output from a tee.
Constructors in org.apache.lucene.analysis.sinks with parameters of type AttributeSource
Constructor    Description
SinkTokenStream(AttributeSource source, TeeSinkTokenFilter.States cachedStates)
Uses of AttributeSource in org.apache.lucene.analysis.snowball
Subclasses of AttributeSource in org.apache.lucene.analysis.snowball
Modifier and Type    Class    Description
class    SnowballFilter    A filter that stems words using a Snowball-generated stemmer.
Uses of AttributeSource in org.apache.lucene.analysis.sr
Subclasses of AttributeSource in org.apache.lucene.analysis.sr
Modifier and Type    Class    Description
class    SerbianNormalizationFilter    Normalizes Serbian Cyrillic and Latin characters to "bald" Latin.
class    SerbianNormalizationRegularFilter    Normalizes Serbian Cyrillic to Latin.
Uses of AttributeSource in org.apache.lucene.analysis.standard
Subclasses of AttributeSource in org.apache.lucene.analysis.standard
Modifier and Type    Class    Description
class    ClassicFilter    Normalizes tokens extracted with ClassicTokenizer.
class    ClassicTokenizer    A grammar-based tokenizer constructed with JFlex.
class    StandardTokenizer    A grammar-based tokenizer constructed with JFlex.
class    UAX29URLEmailTokenizer    This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
Uses of AttributeSource in org.apache.lucene.analysis.stempel
Subclasses of AttributeSource in org.apache.lucene.analysis.stempel
Modifier and Type    Class    Description
class    StempelFilter    Transforms the token stream as per the stemming algorithm.
Uses of AttributeSource in org.apache.lucene.analysis.sv
Subclasses of AttributeSource in org.apache.lucene.analysis.sv
Modifier and Type    Class    Description
class    SwedishLightStemFilter    A TokenFilter that applies SwedishLightStemmer to stem Swedish words.
Uses of AttributeSource in org.apache.lucene.analysis.synonym
Subclasses of AttributeSource in org.apache.lucene.analysis.synonym
Modifier and Type    Class    Description
class    SynonymFilter    Deprecated. Use SynonymGraphFilter instead, but be sure to also use FlattenGraphFilter at index time (not at search time) as well.
class    SynonymGraphFilter    Applies single- or multi-token synonyms from a SynonymMap to an incoming TokenStream, producing a fully correct graph output.
Uses of AttributeSource in org.apache.lucene.analysis.th
Subclasses of AttributeSource in org.apache.lucene.analysis.th
Modifier and Type    Class    Description
class    ThaiTokenizer    Tokenizer that uses BreakIterator to tokenize Thai text.
Uses of AttributeSource in org.apache.lucene.analysis.tr
Subclasses of AttributeSource in org.apache.lucene.analysis.tr
Modifier and Type    Class    Description
class    ApostropheFilter    Strips all characters after an apostrophe (including the apostrophe itself).
class    TurkishLowerCaseFilter    Normalizes Turkish token text to lower case.
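The ApostropheFilter rule, dropping the apostrophe and everything after it, can be sketched on a plain string. `ApostropheSketch` and `stripAfterApostrophe` are hypothetical names, and treating the typographic apostrophe U+2019 the same as the ASCII one is an assumption made for this illustration.

```java
final class ApostropheSketch {

    /** Returns the part of the term before the first apostrophe, or the term unchanged. */
    static String stripAfterApostrophe(String term) {
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            // Assumption: both ASCII ' and typographic U+2019 count as apostrophes.
            if (c == '\'' || c == '\u2019') {
                return term.substring(0, i);
            }
        }
        return term;
    }

    public static void main(String[] args) {
        System.out.println(stripAfterApostrophe("Ankara'da"));
    }
}
```

In Turkish, an apostrophe separates a proper noun from its inflectional suffixes, so cutting at the apostrophe recovers the bare name.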
Uses of AttributeSource in org.apache.lucene.analysis.util
Subclasses of AttributeSource in org.apache.lucene.analysis.util
Modifier and Type    Class    Description
class    CharTokenizer    An abstract base class for simple, character-oriented tokenizers.
class    ElisionFilter    Removes elisions from a TokenStream.
class    SegmentingTokenizerBase    Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.
Uses of AttributeSource in org.apache.lucene.analysis.wikipedia
Subclasses of AttributeSource in org.apache.lucene.analysis.wikipedia
Modifier and Type    Class    Description
class    WikipediaTokenizer    Extension of StandardTokenizer that is aware of Wikipedia syntax.
Uses of AttributeSource in org.apache.lucene.codecs
Methods in org.apache.lucene.codecs that return AttributeSource Modifier and Type Method Description AttributeSource
DocValuesConsumer.MergedTermsEnum.attributes()
-
Uses of AttributeSource in org.apache.lucene.codecs.uniformsplit.sharedterms
Methods in org.apache.lucene.codecs.uniformsplit.sharedterms that return AttributeSource Modifier and Type Method Description AttributeSource
STMergingTermsEnum.attributes()
-
Uses of AttributeSource in org.apache.lucene.document
Subclasses of AttributeSource in org.apache.lucene.document Modifier and Type Class Description private static class
FeatureField.FeatureTokenStream
private static class
Field.BinaryTokenStream
private static class
Field.StringTokenStream
-
Uses of AttributeSource in org.apache.lucene.index
Fields in org.apache.lucene.index declared as AttributeSource Modifier and Type Field Description (package private) AttributeSource
FieldInvertState.attributeSource
private AttributeSource
BaseTermsEnum.atts
Methods in org.apache.lucene.index that return AttributeSource Modifier and Type Method Description AttributeSource
BaseTermsEnum.attributes()
AttributeSource
FilteredTermsEnum.attributes()
Returns the related attributes; the returned AttributeSource is shared with the delegate TermsEnum.
AttributeSource
FilterLeafReader.FilterTermsEnum.attributes()
abstract AttributeSource
TermsEnum.attributes()
Returns the related attributes.
AttributeSource
FieldInvertState.getAttributeSource()
Returns the AttributeSource from the TokenStream that provided the indexed tokens for this field.
Methods in org.apache.lucene.index with parameters of type AttributeSource Modifier and Type Method Description (package private) void
FieldInvertState.setAttributeSource(AttributeSource attributeSource)
Sets attributeSource to a new instance. -
Uses of AttributeSource in org.apache.lucene.monitor
Subclasses of AttributeSource in org.apache.lucene.monitor Modifier and Type Class Description (package private) class
SuffixingNGramTokenFilter
(package private) class
TermsEnumTokenStream
A TokenStream created from a TermsEnum
-
Uses of AttributeSource in org.apache.lucene.search
Fields in org.apache.lucene.search declared as AttributeSource Modifier and Type Field Description AttributeSource
TermCollectingRewrite.TermCollector.attributes
Attributes used for communication with the enum.
private AttributeSource
FuzzyTermsEnum.atts
Methods in org.apache.lucene.search that return AttributeSource Modifier and Type Method Description AttributeSource
FuzzyTermsEnum.attributes()
Methods in org.apache.lucene.search with parameters of type AttributeSource Modifier and Type Method Description protected TermsEnum
AutomatonQuery.getTermsEnum(Terms terms, AttributeSource atts)
protected TermsEnum
FuzzyQuery.getTermsEnum(Terms terms, AttributeSource atts)
protected abstract TermsEnum
MultiTermQuery.getTermsEnum(Terms terms, AttributeSource atts)
Construct the enumeration to be used, expanding the pattern term.
protected TermsEnum
MultiTermQuery.RewriteMethod.getTermsEnum(MultiTermQuery query, Terms terms, AttributeSource atts)
Returns the MultiTermQuery's TermsEnum.
Constructors in org.apache.lucene.search with parameters of type AttributeSource Constructor Description FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, int maxEdits, int prefixLength, boolean transpositions)
Constructor for enumeration of all terms from specified reader which share a prefix of length prefixLength with term and which have at most maxEdits edits.
FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, java.util.function.Supplier<FuzzyAutomatonBuilder> automatonBuilder)
-
Uses of AttributeSource in org.apache.lucene.search.highlight
Subclasses of AttributeSource in org.apache.lucene.search.highlight Modifier and Type Class Description (package private) class
LimitTokenOffsetFilter
This is a simplified version of org.apache.lucene.analysis.miscellaneous.LimitTokenOffsetFilter to prevent a dependency on analyzers-common.jar.
class
OffsetLimitTokenFilter
This TokenFilter limits the number of tokens while indexing by adding up the current offset.
class
TokenStreamFromTermVector
TokenStream created from a term vector field. -
Uses of AttributeSource in org.apache.lucene.search.join
Methods in org.apache.lucene.search.join with parameters of type AttributeSource Modifier and Type Method Description protected TermsEnum
TermsQuery.getTermsEnum(Terms terms, AttributeSource atts)
-
Uses of AttributeSource in org.apache.lucene.search.suggest.analyzing
Subclasses of AttributeSource in org.apache.lucene.search.suggest.analyzing Modifier and Type Class Description class
SuggestStopFilter
Like StopFilter except it will not remove the last token if that token was not followed by some token separator. -
Uses of AttributeSource in org.apache.lucene.search.suggest.document
Subclasses of AttributeSource in org.apache.lucene.search.suggest.document Modifier and Type Class Description class
CompletionTokenStream
A ConcatenateGraphFilter but we can set the payload and provide access to config options.
private static class
ContextSuggestField.PrefixTokenFilter
The ContextSuggestField.PrefixTokenFilter wraps a TokenStream and adds a set of prefixes ahead. -
Uses of AttributeSource in org.apache.lucene.search.uhighlight
Subclasses of AttributeSource in org.apache.lucene.search.uhighlight Modifier and Type Class Description private static class
AnalysisOffsetStrategy.MultiValueTokenStream
Wraps an Analyzer and string text that represents multiple values delimited by a specified character. -
Uses of AttributeSource in org.apache.lucene.util
Methods in org.apache.lucene.util that return AttributeSource Modifier and Type Method Description AttributeSource
AttributeSource.cloneAttributes()
Performs a clone of all AttributeImpl instances returned in a new AttributeSource instance.
Methods in org.apache.lucene.util with parameters of type AttributeSource Modifier and Type Method Description void
AttributeSource.copyTo(AttributeSource target)
Copies the contents of this AttributeSource to the given target AttributeSource.
Constructors in org.apache.lucene.util with parameters of type AttributeSource Constructor Description AttributeSource(AttributeSource input)
An AttributeSource that uses the same attributes as the supplied one. -
Uses of AttributeSource in org.apache.lucene.util.graph
Subclasses of AttributeSource in org.apache.lucene.util.graph Modifier and Type Class Description private class
GraphTokenStreamFiniteStrings.FiniteStringsTokenStream
Fields in org.apache.lucene.util.graph declared as AttributeSource Modifier and Type Field Description private AttributeSource[]
GraphTokenStreamFiniteStrings.tokens
Methods in org.apache.lucene.util.graph that return types with arguments of type AttributeSource Modifier and Type Method Description java.util.List<AttributeSource>
GraphTokenStreamFiniteStrings.getTerms(int state)
Returns the list of tokens that start at the provided state
-