public final class CJKBigramFilter
extends org.apache.lucene.analysis.TokenFilter
Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer. CJK token types are set by these tokenizers, but you can also use CJKBigramFilter(TokenStream, int) to explicitly control which of the CJK scripts are turned into bigrams. In all cases, all non-CJK input is passed through unmodified.
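The bigramming described above can be illustrated with a minimal, Lucene-free sketch. It is not Lucene's implementation. It assumes the input has already been split into one token per CJK character, ignores offsets, positions and token types, and uses a hypothetical helper `isCjk` built on `Character.UnicodeScript`. Each run of adjacent CJK characters becomes overlapping bigrams, a run of a single character is kept as a unigram, and every other token passes through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch, not Lucene's CJKBigramFilter: assumes one CJK character per input token
// and ignores offsets, positions and token types.
final class BigramSketch {

    // Hypothetical helper: Han, Hiragana, Katakana and Hangul code points count as CJK.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript s = Character.UnicodeScript.of(codePoint);
        return s == Character.UnicodeScript.HAN
                || s == Character.UnicodeScript.HIRAGANA
                || s == Character.UnicodeScript.KATAKANA
                || s == Character.UnicodeScript.HANGUL;
    }

    // A token is treated as CJK only if it is a single CJK code point.
    static boolean isCjkToken(String token) {
        return token.codePointCount(0, token.length()) == 1 && isCjk(token.codePointAt(0));
    }

    static List<String> bigram(List<String> tokens) {
        List<String> out = new ArrayList<>();
        List<String> run = new ArrayList<>(); // current run of adjacent CJK characters
        for (String token : tokens) {
            if (isCjkToken(token)) {
                run.add(token);
            } else {
                flush(run, out);
                out.add(token); // non-CJK input passes through unmodified
            }
        }
        flush(run, out); // emit whatever run is left at the end
        return out;
    }

    // Emits overlapping bigrams for a run, or the lone character as a unigram.
    private static void flush(List<String> run, List<String> out) {
        if (run.size() == 1) {
            out.add(run.get(0));
        } else {
            for (int i = 0; i + 1 < run.size(); i++) {
                out.add(run.get(i) + run.get(i + 1));
            }
        }
        run.clear();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("東", "京", "都", "in", "日");
        System.out.println(bigram(tokens)); // [東京, 京都, in, 日]
    }
}
```

Overlapping bigrams let a query for any two-character sequence match without a dictionary, at the cost of indexing some spurious pairs.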
Modifier and Type | Field and Description |
---|---|
static java.lang.String | DOUBLE_TYPE: token type assigned to each emitted bigram |
static int | HAN: bigram flag for Han ideographs |
static int | HANGUL: bigram flag for Hangul |
static int | HIRAGANA: bigram flag for Hiragana |
static int | KATAKANA: bigram flag for Katakana |
static java.lang.String | SINGLE_TYPE: token type assigned to each emitted unigram |
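Since the two-argument constructor takes a single `int flags` and each of HAN, HIRAGANA, KATAKANA and HANGUL is described as a bigram flag, they appear intended to be combined with bitwise OR (for example `HAN | HANGUL`). The sketch below uses local stand-in constants. Their numeric values (distinct powers of two) and the `<DOUBLE>`/`<SINGLE>` type strings are assumptions for illustration, as are the helper names `flagFor`, `shouldBigram` and `typeFor`.

```java
// Sketch only: local stand-ins for CJKBigramFilter's flag and type fields.
// The numeric values and type strings are assumptions chosen for illustration.
final class FlagSketch {
    static final int HAN = 1;
    static final int HIRAGANA = 2;
    static final int KATAKANA = 4;
    static final int HANGUL = 8;
    static final String DOUBLE_TYPE = "<DOUBLE>";
    static final String SINGLE_TYPE = "<SINGLE>";

    // Hypothetical helper: maps a code point's script to its flag bit, or 0 for non-CJK scripts.
    static int flagFor(int codePoint) {
        Character.UnicodeScript s = Character.UnicodeScript.of(codePoint);
        if (s == Character.UnicodeScript.HAN) return HAN;
        if (s == Character.UnicodeScript.HIRAGANA) return HIRAGANA;
        if (s == Character.UnicodeScript.KATAKANA) return KATAKANA;
        if (s == Character.UnicodeScript.HANGUL) return HANGUL;
        return 0;
    }

    // True if the combined flags enable bigramming for this code point's script.
    static boolean shouldBigram(int flags, int codePoint) {
        int bit = flagFor(codePoint);
        return bit != 0 && (flags & bit) != 0;
    }

    // Two-character grams get DOUBLE_TYPE; one-character grams get SINGLE_TYPE.
    static String typeFor(String gram) {
        return gram.codePointCount(0, gram.length()) == 2 ? DOUBLE_TYPE : SINGLE_TYPE;
    }

    public static void main(String[] args) {
        int flags = HAN | HANGUL; // bigram Han and Hangul; Hiragana and Katakana are not bigrammed
        System.out.println(shouldBigram(flags, '漢')); // true
        System.out.println(shouldBigram(flags, 'か')); // false: Hiragana is not enabled
        System.out.println(typeFor("漢字"));           // <DOUBLE>
        System.out.println(typeFor("漢"));             // <SINGLE>
    }
}
```

Using one bit per script keeps the constructor signature to a single `int` while still allowing any subset of scripts to be selected.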
Constructor and Description |
---|
CJKBigramFilter(org.apache.lucene.analysis.TokenStream in) |
CJKBigramFilter(org.apache.lucene.analysis.TokenStream in, int flags): Create a new CJKBigramFilter, specifying which writing systems should be bigrammed. |
Modifier and Type | Method and Description |
---|---|
boolean | incrementToken() |
void | reset() |
Methods inherited from class org.apache.lucene.util.AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
public static final int HAN
public static final int HIRAGANA
public static final int KATAKANA
public static final int HANGUL
public static final java.lang.String DOUBLE_TYPE
public static final java.lang.String SINGLE_TYPE
public CJKBigramFilter(org.apache.lucene.analysis.TokenStream in)
public boolean incrementToken() throws java.io.IOException
Specified by: incrementToken in class org.apache.lucene.analysis.TokenStream
Throws: java.io.IOException
public void reset() throws java.io.IOException
Overrides: reset in class org.apache.lucene.analysis.TokenFilter
Throws: java.io.IOException
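The incrementToken() and reset() pair follows a pull model: a consumer calls reset() before reading, then calls incrementToken() until it returns false. A filter that buffers input between calls, as a bigram filter must in order to remember the previous character, has to clear that buffer in reset(). The toy class below mimics that shape in plain Java so it runs without Lucene. `ToyBigramStream`, its `term()` accessor and the `drain` helper are hypothetical and are not Lucene's TokenStream API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy, Lucene-free stand-in for a buffering token filter. Method names mirror
// TokenStream's incrementToken()/reset(), but this is not Lucene's API.
final class ToyBigramStream {
    private final List<String> input; // assumed: one CJK character per element
    private Iterator<String> source;
    private String previous;          // buffered character carried between calls
    private boolean emitted;          // whether any token has been produced since reset()
    private String term;              // current token, playing the role of a term attribute

    ToyBigramStream(List<String> input) {
        this.input = input;
    }

    // Must be called before consuming; drops all buffered state from a previous pass.
    void reset() {
        source = input.iterator();
        previous = null;
        emitted = false;
        term = null;
    }

    // Advances to the next token; returns false once the input is exhausted.
    boolean incrementToken() {
        while (source.hasNext()) {
            String next = source.next();
            if (previous != null) {
                term = previous + next; // overlapping bigram
                previous = next;
                emitted = true;
                return true;
            }
            previous = next;
        }
        if (!emitted && previous != null) {
            term = previous; // a lone character is emitted as a unigram
            emitted = true;
            return true;
        }
        return false;
    }

    String term() {
        return term;
    }

    // Consumes the stream from the start and collects its tokens.
    static List<String> drain(ToyBigramStream ts) {
        List<String> out = new ArrayList<>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(ts.term());
        }
        return out;
    }

    public static void main(String[] args) {
        ToyBigramStream ts = new ToyBigramStream(List.of("東", "京", "都"));
        System.out.println(drain(ts)); // [東京, 京都]
        // Second pass: reset() cleared 'previous', so no stale "都" leaks into a "都東" bigram.
        System.out.println(drain(ts)); // [東京, 京都]
    }
}
```

If reset() only rewound the iterator and left `previous` set, the second pass would begin with the stale bigram "都東", which is why buffered state belongs in reset().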
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.