Class BinaryDictionary
- java.lang.Object
  - org.apache.lucene.analysis.ja.dict.BinaryDictionary
- All Implemented Interfaces: Dictionary
- Direct Known Subclasses: TokenInfoDictionary, UnknownDictionary
public abstract class BinaryDictionary extends java.lang.Object implements Dictionary
Base class for a binary-encoded in-memory dictionary.
-
-
Nested Class Summary
Nested classes (modifier and type, class, description):
static class BinaryDictionary.ResourceScheme - Used to specify where (dictionary) resources get loaded from.
-
Field Summary
Fields (modifier and type, field, description):
private java.nio.ByteBuffer buffer
static java.lang.String DICT_FILENAME_SUFFIX
static java.lang.String DICT_HEADER
static int HAS_BASEFORM - Flag that the entry has baseform data.
static int HAS_PRONUNCIATION - Flag that the entry has pronunciation data.
static int HAS_READING - Flag that the entry has reading data.
private java.lang.String[] inflFormDict
private java.lang.String[] inflTypeDict
private java.lang.String[] posDict
static java.lang.String POSDICT_FILENAME_SUFFIX
static java.lang.String POSDICT_HEADER
private java.lang.String resourcePath
private BinaryDictionary.ResourceScheme resourceScheme
private int[] targetMap
static java.lang.String TARGETMAP_FILENAME_SUFFIX
static java.lang.String TARGETMAP_HEADER
private int[] targetMapOffsets
static int VERSION
-
Fields inherited from interface org.apache.lucene.analysis.ja.dict.Dictionary
INTERNAL_SEPARATOR
-
-
Constructor Summary
Constructors (modifier, constructor, description):
protected BinaryDictionary()
protected BinaryDictionary(BinaryDictionary.ResourceScheme resourceScheme, java.lang.String resourcePath)
-
Method Summary
Methods (modifier and type, method, description):
private static int baseFormOffset(int wordId)
java.lang.String getBaseForm(int wordId, char[] surfaceForm, int off, int len) - Get base form of word
static java.io.InputStream getClassResource(java.lang.Class<?> clazz, java.lang.String suffix)
private static java.io.InputStream getClassResource(java.lang.String path)
java.lang.String getInflectionForm(int wordId) - Get inflection form of tokens
java.lang.String getInflectionType(int wordId) - Get inflection type of tokens
int getLeftId(int wordId) - Get left id of specified word
java.lang.String getPartOfSpeech(int wordId) - Get Part-Of-Speech of tokens
java.lang.String getPronunciation(int wordId, char[] surface, int off, int len) - Get pronunciation of tokens
java.lang.String getReading(int wordId, char[] surface, int off, int len) - Get reading of tokens
protected java.io.InputStream getResource(java.lang.String suffix)
static java.io.InputStream getResource(BinaryDictionary.ResourceScheme scheme, java.lang.String path)
int getRightId(int wordId) - Get right id of specified word
int getWordCost(int wordId) - Get word cost of specified word
private boolean hasBaseFormData(int wordId)
private boolean hasPronunciationData(int wordId)
private boolean hasReadingData(int wordId)
void lookupWordIds(int sourceId, IntsRef ref)
private int pronunciationOffset(int wordId)
private int readingOffset(int wordId)
private java.lang.String readString(int offset, int length, boolean kana)
-
-
-
Field Detail
-
DICT_FILENAME_SUFFIX
public static final java.lang.String DICT_FILENAME_SUFFIX
- See Also:
- Constant Field Values
-
TARGETMAP_FILENAME_SUFFIX
public static final java.lang.String TARGETMAP_FILENAME_SUFFIX
- See Also:
- Constant Field Values
-
POSDICT_FILENAME_SUFFIX
public static final java.lang.String POSDICT_FILENAME_SUFFIX
- See Also:
- Constant Field Values
-
DICT_HEADER
public static final java.lang.String DICT_HEADER
- See Also:
- Constant Field Values
-
TARGETMAP_HEADER
public static final java.lang.String TARGETMAP_HEADER
- See Also:
- Constant Field Values
-
POSDICT_HEADER
public static final java.lang.String POSDICT_HEADER
- See Also:
- Constant Field Values
-
VERSION
public static final int VERSION
- See Also:
- Constant Field Values
-
resourceScheme
private final BinaryDictionary.ResourceScheme resourceScheme
-
resourcePath
private final java.lang.String resourcePath
-
buffer
private final java.nio.ByteBuffer buffer
-
targetMapOffsets
private final int[] targetMapOffsets
-
targetMap
private final int[] targetMap
-
posDict
private final java.lang.String[] posDict
-
inflTypeDict
private final java.lang.String[] inflTypeDict
-
inflFormDict
private final java.lang.String[] inflFormDict
-
HAS_BASEFORM
public static final int HAS_BASEFORM
Flag that the entry has baseform data. Otherwise the entry is not inflected (its base form is the same as the surface form).
- See Also:
- Constant Field Values
-
HAS_READING
public static final int HAS_READING
Flag that the entry has reading data. Otherwise the reading is the surface form converted to katakana.
- See Also:
- Constant Field Values
-
HAS_PRONUNCIATION
public static final int HAS_PRONUNCIATION
Flag that the entry has pronunciation data. Otherwise the pronunciation is the reading.
- See Also:
- Constant Field Values
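The three HAS_* constants are described as flags, which suggests they are tested as bits in a per-entry flag word. The following is a minimal sketch of that kind of check, not the class's actual code: the class name FlagSketch, the helper hasFlag, and the numeric bit values are all assumptions made for illustration (the real values are listed under Constant Field Values).

```java
/** Sketch only: bit values are assumed, not taken from the Constant Field Values page. */
final class FlagSketch {
    static final int HAS_BASEFORM = 1;       // assumed value
    static final int HAS_READING = 2;        // assumed value
    static final int HAS_PRONUNCIATION = 4;  // assumed value

    /** Returns true when the given flag bit is set in an entry's flag word. */
    static boolean hasFlag(int entryFlags, int flag) {
        return (entryFlags & flag) != 0;
    }

    public static void main(String[] args) {
        // An entry that stores base form and pronunciation data but no reading data.
        int entryFlags = HAS_BASEFORM | HAS_PRONUNCIATION;
        System.out.println("base form stored: " + hasFlag(entryFlags, HAS_BASEFORM));
        System.out.println("reading stored: " + hasFlag(entryFlags, HAS_READING));
    }
}
```

Because each flag occupies its own bit, an entry can carry any combination of the three kinds of data, and each absent kind falls back as described in the field descriptions.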
-
-
Constructor Detail
-
BinaryDictionary
protected BinaryDictionary() throws java.io.IOException
- Throws:
java.io.IOException
-
BinaryDictionary
protected BinaryDictionary(BinaryDictionary.ResourceScheme resourceScheme, java.lang.String resourcePath) throws java.io.IOException
- Parameters:
resourceScheme - scheme for loading resources (FILE or CLASSPATH).
resourcePath - where to load resources (dictionaries) from. If null (allowed with the CLASSPATH scheme only), this class's name is used as the path.
- Throws:
java.io.IOException
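To illustrate what choosing between a FILE and a CLASSPATH scheme can mean in practice, here is a self-contained sketch that opens a stream either from the file system or from the class loader. It does not reproduce this class's loading code: ResourceLoadingSketch, its nested Scheme enum, and the open method are hypothetical stand-ins, and only standard JDK calls are used.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Sketch only: a stand-in for scheme-based resource loading, not Lucene code. */
final class ResourceLoadingSketch {
    /** Hypothetical stand-in for BinaryDictionary.ResourceScheme. */
    enum Scheme { CLASSPATH, FILE }

    /** Opens the resource at the given path according to the scheme. */
    static InputStream open(Scheme scheme, String path) throws IOException {
        switch (scheme) {
            case FILE:
                // Treat the path as a location on the local file system.
                return Files.newInputStream(Paths.get(path));
            case CLASSPATH:
                // Treat the path as a resource name resolved by the class loader.
                InputStream in =
                    ResourceLoadingSketch.class.getClassLoader().getResourceAsStream(path);
                if (in == null) {
                    throw new FileNotFoundException("Not found on classpath: " + path);
                }
                return in;
            default:
                throw new IllegalStateException("Unknown scheme: " + scheme);
        }
    }
}
```

A caller would wrap the returned stream in try-with-resources so it is closed after the dictionary data has been read.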
-
-
Method Detail
-
getResource
protected final java.io.InputStream getResource(java.lang.String suffix) throws java.io.IOException
- Throws:
java.io.IOException
-
getResource
public static final java.io.InputStream getResource(BinaryDictionary.ResourceScheme scheme, java.lang.String path) throws java.io.IOException
- Throws:
java.io.IOException
-
getClassResource
public static final java.io.InputStream getClassResource(java.lang.Class<?> clazz, java.lang.String suffix) throws java.io.IOException
- Throws:
java.io.IOException
-
getClassResource
private static java.io.InputStream getClassResource(java.lang.String path) throws java.io.IOException
- Throws:
java.io.IOException
-
lookupWordIds
public void lookupWordIds(int sourceId, IntsRef ref)
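This page does not document how lookupWordIds uses the targetMapOffsets and targetMap fields. One common layout for such a pair of arrays is an offset table into a flat array of word IDs, and the sketch below shows that layout under the assumption that the word IDs for source ID i occupy targetMap[targetMapOffsets[i]] up to, but not including, targetMap[targetMapOffsets[i + 1]]. The class TargetMapSketch is hypothetical, and it returns a copied int[] rather than filling an IntsRef so that it runs without Lucene on the classpath.

```java
import java.util.Arrays;

/** Sketch only: assumes an offset-table layout; not taken from the Lucene source. */
final class TargetMapSketch {
    private final int[] targetMapOffsets; // one more entry than there are source IDs
    private final int[] targetMap;        // word IDs for all source IDs, stored back to back

    TargetMapSketch(int[] targetMapOffsets, int[] targetMap) {
        this.targetMapOffsets = targetMapOffsets;
        this.targetMap = targetMap;
    }

    /** Returns the word IDs associated with the given source ID. */
    int[] lookupWordIds(int sourceId) {
        int from = targetMapOffsets[sourceId];
        int to = targetMapOffsets[sourceId + 1];
        return Arrays.copyOfRange(targetMap, from, to);
    }
}
```

With offsets {0, 2, 3} and a target map {10, 11, 20}, source ID 0 maps to word IDs 10 and 11, and source ID 1 maps to word ID 20.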
-
getLeftId
public int getLeftId(int wordId)
Description copied from interface: Dictionary
Get left id of specified word
- Specified by:
getLeftId in interface Dictionary
- Returns:
left id
-
getRightId
public int getRightId(int wordId)
Description copied from interface: Dictionary
Get right id of specified word
- Specified by:
getRightId in interface Dictionary
- Returns:
right id
-
getWordCost
public int getWordCost(int wordId)
Description copied from interface: Dictionary
Get word cost of specified word
- Specified by:
getWordCost in interface Dictionary
- Returns:
word's cost
-
getBaseForm
public java.lang.String getBaseForm(int wordId, char[] surfaceForm, int off, int len)
Description copied from interface: Dictionary
Get base form of word
- Specified by:
getBaseForm in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
Base form (only different for inflected words, otherwise null)
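Since getBaseForm returns null for words that are not inflected, a caller that always wants a string has to fall back to the surface form itself. The sketch below shows one way to do that. BaseFormSketch, the one-method BaseFormSource interface, and baseFormOrSurface are hypothetical names introduced here so the example runs without Lucene; the interface mirrors only the getBaseForm signature shown above.

```java
/** Sketch only: caller-side handling of a null base form. */
final class BaseFormSketch {
    /** Hypothetical stand-in exposing just the getBaseForm signature. */
    interface BaseFormSource {
        String getBaseForm(int wordId, char[] surfaceForm, int off, int len);
    }

    /** Returns the base form, or the surface form when no base form is stored. */
    static String baseFormOrSurface(
            BaseFormSource dict, int wordId, char[] surfaceForm, int off, int len) {
        String base = dict.getBaseForm(wordId, surfaceForm, off, len);
        return base != null ? base : new String(surfaceForm, off, len);
    }
}
```

The off and len arguments select the token inside a larger character buffer, so the fallback builds the string from that slice only.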
-
getReading
public java.lang.String getReading(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get reading of tokens
- Specified by:
getReading in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
Reading of the token
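When an entry has no stored reading data, the HAS_READING description says the reading is the surface form converted to katakana. The sketch below shows one way such a conversion can be written, not necessarily how this class does it. It relies on the Unicode layout in which hiragana U+3041 to U+3096 sit exactly 0x60 below the corresponding katakana U+30A1 to U+30F6; the class KatakanaSketch and the method toKatakana are hypothetical names.

```java
/** Sketch only: hiragana-to-katakana conversion by code point offset. */
final class KatakanaSketch {
    /** Converts hiragana in the given slice to katakana; other characters are kept. */
    static String toKatakana(char[] surface, int off, int len) {
        char[] out = new char[len];
        for (int i = 0; i < len; i++) {
            char c = surface[off + i];
            // Hiragana U+3041..U+3096 map to katakana U+30A1..U+30F6 by adding 0x60.
            out[i] = (c >= '\u3041' && c <= '\u3096') ? (char) (c + 0x60) : c;
        }
        return new String(out);
    }
}
```

Characters outside the hiragana range, such as kanji or Latin letters, pass through unchanged, so the result is only a usable reading for surface forms written in kana.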
-
getPartOfSpeech
public java.lang.String getPartOfSpeech(int wordId)
Description copied from interface: Dictionary
Get Part-Of-Speech of tokens
- Specified by:
getPartOfSpeech in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
Part-Of-Speech of the token
-
getPronunciation
public java.lang.String getPronunciation(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get pronunciation of tokens
- Specified by:
getPronunciation in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
Pronunciation of the token
-
getInflectionType
public java.lang.String getInflectionType(int wordId)
Description copied from interface: Dictionary
Get inflection type of tokens
- Specified by:
getInflectionType in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
inflection type, or null
-
getInflectionForm
public java.lang.String getInflectionForm(int wordId)
Description copied from interface: Dictionary
Get inflection form of tokens
- Specified by:
getInflectionForm in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
inflection form, or null
-
baseFormOffset
private static int baseFormOffset(int wordId)
-
readingOffset
private int readingOffset(int wordId)
-
pronunciationOffset
private int pronunciationOffset(int wordId)
-
hasBaseFormData
private boolean hasBaseFormData(int wordId)
-
hasReadingData
private boolean hasReadingData(int wordId)
-
hasPronunciationData
private boolean hasPronunciationData(int wordId)
-
readString
private java.lang.String readString(int offset, int length, boolean kana)
-
-