org.apache.lucene.analysis.ru
Class RussianLetterTokenizer
public class RussianLetterTokenizer
extends CharTokenizer
A RussianLetterTokenizer is a tokenizer that extends the behavior of LetterTokenizer by additionally
looking up letters in a given "Russian charset". The problem with LetterTokenizer is that it relies on the
Character.isLetter() method, which does not recognize letters in encodings such as CP1252 and KOI8
(the well-known problems with the 0xD7 and 0xF7 characters).
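The 0xD7 and 0xF7 problem can be reproduced without Lucene. The following is a minimal sketch, assuming the two bytes were decoded as ISO-8859-1, which agrees with CP1252 at these two positions. The class name LetterCheckDemo and the helper decodeLatin1 are hypothetical and are not part of Lucene.

```java
import java.nio.charset.StandardCharsets;

public class LetterCheckDemo {
    // Decode a single byte as ISO-8859-1 and return the resulting char.
    // (Assumption: ISO-8859-1 stands in for CP1252 here; both map
    // 0xD7 and 0xF7 to the same code points.)
    static char decodeLatin1(int b) {
        byte[] raw = {(byte) b};
        return new String(raw, StandardCharsets.ISO_8859_1).charAt(0);
    }

    public static void main(String[] args) {
        for (int b : new int[] {0xD7, 0xF7}) {
            char c = decodeLatin1(b);
            // Both bytes decode to mathematical signs (U+00D7, U+00F7),
            // so Character.isLetter reports false for them.
            System.out.printf("0x%02X -> U+%04X isLetter=%b%n",
                    b, (int) c, Character.isLetter(c));
        }
    }
}
```

Text that carries these byte values as letters would therefore be split at them by a tokenizer that consults only Character.isLetter().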
Version: $Id: RussianLetterTokenizer.java,v 1.3 2004/03/29 22:48:01 cutting Exp $
Author: Boris Okner, b.okner@rogers.com
public RussianLetterTokenizer(Reader in, char[] charset)
protected boolean isTokenChar(char c)
Collects only characters which satisfy Character.isLetter(char) or which,
per the class description, appear in the charset supplied to the constructor.
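The check described here can be sketched in plain Java. This is an illustration written from the class description, not the Lucene source. The class RussianLetterSketch and its tokenize helper are hypothetical, and details that the real CharTokenizer handles (such as token length limits and character normalization) are left out.

```java
import java.util.ArrayList;
import java.util.List;

public class RussianLetterSketch {
    // True if c is a letter per Character.isLetter, or appears in the
    // supplied charset (assumed behavior, based on the class description).
    static boolean isTokenChar(char c, char[] charset) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char known : charset) {
            if (c == known) {
                return true;
            }
        }
        return false;
    }

    // Split text into maximal runs of token characters.
    static List<String> tokenize(String text, char[] charset) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c, charset)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical charset holding only the two problematic chars.
        char[] charset = {(char) 0x00D7, (char) 0x00F7};
        String text = "ab" + (char) 0x00D7 + "cd ef";
        // With the charset, U+00D7 is kept inside the first token.
        System.out.println(tokenize(text, charset).size());     // 2 tokens
        // Without it, the first word is split at U+00D7.
        System.out.println(tokenize(text, new char[0]).size()); // 3 tokens
    }
}
```

Passing an empty charset reduces the check to Character.isLetter() alone, which is the LetterTokenizer behavior this class is meant to extend.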
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.