com.swabunga.spell.event

Interface WordTokenizer

public interface WordTokenizer

An interface for objects that take a text-based medium as input and iterate through the words in the text stored in that medium. Examples of such media include Strings, Documents, Files and TextComponents.

After the object is instantiated and before the first call to nextWord() is made, the following methods should throw a WordNotFoundException:
getCurrentWordEnd(), getCurrentWordPosition(), isNewSentence() and replaceWord().

A call to nextWord() when hasMoreWords() returns false should throw a WordNotFoundException.
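
The contract above suggests that callers guard every nextWord() call with hasMoreWords(). The following is a minimal caller-side sketch of that loop. It is not part of the library: WordSource is a hypothetical stand-in for the two WordTokenizer methods used, so that the example compiles on its own, and java.util.NoSuchElementException is assumed here in place of WordNotFoundException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the two WordTokenizer methods used below,
// so the sketch compiles without the library on the classpath.
interface WordSource {
    boolean hasMoreWords();
    String nextWord();
}

class WordLoopSketch {
    // Drains a tokenizer, guarding every nextWord() call with hasMoreWords()
    // so the "no more words" exception is never triggered.
    static List<String> collectWords(WordSource source) {
        List<String> words = new ArrayList<>();
        while (source.hasMoreWords()) {
            words.add(source.nextWord());
        }
        return words;
    }

    // Toy array-backed source used only to exercise the loop.
    // NoSuchElementException stands in for WordNotFoundException (assumption).
    static WordSource over(String... words) {
        return new WordSource() {
            private int index = 0;

            public boolean hasMoreWords() {
                return index < words.length;
            }

            public String nextWord() {
                if (!hasMoreWords()) {
                    throw new NoSuchElementException("no more words");
                }
                return words[index++];
            }
        };
    }
}
```

With a real WordTokenizer the loop body would be the same; only the type of the parameter changes.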

Author: Jason Height (jheight@chariot.net.au)

Method Summary
String getContext()
Returns the context text that is being tokenized (should include any changes that have been made).
int getCurrentWordCount()
Returns the number of word tokens that have been processed thus far.
int getCurrentWordEnd()
Returns an index representing the end location of the current word in the text.
int getCurrentWordPosition()
Returns an index representing the start location of the current word in the text.
boolean hasMoreWords()
Indicates if there are more words left.
boolean isNewSentence()
Returns true if the current word is at the start of a sentence.
String nextWord()
This returns the next word in the iteration.
void replaceWord(String newWord)
Replaces the current word token.

When a word is replaced, care should be taken that the WordTokenizer repositions itself so that the words that were added are not rechecked.

Method Detail

getContext

public String getContext()
Returns the context text that is being tokenized (should include any changes that have been made).

Returns: the text being searched.

getCurrentWordCount

public int getCurrentWordCount()
Returns the number of word tokens that have been processed thus far.

Returns: the number of words found so far.

getCurrentWordEnd

public int getCurrentWordEnd()
Returns an index representing the end location of the current word in the text.

Returns: index of the end of the current word in the text.

Throws: WordNotFoundException - current word has not yet been set.

getCurrentWordPosition

public int getCurrentWordPosition()
Returns an index representing the start location of the current word in the text.

Returns: index of the start of the current word in the text.

Throws: WordNotFoundException - current word has not yet been set.

hasMoreWords

public boolean hasMoreWords()
Indicates if there are more words left.

Returns: true if more words can be found in the text.

isNewSentence

public boolean isNewSentence()
Returns true if the current word is at the start of a sentence.

Returns: true if the current word starts a sentence.

Throws: WordNotFoundException - current word has not yet been set.

nextWord

public String nextWord()
Returns the next word in the iteration. An implementation should return the current word and then replace the current word with the next word found in the input text (if one exists).

Returns: the next word in the iteration.

Throws: WordNotFoundException - search string contains no more words.
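
The "return the current word, then find the next one" behaviour described for nextWord() can be sketched as a look-ahead tokenizer. The class below is an illustration only and is not the library's implementation. It makes several assumptions: the class and field names are hypothetical, a word is taken to be a run of letters, the end index is exclusive, only a subset of the interface is covered (isNewSentence() and replaceWord() are omitted), and java.util.NoSuchElementException stands in for WordNotFoundException.

```java
import java.util.NoSuchElementException;

// Look-ahead sketch: the following word is located before it is asked for,
// so hasMoreWords() can answer without scanning.
class LookAheadTokenizerSketch {
    private final String text;
    private int scanFrom = 0;       // where the search for the following word starts
    private String pending;         // word already located but not yet returned
    private int pendingStart;
    private int pendingEnd;
    private int currentStart = -1;  // -1 until nextWord() has been called once
    private int currentEnd = -1;
    private int wordCount = 0;

    LookAheadTokenizerSketch(String text) {
        this.text = text;
        findNext(); // prime the look-ahead
    }

    // Locates the next run of letters at or after scanFrom,
    // or sets pending to null when none is left.
    private void findNext() {
        int start = scanFrom;
        while (start < text.length() && !Character.isLetter(text.charAt(start))) {
            start++;
        }
        if (start == text.length()) {
            pending = null;
            return;
        }
        int end = start;
        while (end < text.length() && Character.isLetter(text.charAt(end))) {
            end++;
        }
        pendingStart = start;
        pendingEnd = end;          // exclusive end index (assumption)
        pending = text.substring(start, end);
        scanFrom = end;
    }

    boolean hasMoreWords() {
        return pending != null;
    }

    // Returns the located word, records its position, then looks ahead again.
    String nextWord() {
        if (pending == null) {
            throw new NoSuchElementException("no more words");
        }
        String result = pending;
        currentStart = pendingStart;
        currentEnd = pendingEnd;
        wordCount++;
        findNext();
        return result;
    }

    int getCurrentWordPosition() {
        if (currentStart < 0) {
            throw new NoSuchElementException("current word has not yet been set");
        }
        return currentStart;
    }

    int getCurrentWordEnd() {
        if (currentEnd < 0) {
            throw new NoSuchElementException("current word has not yet been set");
        }
        return currentEnd;
    }

    int getCurrentWordCount() {
        return wordCount;
    }
}
```

Priming the look-ahead in the constructor is one possible design choice; it keeps hasMoreWords() a constant-time check at the cost of scanning one word ahead of the caller.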

replaceWord

public void replaceWord(String newWord)
Replaces the current word token.

When a word is replaced, care should be taken that the WordTokenizer repositions itself so that the words that were added are not rechecked. This is not mandatory; an application may have cases where repositioning is not needed.

Parameters: newWord - the string which should replace the current word.

Throws: WordNotFoundException - current word has not yet been set.
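
The repositioning recommended for replaceWord() comes down to moving the scan position to the end of the inserted text. The following is a minimal sketch of that idea, not the library's implementation. The class name and fields are hypothetical, words are assumed to be runs of letters, the next word is scanned lazily rather than looked up ahead of time, and java.util.NoSuchElementException stands in for WordNotFoundException.

```java
import java.util.NoSuchElementException;

// Sketch of replaceWord() with repositioning over a mutable text buffer.
class ReplacingTokenizerSketch {
    private final StringBuilder text;
    private int scanFrom = 0;       // where the next scan begins
    private int currentStart = -1;  // -1 until nextWord() has been called once
    private int currentEnd = -1;    // exclusive end index (assumption)

    ReplacingTokenizerSketch(String text) {
        this.text = new StringBuilder(text);
    }

    // Returns the text including any replacements made so far.
    String getContext() {
        return text.toString();
    }

    // Index of the first letter at or after 'from', or text.length() if none.
    private int skipNonLetters(int from) {
        int i = from;
        while (i < text.length() && !Character.isLetter(text.charAt(i))) {
            i++;
        }
        return i;
    }

    boolean hasMoreWords() {
        return skipNonLetters(scanFrom) < text.length();
    }

    String nextWord() {
        int start = skipNonLetters(scanFrom);
        if (start == text.length()) {
            throw new NoSuchElementException("no more words");
        }
        int end = start;
        while (end < text.length() && Character.isLetter(text.charAt(end))) {
            end++;
        }
        currentStart = start;
        currentEnd = end;
        scanFrom = end;
        return text.substring(start, end);
    }

    void replaceWord(String newWord) {
        if (currentStart < 0) {
            throw new NoSuchElementException("current word has not yet been set");
        }
        text.replace(currentStart, currentEnd, newWord);
        // Reposition: resume scanning after the inserted text, so that
        // words inside newWord are not returned by later nextWord() calls.
        currentEnd = currentStart + newWord.length();
        scanFrom = currentEnd;
    }
}
```

For example, after tokenizing "teh fox", reading "teh" and replacing it with "the quick", getContext() yields "the quick fox" and the following nextWord() call returns "fox" rather than "quick". Without the update to scanFrom, the scan would resume at the old end offset, which now falls inside the inserted text.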