Package org.htmlparser.scanners
The scanners package contains classes responsible for the tertiary
identification of tags.
Interface Summary
Scanner | Generic interface for scanning.
The scanners package contains classes responsible for the tertiary
identification of tags. The lower-level classes in the lexer package convert
byte streams to characters and characters to nodes (via the NodeFactory). In
the case of tags, the scanners in this package can then complete the tag, or
override the current tag and return an augmented tag. The existing
implementation of the composite tag scanner, for example, gathers the children
of composite tags, identifying the nested structure of HTML documents. The
script scanner overrides the nodes returned by the lexer and creates a tag
containing a single string that is the script code.
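The way a composite tag scanner gathers children can be illustrated with a minimal sketch. It does not use the library's classes: `Node`, `scanComposite`, and the `Iterator<Node>` standing in for the lexer are hypothetical names invented for this example, and the handling of unmatched end tags is deliberately naive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CompositeScanSketch {

    /** Hypothetical stand-in for a node produced by a lexer (not a library class). */
    static final class Node {
        final String value;   // tag name, or the text of a text node
        final boolean isTag;  // false for plain text
        final boolean isEnd;  // true only for end tags such as </div>
        final List<Node> children = new ArrayList<>();

        Node(String value, boolean isTag, boolean isEnd) {
            this.value = value;
            this.isTag = isTag;
            this.isEnd = isEnd;
        }

        static Node text(String s)  { return new Node(s, false, false); }
        static Node start(String n) { return new Node(n, true, false); }
        static Node end(String n)   { return new Node(n, true, true); }
    }

    /**
     * Completes a start tag by pulling nodes from the (hypothetical) lexer
     * until the matching end tag, recursing into nested start tags.
     */
    static Node scanComposite(Node tag, Iterator<Node> lexer) {
        while (lexer.hasNext()) {
            Node next = lexer.next();
            if (next.isTag && next.isEnd && next.value.equals(tag.value)) {
                break; // matching end tag closes this composite tag
            }
            if (next.isTag && !next.isEnd) {
                next = scanComposite(next, lexer); // nested tag gathers its own children
            }
            tag.children.add(next); // text, completed nested tag, or stray end tag
        }
        return tag;
    }

    public static void main(String[] args) {
        // Flat node sequence for: <div>a<p>b</p></div>
        List<Node> flat = Arrays.asList(
            Node.start("div"), Node.text("a"),
            Node.start("p"), Node.text("b"), Node.end("p"),
            Node.end("div"));
        Iterator<Node> lexer = flat.iterator();
        Node root = scanComposite(lexer.next(), lexer);
        System.out.println(root.value + " has " + root.children.size() + " children");
    }
}
```

The sketch assumes every start tag is a composite tag and treats all tags alike; a real scanner also has to deal with tags that cannot contain children and with end tags that are missing or out of order.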
You might need to create a scanner (that implements the Scanner interface) if
the text you are trying to parse doesn't look like HTML, as is the case for the
script scanner, or if the normal processing of tags by nesting their structure
is inadequate.
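As an illustration of why content that doesn't look like HTML needs its own scanner, the following sketch collects everything up to a closing tag as one raw string instead of interpreting `<` characters as tags. It is a simplified model under stated assumptions, not the library's script scanner: `ScriptScanSketch`, `Result`, and `scanRaw` are hypothetical names, and the sketch ignores closing tags that appear inside quoted strings or comments.

```java
public class ScriptScanSketch {

    /** Hypothetical result: the raw body and the index just past the closing tag. */
    static final class Result {
        final String code;
        final int end;

        Result(String code, int end) {
            this.code = code;
            this.end = end;
        }
    }

    /** Case-insensitive search for needle in page, starting at index from. */
    static int indexOfIgnoreCase(String page, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= page.length(); i++) {
            if (page.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Gathers the text between position from and the next closing tag named
     * tagName as a single string, without treating '<' inside it as markup.
     */
    static Result scanRaw(String page, int from, String tagName) {
        int close = indexOfIgnoreCase(page, "</" + tagName, from);
        if (close < 0) {
            // No closing tag: treat the rest of the input as the body.
            return new Result(page.substring(from), page.length());
        }
        int gt = page.indexOf('>', close);
        int end = (gt < 0) ? page.length() : gt + 1;
        return new Result(page.substring(from, close), end);
    }

    public static void main(String[] args) {
        String page = "<script>if (a < b) { go(\"<b>\"); }</SCRIPT><p>done</p>";
        Result r = scanRaw(page, "<script>".length(), "script");
        System.out.println(r.code);
        System.out.println(page.substring(r.end));
    }
}
```

A tag-oriented pass over the same input would try to read `< b) { go(...` and `<b>` as tags, which is the kind of situation where overriding the normal node processing becomes necessary.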
HTML Parser is an open source library released under LGPL.