org.cyberneko.html

Class HTMLConfiguration

public class HTMLConfiguration extends ParserConfigurationSettings implements XMLPullParserConfiguration

An XNI-based parser configuration for parsing HTML documents. This configuration can be used directly to parse HTML documents, or in conjunction with any XNI-based tools, such as the Xerces2 implementation.

This configuration recognizes the following features:

This configuration recognizes the following properties:

For complete usage information, refer to the documentation.

Version: $Id: HTMLConfiguration.java,v 1.9 2005/02/14 03:56:54 andyc Exp $

Author: Andy Clark

See Also: HTMLScanner, HTMLTagBalancer

Nested Class Summary
protected class HTMLConfiguration.ErrorReporter
Defines an error reporter for reporting HTML errors.
Field Summary
protected static String AUGMENTATIONS
Include infoset augmentations.
protected static String BALANCE_TAGS
Balance tags.
protected static String ERROR_DOMAIN
Error domain.
protected static String ERROR_REPORTER
Error reporter.
protected boolean fCloseStream
Stream opened by parser.
protected XMLDocumentHandler fDocumentHandler
Document handler.
protected HTMLScanner fDocumentScanner
Document scanner.
protected XMLDTDContentModelHandler fDTDContentModelHandler
DTD content model handler.
protected XMLDTDHandler fDTDHandler
DTD handler.
protected XMLEntityResolver fEntityResolver
Entity resolver.
protected XMLErrorHandler fErrorHandler
Error handler.
protected HTMLErrorReporter fErrorReporter
Error reporter.
protected Vector fHTMLComponents
Components.
protected Locale fLocale
Locale.
protected NamespaceBinder fNamespaceBinder
Namespace binder.
protected HTMLTagBalancer fTagBalancer
HTML tag balancer.
protected static String FILTERS
Pipeline filters.
protected static String NAMESPACES
Namespaces.
protected static String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
protected static String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
protected static String REPORT_ERRORS
Report errors.
protected static String SIMPLE_ERROR_FORMAT
Simple report format.
protected static boolean XERCES_2_0_0
Parser version is Xerces 2.0.0.
protected static boolean XERCES_2_0_1
Parser version is Xerces 2.0.1.
protected static boolean XML4J_4_0_x
Parser version is XML4J 4.0.x.
Constructor Summary
HTMLConfiguration()
Default constructor.
Method Summary
protected void addComponent(HTMLComponent component)
Adds a component.
void cleanup()
If the application decides to terminate parsing before the XML document is fully parsed, it should call this method to free any resources allocated during parsing.
XMLDocumentHandler getDocumentHandler()
Returns the document handler.
XMLDTDContentModelHandler getDTDContentModelHandler()
Returns the DTD content model handler.
XMLDTDHandler getDTDHandler()
Returns the DTD handler.
XMLEntityResolver getEntityResolver()
Returns the entity resolver.
XMLErrorHandler getErrorHandler()
Returns the error handler.
Locale getLocale()
Returns the locale.
void parse(XMLInputSource source)
Parses a document.
boolean parse(boolean complete)
Parses the document in a pull parsing fashion.
void pushInputSource(XMLInputSource inputSource)
Pushes an input source onto the current entity stack.
protected void reset()
Resets the parser configuration.
void setDocumentHandler(XMLDocumentHandler handler)
Sets the document handler.
void setDTDContentModelHandler(XMLDTDContentModelHandler handler)
Sets the DTD content model handler.
void setDTDHandler(XMLDTDHandler handler)
Sets the DTD handler.
void setEntityResolver(XMLEntityResolver resolver)
Sets the entity resolver.
void setErrorHandler(XMLErrorHandler handler)
Sets the error handler.
void setFeature(String featureId, boolean state)
Sets a feature.
void setInputSource(XMLInputSource inputSource)
Sets the input source for the document to parse.
void setLocale(Locale locale)
Sets the locale.
void setProperty(String propertyId, Object value)
Sets a property.

Field Detail

AUGMENTATIONS

protected static final String AUGMENTATIONS
Include infoset augmentations.

BALANCE_TAGS

protected static final String BALANCE_TAGS
Balance tags.

ERROR_DOMAIN

protected static final String ERROR_DOMAIN
Error domain.

ERROR_REPORTER

protected static final String ERROR_REPORTER
Error reporter.

fCloseStream

protected boolean fCloseStream
True if the stream was opened by the parser, in which case the stream must be closed explicitly when parsing terminates.

fDocumentHandler

protected XMLDocumentHandler fDocumentHandler
Document handler.

fDocumentScanner

protected HTMLScanner fDocumentScanner
Document scanner.

fDTDContentModelHandler

protected XMLDTDContentModelHandler fDTDContentModelHandler
DTD content model handler.

fDTDHandler

protected XMLDTDHandler fDTDHandler
DTD handler.

fEntityResolver

protected XMLEntityResolver fEntityResolver
Entity resolver.

fErrorHandler

protected XMLErrorHandler fErrorHandler
Error handler.

fErrorReporter

protected HTMLErrorReporter fErrorReporter
Error reporter.

fHTMLComponents

protected Vector fHTMLComponents
Components.

fLocale

protected Locale fLocale
Locale.

fNamespaceBinder

protected NamespaceBinder fNamespaceBinder
Namespace binder.

fTagBalancer

protected HTMLTagBalancer fTagBalancer
HTML tag balancer.

FILTERS

protected static final String FILTERS
Pipeline filters.

NAMESPACES

protected static final String NAMESPACES
Namespaces.

NAMES_ATTRS

protected static final String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.

NAMES_ELEMS

protected static final String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
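
The three values accepted by NAMES_ELEMS and NAMES_ATTRS can be illustrated with a minimal sketch. The class NameModeSketch and the method modifyName are hypothetical names introduced here, not part of this API, and the sketch assumes that "default" leaves a name unchanged; the source does not state what "default" does.

```java
import java.util.Locale;

// Hypothetical illustration of the "upper" / "lower" / "default" modes.
// This is NOT the NekoHTML implementation.
class NameModeSketch {

    /**
     * Applies a name-modification mode to an element or attribute name.
     * Assumption: any mode other than "upper" or "lower" (including
     * "default") returns the name unchanged.
     */
    static String modifyName(String name, String mode) {
        if ("upper".equals(mode)) {
            return name.toUpperCase(Locale.ENGLISH);
        }
        if ("lower".equals(mode)) {
            return name.toLowerCase(Locale.ENGLISH);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(modifyName("Body", "upper"));   // prints BODY
        System.out.println(modifyName("Body", "lower"));   // prints body
        System.out.println(modifyName("Body", "default")); // prints Body
    }
}
```

Locale.ENGLISH is used so that case conversion does not depend on the platform locale.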

REPORT_ERRORS

protected static final String REPORT_ERRORS
Report errors.

SIMPLE_ERROR_FORMAT

protected static final String SIMPLE_ERROR_FORMAT
Simple report format.

XERCES_2_0_0

protected static boolean XERCES_2_0_0
Parser version is Xerces 2.0.0.

XERCES_2_0_1

protected static boolean XERCES_2_0_1
Parser version is Xerces 2.0.1.

XML4J_4_0_x

protected static boolean XML4J_4_0_x
Parser version is XML4J 4.0.x.

Constructor Detail

HTMLConfiguration

public HTMLConfiguration()
Default constructor.

Method Detail

addComponent

protected void addComponent(HTMLComponent component)
Adds a component.

cleanup

public void cleanup()
If the application decides to terminate parsing before the XML document is fully parsed, it should call this method to free any resources allocated during parsing, for example by closing all opened streams.
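
The calling pattern for early termination can be sketched as follows. ToyConfiguration, TrackedStream, and abortEarly are hypothetical stand-ins written for this illustration; they are not the classes of this package. The sketch also assumes that cleanup closes only streams the parser opened itself, which is how the fCloseStream field is modeled here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical model of early termination followed by cleanup().
// This is NOT the NekoHTML implementation.
class CleanupSketch {

    /** In-memory stream that records whether close() was called. */
    static class TrackedStream extends ByteArrayInputStream {
        boolean closed;

        TrackedStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Toy configuration holding one stream and a flag modeled on fCloseStream. */
    static class ToyConfiguration {
        private InputStream stream;
        private boolean closeStream;

        void setInput(InputStream in, boolean openedByParser) {
            stream = in;
            closeStream = openedByParser;
        }

        /** Reads one byte; returns true while more input remains. */
        boolean parseStep() throws IOException {
            return stream.read() != -1 && stream.available() > 0;
        }

        /** Assumption: only streams opened by the parser are closed here. */
        void cleanup() {
            if (closeStream && stream != null) {
                try {
                    stream.close();
                } catch (IOException e) {
                    // Nothing useful can be done while cleaning up.
                }
            }
            stream = null;
            closeStream = false;
        }
    }

    /** Parses one step, abandons the document, and reports whether the stream was closed. */
    static boolean abortEarly(boolean openedByParser) {
        TrackedStream in = new TrackedStream("<p>hello</p>".getBytes(StandardCharsets.UTF_8));
        ToyConfiguration config = new ToyConfiguration();
        config.setInput(in, openedByParser);
        try {
            config.parseStep();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            config.cleanup(); // always release resources, even on early exit
        }
        return in.closed;
    }

    public static void main(String[] args) {
        System.out.println(abortEarly(true));  // prints true
        System.out.println(abortEarly(false)); // prints false
    }
}
```

Placing the cleanup call in a finally block ensures resources are released whether parsing is abandoned deliberately or interrupted by an exception.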

getDocumentHandler

public XMLDocumentHandler getDocumentHandler()
Returns the document handler.

getDTDContentModelHandler

public XMLDTDContentModelHandler getDTDContentModelHandler()
Returns the DTD content model handler.

getDTDHandler

public XMLDTDHandler getDTDHandler()
Returns the DTD handler.

getEntityResolver

public XMLEntityResolver getEntityResolver()
Returns the entity resolver.

getErrorHandler

public XMLErrorHandler getErrorHandler()
Returns the error handler.

getLocale

public Locale getLocale()
Returns the locale.

parse

public void parse(XMLInputSource source)
Parses a document.

parse

public boolean parse(boolean complete)
Parses the document in a pull parsing fashion.

Parameters: complete True if the pull parser should parse the remaining document completely.

Returns: True if there is more document to parse.

Throws: XNIException Any XNI exception, possibly wrapping another exception. IOException An I/O exception from the parser, possibly from a byte stream or character stream supplied by the parser.

See Also: HTMLConfiguration
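
The contract of parse(boolean) described above (return true while more of the document remains; consume the rest when complete is true) can be sketched with a toy stand-in. ToyPullParser and run are hypothetical names for this illustration, and the token list stands in for a real document; this is not how the HTML scanner is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the pull-parsing contract.
// This is NOT the NekoHTML implementation.
class PullParseSketch {

    /** Toy parser: each call to parse(false) consumes one token. */
    static class ToyPullParser {
        private final Iterator<String> tokens;
        private final List<String> events = new ArrayList<>();

        ToyPullParser(List<String> input) {
            this.tokens = input.iterator();
        }

        /**
         * When complete is true, consumes all remaining input.
         * Returns true if there is more input to parse.
         */
        boolean parse(boolean complete) {
            do {
                if (!tokens.hasNext()) {
                    return false;
                }
                events.add(tokens.next());
            } while (complete);
            return tokens.hasNext();
        }

        List<String> events() {
            return events;
        }
    }

    /** Pulls up to maxSteps tokens one at a time, then finishes the rest in one call. */
    static List<String> run(List<String> input, int maxSteps) {
        ToyPullParser parser = new ToyPullParser(input);
        boolean more = true;
        int steps = 0;
        while (more && steps < maxSteps) {
            more = parser.parse(false); // incremental step
            steps++;
        }
        if (more) {
            parser.parse(true); // parse the remainder completely
        }
        return parser.events();
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("<p>", "text", "</p>"), 1));
    }
}
```

With the real configuration, the analogous loop would follow a call to setInputSource, and the application can do other work between incremental steps.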

pushInputSource

public void pushInputSource(XMLInputSource inputSource)
Pushes an input source onto the current entity stack. This enables the scanner to transparently scan new content (e.g. the output written by an embedded script). At the end of the pushed entity, the scanner returns to where it left off at the time the input source was pushed.

Hint: To use this feature to insert the output of <SCRIPT> tags, remember to buffer the entire output of the processed instructions before pushing a new input source. Otherwise, events may appear out of sequence.

Parameters: inputSource The new input source to start scanning.
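
The entity-stack behavior described above can be modeled with a stack of readers. EntityStackSketch, scan, and the single-character marker standing in for a script element are all hypothetical choices for this illustration, not the scanner's actual implementation. Note that the pushed content is a fully buffered string, in line with the hint above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of an entity stack.
// This is NOT the HTMLScanner implementation.
class EntityStackSketch {
    private final Deque<Reader> stack = new ArrayDeque<>();

    EntityStackSketch(Reader document) {
        stack.push(document);
    }

    /** Scanning continues from the pushed source until it is exhausted. */
    void pushInputSource(Reader source) {
        stack.push(source);
    }

    /** Returns the next character, or -1 once every source is exhausted. */
    int read() throws IOException {
        while (!stack.isEmpty()) {
            int c = stack.peek().read();
            if (c != -1) {
                return c;
            }
            stack.pop().close(); // end of this entity: resume the previous one
        }
        return -1;
    }

    /**
     * Scans document, replacing each marker character with scriptOutput.
     * Assumes scriptOutput does not itself contain the marker.
     */
    static String scan(String document, char marker, String scriptOutput) {
        EntityStackSketch scanner = new EntityStackSketch(new StringReader(document));
        StringBuilder out = new StringBuilder();
        try {
            int c;
            while ((c = scanner.read()) != -1) {
                if (c == marker) {
                    // Push the complete, already buffered output in one piece.
                    scanner.pushInputSource(new StringReader(scriptOutput));
                } else {
                    out.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(scan("<b>#</b>", '#', "hello")); // prints <b>hello</b>
    }
}
```

Because the reader at the top of the stack is always consumed first, the content that follows the push point in the original document is seen only after the pushed source is exhausted.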

reset

protected void reset()
Resets the parser configuration.

setDocumentHandler

public void setDocumentHandler(XMLDocumentHandler handler)
Sets the document handler.

setDTDContentModelHandler

public void setDTDContentModelHandler(XMLDTDContentModelHandler handler)
Sets the DTD content model handler.

setDTDHandler

public void setDTDHandler(XMLDTDHandler handler)
Sets the DTD handler.

setEntityResolver

public void setEntityResolver(XMLEntityResolver resolver)
Sets the entity resolver.

setErrorHandler

public void setErrorHandler(XMLErrorHandler handler)
Sets the error handler.

setFeature

public void setFeature(String featureId, boolean state)
Sets a feature.

setInputSource

public void setInputSource(XMLInputSource inputSource)
Sets the input source for the document to parse.

Parameters: inputSource The document's input source.

Throws: XMLConfigurationException Thrown if there is a configuration error when initializing the parser. IOException Thrown on I/O error.

See Also: HTMLConfiguration

setLocale

public void setLocale(Locale locale)
Sets the locale.

setProperty

public void setProperty(String propertyId, Object value)
Sets a property.
(C) Copyright 2002-2005, Andy Clark. All rights reserved.