org.cyberneko.html.filters

Class ElementRemover

public class ElementRemover extends DefaultFilter

This class is a document filter capable of removing specified elements from the processing stream. There are two options for processing document elements:

The first option allows the application to specify which elements appearing in the event stream should be accepted and, therefore, passed on to the next stage in the pipeline. All elements not in the list of acceptable elements have their start and end tags stripped from the event stream unless those elements appear in the list of elements to be removed.

The second option allows the application to specify which elements should be completely removed from the event stream. When an element appears that is to be removed, the element's start and end tags, as well as all of that element's content, are removed from the event stream.

A common use of this filter is to allow only rich-text and linking elements, along with the character content, to pass through the filter; all other elements are stripped. The following code shows how to configure this filter to perform this task:

  ElementRemover remover = new ElementRemover();
  remover.acceptElement("b", null);
  remover.acceptElement("i", null);
  remover.acceptElement("u", null);
  remover.acceptElement("a", new String[] { "href" });
 

However, this would still allow the text content of other elements to pass through, which may not be desirable. In order to further "clean" the input, the removeElement option can be used. The following piece of code adds the ability to completely remove any <SCRIPT> tags and content from the stream.

  remover.removeElement("script");
 

Note: All text and accepted element children of a stripped element are retained. To completely remove an element's content, use the removeElement method.
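
The difference between a stripped element and a removed element can be illustrated with a small stand-alone model. This is only a sketch and is not NekoHTML code: the class StripRemoveSketch, its string-based event format ("<name>", "</name>", or plain text), and its two local counters are all hypothetical. It assumes that element names are matched case-insensitively (suggested by removeElement("script") applying to <SCRIPT> tags above) and that the accepted list is consulted before the removed list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical, simplified model of accept / strip / remove. Not the library's implementation. */
final class StripRemoveSketch {
    private final Set<String> accepted = new HashSet<>();
    private final Set<String> removed = new HashSet<>();

    // Assumption: names are compared case-insensitively.
    void accept(String element) { accepted.add(element.toLowerCase(Locale.ROOT)); }
    void remove(String element) { removed.add(element.toLowerCase(Locale.ROOT)); }

    /** Each event is "<name>", "</name>", or plain character content. */
    List<String> filter(List<String> events) {
        List<String> out = new ArrayList<>();
        int depth = 0;          // current element nesting depth
        int removalDepth = -1;  // depth at which removal began, or -1 when nothing is being removed
        for (String event : events) {
            if (event.startsWith("</")) {
                String name = event.substring(2, event.length() - 1).toLowerCase(Locale.ROOT);
                depth--;
                if (removalDepth < 0) {
                    // Accepted elements keep their end tag; stripped elements lose it.
                    if (accepted.contains(name)) out.add(event);
                } else if (depth == removalDepth) {
                    removalDepth = -1; // the removed element has closed
                }
            } else if (event.startsWith("<")) {
                String name = event.substring(1, event.length() - 1).toLowerCase(Locale.ROOT);
                if (removalDepth < 0) {
                    if (accepted.contains(name)) {
                        out.add(event);       // accepted: tag passes through
                    } else if (removed.contains(name)) {
                        removalDepth = depth; // removed: suppress everything until it closes
                    }
                    // otherwise stripped: tag dropped, content still flows
                }
                depth++;
            } else if (removalDepth < 0) {
                out.add(event); // character content outside any removed element
            }
        }
        return out;
    }
}
```

With "b" accepted and "script" removed, the events <html>, <b>, bold, </b>, <SCRIPT>, alert(1), </SCRIPT>, <p>, text, </p>, </html> reduce to <b>, bold, </b>, text in this model: the <html> and <p> tags are stripped but their text survives, while the script element disappears together with its content.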

Note: Care should be taken when using this filter because the output may not be a well-balanced tree. Specifically, if the application removes the <HTML> element (with or without retaining its children), the resulting document event stream will no longer be well-formed.
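
To see why stripping the root element matters, an application could check the filtered output for a single root before handing it to a consumer that expects a well-formed document. The following is a hedged sketch of such a check. It is not part of NekoHTML: RootCheckSketch and its string-based event format ("<name>", "</name>", or plain text) are hypothetical and exist only for illustration.

```java
import java.util.List;

/** Hypothetical helper: checks whether a flat event list has exactly one root element. */
final class RootCheckSketch {
    static boolean hasSingleRoot(List<String> events) {
        int depth = 0; // current element nesting depth
        int roots = 0; // number of elements opened at depth 0
        for (String event : events) {
            if (event.startsWith("</")) {
                depth--;
            } else if (event.startsWith("<")) {
                if (depth == 0) roots++;
                depth++;
            } else if (depth == 0 && !event.trim().isEmpty()) {
                return false; // character content outside any element
            }
        }
        return roots == 1 && depth == 0;
    }
}
```

In this model, the events <html>, <b>, bold, </b>, text, </html> pass the check, while the same stream with the <html> tags stripped (<b>, bold, </b>, text) fails it because "text" is left outside any element.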

Version: $Id: ElementRemover.java,v 1.5 2005/02/14 03:56:54 andyc Exp $

Author: Andy Clark

Field Summary
protected Hashtable fAcceptedElements
Accepted elements.
protected int fElementDepth
The element depth.
protected int fRemovalElementDepth
The element depth at element removal.
protected Hashtable fRemovedElements
Removed elements.
protected static Object NULL
A "null" object.
Method Summary
void acceptElement(String element, String[] attributes)
Specifies that the given element should be accepted and, optionally, which attributes of that element should be kept.
void characters(XMLString text, Augmentations augs)
Characters.
void comment(XMLString text, Augmentations augs)
Comment.
protected boolean elementAccepted(String element)
Returns true if the specified element is accepted.
protected boolean elementRemoved(String element)
Returns true if the specified element should be removed.
void emptyElement(QName element, XMLAttributes attributes, Augmentations augs)
Empty element.
void endCDATA(Augmentations augs)
End CDATA section.
void endElement(QName element, Augmentations augs)
End element.
void endGeneralEntity(String name, Augmentations augs)
End general entity.
void endPrefixMapping(String prefix, Augmentations augs)
End prefix mapping.
protected boolean handleOpenTag(QName element, XMLAttributes attributes)
Handles an open tag.
void ignorableWhitespace(XMLString text, Augmentations augs)
Ignorable whitespace.
void processingInstruction(String target, XMLString data, Augmentations augs)
Processing instruction.
void removeElement(String element)
Specifies that the given element should be completely removed.
void startCDATA(Augmentations augs)
Start CDATA section.
void startDocument(XMLLocator locator, String encoding, NamespaceContext nscontext, Augmentations augs)
Start document.
void startDocument(XMLLocator locator, String encoding, Augmentations augs)
Start document.
void startElement(QName element, XMLAttributes attributes, Augmentations augs)
Start element.
void startGeneralEntity(String name, XMLResourceIdentifier id, String encoding, Augmentations augs)
Start general entity.
void startPrefixMapping(String prefix, String uri, Augmentations augs)
Start prefix mapping.
void textDecl(String version, String encoding, Augmentations augs)
Text declaration.

Field Detail

fAcceptedElements

protected Hashtable fAcceptedElements
Accepted elements.

fElementDepth

protected int fElementDepth
The element depth.

fRemovalElementDepth

protected int fRemovalElementDepth
The element depth at element removal.

fRemovedElements

protected Hashtable fRemovedElements
Removed elements.

NULL

protected static final Object NULL
A "null" object.

Method Detail

acceptElement

public void acceptElement(String element, String[] attributes)
Specifies that the given element should be accepted and, optionally, which attributes of that element should be kept.

Parameters:
element - The element to accept.
attributes - The list of attributes to be kept, or null if no attributes should be kept for this element.
See Also: removeElement(String)
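
The effect of the attributes argument can be pictured with a small stand-alone helper. This is a sketch under stated assumptions, not the filter's actual code: AttributeKeepSketch is a hypothetical name, attributes are modelled as a plain Map rather than XMLAttributes, and attribute names are assumed to be compared case-insensitively.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of "keep only the listed attributes". Not NekoHTML code. */
final class AttributeKeepSketch {
    /**
     * Returns the subset of attributes whose names appear in allowed.
     * A null allowed list means no attributes are kept, mirroring the
     * documented meaning of passing null to acceptElement.
     */
    static Map<String, String> keep(Map<String, String> attributes, String[] allowed) {
        Map<String, String> kept = new LinkedHashMap<>();
        if (allowed == null) {
            return kept;
        }
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            for (String name : allowed) {
                // Assumption: attribute names match regardless of case.
                if (name.equalsIgnoreCase(entry.getKey())) {
                    kept.put(entry.getKey(), entry.getValue());
                    break;
                }
            }
        }
        return kept;
    }
}
```

Under this model, an "a" element carrying href and onclick attributes that is accepted with new String[] { "href" } keeps only href, and a "b" element accepted with null keeps no attributes at all.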

characters

public void characters(XMLString text, Augmentations augs)
Characters.

comment

public void comment(XMLString text, Augmentations augs)
Comment.

elementAccepted

protected boolean elementAccepted(String element)
Returns true if the specified element is accepted.

elementRemoved

protected boolean elementRemoved(String element)
Returns true if the specified element should be removed.

emptyElement

public void emptyElement(QName element, XMLAttributes attributes, Augmentations augs)
Empty element.

endCDATA

public void endCDATA(Augmentations augs)
End CDATA section.

endElement

public void endElement(QName element, Augmentations augs)
End element.

endGeneralEntity

public void endGeneralEntity(String name, Augmentations augs)
End general entity.

endPrefixMapping

public void endPrefixMapping(String prefix, Augmentations augs)
End prefix mapping.

handleOpenTag

protected boolean handleOpenTag(QName element, XMLAttributes attributes)
Handles an open tag.

ignorableWhitespace

public void ignorableWhitespace(XMLString text, Augmentations augs)
Ignorable whitespace.

processingInstruction

public void processingInstruction(String target, XMLString data, Augmentations augs)
Processing instruction.

removeElement

public void removeElement(String element)
Specifies that the given element should be completely removed. If an element on the remove list is encountered during processing, the element's start and end tags, as well as all content contained within the element, are removed from the processing stream.

Parameters:
element - The element to completely remove.

startCDATA

public void startCDATA(Augmentations augs)
Start CDATA section.

startDocument

public void startDocument(XMLLocator locator, String encoding, NamespaceContext nscontext, Augmentations augs)
Start document.

startDocument

public void startDocument(XMLLocator locator, String encoding, Augmentations augs)
Start document.

startElement

public void startElement(QName element, XMLAttributes attributes, Augmentations augs)
Start element.

startGeneralEntity

public void startGeneralEntity(String name, XMLResourceIdentifier id, String encoding, Augmentations augs)
Start general entity.

startPrefixMapping

public void startPrefixMapping(String prefix, String uri, Augmentations augs)
Start prefix mapping.

textDecl

public void textDecl(String version, String encoding, Augmentations augs)
Text declaration.
(C) Copyright 2002-2005, Andy Clark. All rights reserved.