public class HtmlCleaner extends Object
This class is the public interface to the user. Its task is to call the tokenizer on the specified source HTML, traverse the list of produced tokens, and create the internal object model. It also offers a set of methods to write the resulting XML to a string, a file, or any output stream.
Typical usage is the following:
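A minimal sketch of such usage, assuming the HtmlCleaner library is on the classpath and that `clean(String)` declares no checked exception. The class name `UsageSketch`, the helper `cleanBody`, the sample markup, and the choice of `setOmitComments` are illustrative assumptions, not the library's own documented example:

```java
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

public class UsageSketch {

    // Hypothetical helper: cleans an HTML string and returns the inner HTML of its <body>.
    public static String cleanBody(String html) {
        // Default tag info provider and default properties.
        HtmlCleaner cleaner = new HtmlCleaner();

        // Adjust the cleaner's behaviour through its properties (illustrative setting).
        CleanerProperties props = cleaner.getProperties();
        props.setOmitComments(true);

        // The result is the root node of the cleaned, tree-like object model.
        TagNode root = cleaner.clean(html);

        // Look up the <body> element anywhere below the root.
        TagNode body = root.findElementByName("body", true);
        return cleaner.getInnerHtml(body);
    }

    public static void main(String[] args) {
        // Unbalanced input; the cleaner is expected to close the open tags.
        System.out.println(cleanBody("<p>Hello <b>world<!-- note -->"));
    }
}
```

The same `TagNode` could instead be obtained from a `File`, `URL`, `InputStream`, or `Reader` through the other `clean` overloads listed below; those declare `IOException`.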
Modifier and Type | Field and Description |
---|---|
static String | DEFAULT_CHARSET |
Constructor and Description |
---|
HtmlCleaner(): Creates a cleaner instance with the default tag info provider and default properties. |
HtmlCleaner(CleanerProperties properties): Creates an instance with the default tag info provider and the specified properties. |
HtmlCleaner(ITagInfoProvider tagInfoProvider): Creates an instance with the specified tag info provider and default properties. |
HtmlCleaner(ITagInfoProvider tagInfoProvider, CleanerProperties properties): Creates an instance with the specified tag info provider and the specified properties. |
Modifier and Type | Method and Description |
---|---|
TagNode | clean(File file) |
TagNode | clean(File file, String charset) |
TagNode | clean(InputStream in) |
TagNode | clean(InputStream in, String charset) |
TagNode | clean(Reader reader) |
TagNode | clean(Reader reader, org.htmlcleaner.HtmlCleaner.CleanTimeValues cleanTimeValues): Basic version of the cleaning call. |
TagNode | clean(String htmlContent) |
TagNode | clean(URL url): Creates an instance from the content downloaded from the specified URL. |
TagNode | clean(URL url, String charset) |
String | getInnerHtml(TagNode node): For the specified node, returns its content as a string. |
CleanerProperties | getProperties() |
ITagInfoProvider | getTagInfoProvider() |
CleanerTransformations | getTransformations() |
void | setInnerHtml(TagNode node, String content): For the specified tag node, defines its HTML content. |
void | setTransformations(CleanerTransformations transformations): Sets transformations for this cleaner instance. |
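Several `clean` overloads accept a `charset` name alongside a `File`, `URL`, or `InputStream`. As a rough illustration of what an explicit charset changes, the sketch below uses only the JDK to decode a byte stream into a `Reader` of the kind that `clean(Reader reader)` accepts. The class `CharsetReaderSketch` and its helpers are hypothetical, and nothing here is claimed to mirror how HtmlCleaner decodes its input internally:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetReaderSketch {

    // Wraps a byte stream in a Reader that decodes with the named charset.
    // Charset.forName throws an unchecked exception for unknown or illegal names.
    public static Reader openReader(InputStream in, String charsetName) {
        return new InputStreamReader(in, Charset.forName(charsetName));
    }

    // Drains a Reader into a String; this stands in for the parsing step in the sketch.
    public static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The same bytes decode differently depending on the charset that is named.
        byte[] latin1 = "<p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        try (Reader reader = openReader(new ByteArrayInputStream(latin1), "ISO-8859-1")) {
            // With HtmlCleaner on the classpath, this reader would be handed to
            // clean(Reader) rather than drained here.
            System.out.println(readAll(reader));
        }
    }
}
```

Naming the wrong charset does not fail; it silently yields replacement characters, which is why passing the charset explicitly can matter when the source encoding is known.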
public static final String DEFAULT_CHARSET

public HtmlCleaner()

public HtmlCleaner(ITagInfoProvider tagInfoProvider)
Parameters:
tagInfoProvider - Provider for tag filtering and balancing

public HtmlCleaner(CleanerProperties properties)
Parameters:
properties - Properties used during parsing and serializing

public HtmlCleaner(ITagInfoProvider tagInfoProvider, CleanerProperties properties)
Parameters:
tagInfoProvider - Provider for tag filtering and balancing
properties - Properties used during parsing and serializing

public TagNode clean(File file, String charset) throws IOException
Throws:
IOException
public TagNode clean(File file) throws IOException
Throws:
IOException

public TagNode clean(URL url, String charset) throws IOException
Throws:
IOException

public TagNode clean(URL url) throws IOException
Parameters:
url -
Throws:
IOException

public TagNode clean(InputStream in, String charset) throws IOException
Throws:
IOException

public TagNode clean(InputStream in) throws IOException
Throws:
IOException

public TagNode clean(Reader reader) throws IOException
Throws:
IOException

public TagNode clean(Reader reader, org.htmlcleaner.HtmlCleaner.CleanTimeValues cleanTimeValues) throws IOException
Parameters:
reader -
Throws:
IOException
public CleanerProperties getProperties()

public ITagInfoProvider getTagInfoProvider()

public CleanerTransformations getTransformations()

public void setTransformations(CleanerTransformations transformations)
Parameters:
transformations -

public String getInnerHtml(TagNode node)
Parameters:
node -

Copyright © 2006–2013. All rights reserved.