org.htmlparser.util
public class Translate extends Object
Typical usage:
String s = Translate.decode (getTextFromHtmlPage ());
or
String s = "<HTML>" + Translate.encode (getArbitraryText ()) + "</HTML>";
Field Summary | |
---|---|
protected static int | BREAKPOINT: The dividing point between a simple table lookup and a binary search. |
static boolean | DECODE_LINE_BY_LINE: If this member is set to true, decoding of streams is done line by line in order to reduce the maximum memory required. |
static boolean | ENCODE_HEXADECIMAL: If this member is set to true, encoding of numeric character references uses hexadecimal digits, e.g. `&#x25CB;`, instead of decimal digits. |
protected static CharacterReference[] | mCharacterList: List of references sorted by character. |
protected static CharacterReference[] | mCharacterReferences: Table mapping entity reference kernel to character. |
Method Summary | |
---|---|
static String | decode(String string): Decode a string containing references. |
static String | decode(StringBuffer buffer): Decode the characters in a string buffer containing references. |
static void | decode(InputStream in, PrintStream out): Decode a stream containing references. |
static String | encode(int character): Convert a character to a numeric character reference. |
static String | encode(String string): Encode a string to use references. |
static void | encode(InputStream in, PrintStream out): Encode a stream to use references. |
protected static int | lookup(CharacterReference[] array, char ref, int lo, int hi): Binary search for a reference. |
static CharacterReference | lookup(char character): Look up a reference by character. |
protected static CharacterReference | lookup(CharacterReference key): Look up a reference by kernel. |
static CharacterReference | lookup(String kernel, int start, int end): Look up a reference by kernel. |
static void | main(String[] args): Numeric character reference and character entity reference to Unicode codec. |
DECODE_LINE_BY_LINE
If this member is set to true, decoding of streams is done line by line in order to reduce the maximum memory required.
ENCODE_HEXADECIMAL
If this member is set to true, encoding of numeric character references uses hexadecimal digits, e.g. `&#x25CB;`, instead of decimal digits.
mCharacterList
List of references sorted by character. The first part of the list, up to BREAKPOINT, is stored in a direct translation table: indexing into the table with a character yields the reference. The second part is dense and sorted by character, suitable for binary lookup.
decode(String string)
Parameters: string - The string to translate.
decode(StringBuffer buffer)
Parameters: buffer - The StringBuffer containing references.
Returns: The decoded string.
decode(InputStream in, PrintStream out)
If DECODE_LINE_BY_LINE is true, the input stream is broken up into lines, terminated by either carriage return or newline, in order to reduce the latency and maximum buffering memory size required.
Parameters:
in - The stream to translate. It is assumed that the input stream is encoded with ISO-8859-1 since the table of character entity references in this class applies only to ISO-8859-1.
out - The stream to write the decoded stream to.
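The line-by-line behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the class `LineDecodeSketch` and its methods are hypothetical, only numeric character references are handled (named entities are left out for brevity), and each output line is terminated with `\n` regardless of the input terminator.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and behaviour are assumptions, not the Translate source.
final class LineDecodeSketch {
    // Group 1: hexadecimal digits of &#xHHHH;  Group 2: decimal digits of &#NNNN;
    private static final Pattern NUMERIC =
        Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    /** Replaces numeric character references in one line with their characters. */
    static String decodeLine(String line) {
        Matcher m = NUMERIC.matcher(line);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            int cp = m.group(1) != null
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2));
            sb.append(line, last, m.start());
            if (Character.isValidCodePoint(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(m.group()); // leave out-of-range references untouched
            }
            last = m.end();
        }
        return sb.append(line, last, line.length()).toString();
    }

    /** Reads ISO-8859-1 input one line at a time so only one line is buffered. */
    static void decode(InputStream in, PrintStream out) {
        try {
            BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1));
            String line;
            // readLine() treats \r, \n and \r\n as line terminators.
            while ((line = reader.readLine()) != null) {
                out.print(decodeLine(line));
                out.print('\n');
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper for trying the sketch on an in-memory string. */
    static String decodeAll(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        decode(new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)),
               new PrintStream(bytes, true));
        return bytes.toString(); // platform default charset on both sides
    }
}
```

Processing one line at a time keeps memory use proportional to the longest line rather than to the whole stream, at the cost of not decoding a reference that is split across a line break.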
encode(int character)
Parameters: character - The character to convert.
Returns: The converted character, as a numeric character reference.
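As an illustration of what a numeric character reference looks like in decimal and hexadecimal form, here is a minimal sketch. The class `NumericReferenceSketch` and its `hexadecimal` parameter are hypothetical stand-ins for the ENCODE_HEXADECIMAL switch; the upper-case hexadecimal digits are an assumption chosen to match the `&#x25CB;` example, not a statement about the library's output.

```java
// Hypothetical sketch; not the Translate implementation.
final class NumericReferenceSketch {
    /** Formats a character (code point) as a numeric character reference. */
    static String encode(int character, boolean hexadecimal) {
        return hexadecimal
            ? String.format("&#x%X;", character) // hexadecimal digits
            : "&#" + character + ";";            // decimal digits
    }

    public static void main(String[] args) {
        System.out.println(encode(0x25CB, false)); // decimal form of the white circle
        System.out.println(encode(0x25CB, true));  // hexadecimal form of the same character
    }
}
```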
encode(String string)
Parameters: string - The string to translate.
Returns: The encoded string.
encode(InputStream in, PrintStream out)
Parameters:
in - The stream to translate. It is assumed that the input stream is encoded with ISO-8859-1 since the table of character entity references in this class applies only to ISO-8859-1.
out - The stream to write the encoded stream to.
lookup(CharacterReference[] array, char ref, int lo, int hi)
Parameters:
array - The array of CharacterReference objects.
ref - The character to search for.
lo - The lower index within which to look.
hi - The upper index within which to look.
Returns: The index at which the reference was found or is to be inserted.
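A binary search that returns either the matching index or the insertion point can be sketched as below. This is an assumption-laden illustration: it searches a plain sorted `char[]` instead of a `CharacterReference[]`, treats `hi` as inclusive, and the class name `InsertionSearchSketch` is hypothetical.

```java
// Hypothetical sketch; the real method operates on CharacterReference objects.
final class InsertionSearchSketch {
    /**
     * Searches sorted[lo..hi] (hi inclusive, an assumption) for ref.
     * Returns the index of ref if present, otherwise the index at which
     * ref would have to be inserted to keep the array sorted.
     */
    static int lookup(char[] sorted, char ref, int lo, int hi) {
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            if (sorted[mid] < ref) {
                lo = mid + 1;
            } else if (sorted[mid] > ref) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return lo; // not found: lo is the insertion point
    }
}
```

Because a miss and a hit both yield a valid index, a caller has to compare the element at the returned index with the character it searched for before treating the result as a match.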
lookup(char character)
Parameters: character - The character to be looked up.
Returns: The entity reference for that character or null.
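The split between direct table lookup and binary search that BREAKPOINT describes might look roughly like this. It is a sketch under stated assumptions: the value `0x100` for the break point, the handful of entities, the parallel arrays, and the class name `HybridLookupSketch` are all hypothetical, and the sketch returns the kernel string rather than a `CharacterReference`.

```java
import java.util.Arrays;

// Hypothetical sketch of a two-tier character lookup; not the library's tables.
final class HybridLookupSketch {
    static final int BREAKPOINT = 0x100; // assumed value, for illustration only

    // Tier 1: indexed directly by character; null where no entity exists.
    static final String[] DIRECT = new String[BREAKPOINT];

    // Tier 2: packed arrays sorted by character, searched with binary search.
    static final char[] PACKED_CHARS = {'\u0152', '\u2022', '\u20AC'};
    static final String[] PACKED_KERNELS = {"OElig", "bull", "euro"};

    static {
        DIRECT['"'] = "quot";
        DIRECT['&'] = "amp";
        DIRECT['<'] = "lt";
        DIRECT['>'] = "gt";
        DIRECT['\u00A0'] = "nbsp";
        DIRECT['\u00E9'] = "eacute";
    }

    /** Returns the entity kernel for a character, or null if there is none. */
    static String lookup(char character) {
        if (character < BREAKPOINT) {
            return DIRECT[character]; // constant-time table index
        }
        int i = Arrays.binarySearch(PACKED_CHARS, character);
        return i >= 0 ? PACKED_KERNELS[i] : null;
    }
}
```

The direct table trades some empty slots for constant-time lookups on the most common characters, while the packed tier keeps the rarely used, widely scattered characters compact.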
lookup(CharacterReference key)
Parameters: key - A character reference with the kernel set to the string to be found. It need not be truncated at the exact end of the reference.
lookup(String kernel, int start, int end)
... lookup(CharacterReference) instead.
Parameters:
kernel - The string to look up, e.g. "amp".
start - The starting point in the string of the kernel.
end - The ending point in the string of the kernel. This should be the index of the semicolon if it exists, or failing that, at least an index past the last character of the kernel.
Returns: The reference that matches the given string, or null if it wasn't found.
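To make the roles of `start` and `end` concrete, here is a minimal sketch of a kernel lookup over a substring. It is illustrative only: the class `KernelLookupSketch`, the small map of kernels, and the longest-match-first strategy for an `end` that overshoots the kernel are assumptions, not the library's algorithm.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; returns a Character instead of a CharacterReference.
final class KernelLookupSketch {
    private static final Map<String, Character> KERNELS = new HashMap<>();
    static {
        KERNELS.put("amp", '&');
        KERNELS.put("lt", '<');
        KERNELS.put("gt", '>');
        KERNELS.put("quot", '"');
    }

    /**
     * Looks up the kernel found in text[start, end). If end lies past the
     * kernel (no semicolon was present), shorter candidates are tried in turn.
     */
    static Character lookup(String text, int start, int end) {
        if (start < 0 || end > text.length()) {
            return null;
        }
        for (int stop = end; stop > start; stop--) {
            Character match = KERNELS.get(text.substring(start, stop));
            if (match != null) {
                return match;
            }
        }
        return null;
    }
}
```

For the input `&amp;`, the kernel starts at index 1 (just after the ampersand) and ends at index 4 (the semicolon), so `lookup("&amp;", 1, 4)` finds `amp` on the first attempt.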
main(String[] args)
Converts System.in input into an encoded or decoded stream and sends the results to System.out.
Parameters: args - If args[0] is -encode, perform an encoding on System.in; otherwise perform a decoding.
HTML Parser is an open source library released under LGPL.