org.apache.xerces.impl

Class XMLScanner

Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponent
Known Direct Subclasses:
XMLDocumentFragmentScannerImpl, XMLDTDScannerImpl

public abstract class XMLScanner
extends java.lang.Object
implements org.apache.xerces.xni.parser.XMLComponent

This class is responsible for holding scanning methods common to scanning the XML document structure and content as well as the DTD structure and content. Both XMLDocumentScanner and XMLDTDScanner inherit from this base class.

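This component requires the following features and properties from the component manager that uses it: the validation, namespaces, and notify-character-references features, and the symbol table, error reporter, and entity manager properties (see the corresponding identifier constants in the field summary).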

Version:
$Id: XMLScanner.java,v 1.51 2004/10/04 21:45:48 mrglavas Exp $
Authors:
Andy Clark, IBM
Arnaud Le Hors, IBM
Eric Ye, IBM

Field Summary

protected static boolean
DEBUG_ATTR_NORMALIZATION
Debug attribute normalization.
protected static String
ENTITY_MANAGER
Property identifier: entity manager.
protected static String
ERROR_REPORTER
Property identifier: error reporter.
protected static String
NAMESPACES
Feature identifier: namespaces.
protected static String
NOTIFY_CHAR_REFS
Feature identifier: notify character references.
protected static String
PARSER_SETTINGS
protected static String
SYMBOL_TABLE
Property identifier: symbol table.
protected static String
VALIDATION
Feature identifier: validation.
protected static String
fAmpSymbol
Symbol: "amp".
protected static String
fAposSymbol
Symbol: "apos".
protected String
fCharRefLiteral
Literal value of the last character reference scanned.
protected static String
fEncodingSymbol
Symbol: "encoding".
protected int
fEntityDepth
Entity depth.
protected XMLEntityManager
fEntityManager
Entity manager.
protected XMLEntityScanner
fEntityScanner
Entity scanner.
protected XMLErrorReporter
fErrorReporter
Error reporter.
protected static String
fGtSymbol
Symbol: "gt".
protected static String
fLtSymbol
Symbol: "lt".
protected boolean
fNamespaces
Namespaces.
protected boolean
fNotifyCharRefs
Character references notification.
protected boolean
fParserSettings
Internal parser-settings feature.
protected static String
fQuotSymbol
Symbol: "quot".
protected boolean
fReportEntity
Report entity boundary.
protected XMLResourceIdentifierImpl
fResourceIdentifier
protected boolean
fScanningAttribute
Scanning attribute.
protected static String
fStandaloneSymbol
Symbol: "standalone".
protected SymbolTable
fSymbolTable
Symbol table.
protected boolean
fValidation
Validation.
protected static String
fVersionSymbol
Symbol: "version".

Method Summary

void
endEntity(String name, org.apache.xerces.xni.Augmentations augs)
This method notifies of the end of an entity.
boolean
getFeature(String featureId)
protected String
getVersionNotSupportedKey()
protected boolean
isInvalid(int value)
protected boolean
isInvalidLiteral(int value)
protected int
isUnchangedByNormalization(org.apache.xerces.xni.XMLString value)
Checks whether this string would be unchanged by normalization.
protected boolean
isValidNCName(int value)
protected boolean
isValidNameChar(int value)
protected boolean
isValidNameStartChar(int value)
protected boolean
isValidNameStartHighSurrogate(int value)
protected void
normalizeWhitespace(org.apache.xerces.xni.XMLString value)
Normalize whitespace in an XMLString converting all whitespace characters to space characters.
protected void
normalizeWhitespace(org.apache.xerces.xni.XMLString value, int fromIndex)
Normalize whitespace in an XMLString converting all whitespace characters to space characters.
protected void
reportFatalError(String msgId, Object[] args)
Convenience function used in all XML scanners.
protected void
reset()
void
reset(org.apache.xerces.xni.parser.XMLComponentManager componentManager)
protected boolean
scanAttributeValue(org.apache.xerces.xni.XMLString value, org.apache.xerces.xni.XMLString nonNormalizedValue, String atName, boolean checkEntities, String eleName)
Scans an attribute value and normalizes whitespace converting all whitespace characters to space characters.
protected int
scanCharReferenceValue(XMLStringBuffer buf, XMLStringBuffer buf2)
Scans a character reference and appends the corresponding chars to the specified buffer.
protected void
scanComment(XMLStringBuffer text)
Scans a comment.
protected void
scanExternalID(String[] identifiers, boolean optionalSystemId)
Scans an external ID and returns the public and system IDs.
protected void
scanPI()
Scans a processing instruction.
protected void
scanPIData(String target, org.apache.xerces.xni.XMLString data)
Scans processing instruction data.
String
scanPseudoAttribute(boolean scanningTextDecl, org.apache.xerces.xni.XMLString value)
Scans a pseudo attribute.
protected boolean
scanPubidLiteral(org.apache.xerces.xni.XMLString literal)
Scans public ID literal.
protected boolean
scanSurrogates(XMLStringBuffer buf)
Scans surrogates and appends them to the specified buffer.
protected void
scanXMLDeclOrTextDecl(boolean scanningTextDecl, String[] pseudoAttributeValues)
Scans an XML or text declaration.
void
setFeature(String featureId, boolean value)
void
setProperty(String propertyId, Object value)
Sets the value of a property during parsing.
void
startEntity(String name, org.apache.xerces.xni.XMLResourceIdentifier identifier, String encoding, org.apache.xerces.xni.Augmentations augs)
This method notifies of the start of an entity.
protected boolean
versionSupported(String version)

Field Details

DEBUG_ATTR_NORMALIZATION

protected static final boolean DEBUG_ATTR_NORMALIZATION
Debug attribute normalization.
Field Value:
false

ENTITY_MANAGER

protected static final String ENTITY_MANAGER
Property identifier: entity manager.

ERROR_REPORTER

protected static final String ERROR_REPORTER
Property identifier: error reporter.

NAMESPACES

protected static final String NAMESPACES
Feature identifier: namespaces.

NOTIFY_CHAR_REFS

protected static final String NOTIFY_CHAR_REFS
Feature identifier: notify character references.

PARSER_SETTINGS

protected static final String PARSER_SETTINGS

SYMBOL_TABLE

protected static final String SYMBOL_TABLE
Property identifier: symbol table.

VALIDATION

protected static final String VALIDATION
Feature identifier: validation.

fAmpSymbol

protected static final String fAmpSymbol
Symbol: "amp".

fAposSymbol

protected static final String fAposSymbol
Symbol: "apos".

fCharRefLiteral

protected String fCharRefLiteral
Literal value of the last character reference scanned.

fEncodingSymbol

protected static final String fEncodingSymbol
Symbol: "encoding".

fEntityDepth

protected int fEntityDepth
Entity depth.

fEntityManager

protected XMLEntityManager fEntityManager
Entity manager.

fEntityScanner

protected XMLEntityScanner fEntityScanner
Entity scanner.

fErrorReporter

protected XMLErrorReporter fErrorReporter
Error reporter.

fGtSymbol

protected static final String fGtSymbol
Symbol: "gt".

fLtSymbol

protected static final String fLtSymbol
Symbol: "lt".

fNamespaces

protected boolean fNamespaces
Namespaces.

fNotifyCharRefs

protected boolean fNotifyCharRefs
Character references notification.

fParserSettings

protected boolean fParserSettings
Internal parser-settings feature.

fQuotSymbol

protected static final String fQuotSymbol
Symbol: "quot".

fReportEntity

protected boolean fReportEntity
Report entity boundary.

fResourceIdentifier

protected XMLResourceIdentifierImpl fResourceIdentifier

fScanningAttribute

protected boolean fScanningAttribute
Scanning attribute.

fStandaloneSymbol

protected static final String fStandaloneSymbol
Symbol: "standalone".

fSymbolTable

protected SymbolTable fSymbolTable
Symbol table.

fValidation

protected boolean fValidation
Validation. This feature identifier is: http://xml.org/sax/features/validation

fVersionSymbol

protected static final String fVersionSymbol
Symbol: "version".

Method Details

endEntity

public void endEntity(String name,
                      org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
This method notifies of the end of an entity. The document entity has the pseudo-name "[xml]"; the DTD has the pseudo-name "[dtd]"; parameter entity names start with '%'; and general entities are specified by their name alone.
Parameters:
name - The name of the entity.
augs - Additional information that may include infoset augmentations
Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
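
The naming convention described above can be illustrated with a small standalone sketch. The `EntityNames` class and its `Kind` enum are hypothetical names introduced only for this example; they are not part of Xerces, and a real handler would receive these names through `startEntity` and `endEntity`.

```java
/** Hypothetical helper illustrating the entity naming convention; not part of Xerces. */
final class EntityNames {

    enum Kind { DOCUMENT, DTD, PARAMETER, GENERAL }

    /** Classifies an entity name of the form passed to startEntity/endEntity. */
    static Kind classify(String name) {
        if ("[xml]".equals(name)) {
            return Kind.DOCUMENT;   // pseudo-name of the document entity
        }
        if ("[dtd]".equals(name)) {
            return Kind.DTD;        // pseudo-name of the DTD
        }
        if (name.startsWith("%")) {
            return Kind.PARAMETER;  // parameter entity names start with '%'
        }
        return Kind.GENERAL;        // general entities use their plain name
    }

    public static void main(String[] args) {
        for (String n : new String[] {"[xml]", "[dtd]", "%param", "amp"}) {
            System.out.println(n + " -> " + classify(n));
        }
    }
}
```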

getFeature

public boolean getFeature(String featureId)
            throws org.apache.xerces.xni.parser.XMLConfigurationException

getVersionNotSupportedKey

protected String getVersionNotSupportedKey()

isInvalid

protected boolean isInvalid(int value)

isInvalidLiteral

protected boolean isInvalidLiteral(int value)

isUnchangedByNormalization

protected int isUnchangedByNormalization(org.apache.xerces.xni.XMLString value)
Checks whether this string would be unchanged by normalization.
Returns:
-1 if the value would be unchanged by normalization, otherwise the index of the first whitespace character which would be transformed.
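
The return contract can be sketched without any Xerces types. The example below is a minimal sketch, assuming that tab, line feed, and carriage return are the characters that normalization transforms, that a plain space is left alone, and that the returned index is relative to the start of the string. `NormalizationCheck` and `firstChangedIndex` are hypothetical names; the `ch`, `offset`, and `length` parameters stand in for the contents of an XMLString.

```java
/** Hypothetical standalone sketch; not the Xerces implementation. */
final class NormalizationCheck {

    /**
     * Returns -1 if no character in the range would be changed by whitespace
     * normalization, otherwise the index (relative to offset) of the first
     * tab, line feed, or carriage return.
     */
    static int firstChangedIndex(char[] ch, int offset, int length) {
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            char c = ch[i];
            if (c == '\t' || c == '\n' || c == '\r') {
                return i - offset;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        char[] text = "a b\tc".toCharArray();
        System.out.println(firstChangedIndex(text, 0, text.length));
    }
}
```

A caller can use the returned index to skip the untouched prefix and start normalizing only from the first character that needs to change.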

isValidNCName

protected boolean isValidNCName(int value)

isValidNameChar

protected boolean isValidNameChar(int value)

isValidNameStartChar

protected boolean isValidNameStartChar(int value)

isValidNameStartHighSurrogate

protected boolean isValidNameStartHighSurrogate(int value)

normalizeWhitespace

protected void normalizeWhitespace(org.apache.xerces.xni.XMLString value)
Normalize whitespace in an XMLString converting all whitespace characters to space characters.

normalizeWhitespace

protected void normalizeWhitespace(org.apache.xerces.xni.XMLString value,
                                   int fromIndex)
Normalize whitespace in an XMLString converting all whitespace characters to space characters.
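
Both overloads can be pictured as an in-place pass over a character range. The following is a minimal sketch, assuming that `fromIndex` is relative to the start of the string and that tab, line feed, and carriage return are the characters replaced. `WhitespaceNormalizer` is a hypothetical name, and the `ch`, `offset`, and `length` parameters stand in for the contents of an XMLString.

```java
/** Hypothetical standalone sketch; not the Xerces implementation. */
final class WhitespaceNormalizer {

    /** Replaces tab, LF, and CR with a space, starting at fromIndex (relative to offset). */
    static void normalize(char[] ch, int offset, int length, int fromIndex) {
        int end = offset + length;
        for (int i = offset + fromIndex; i < end; i++) {
            char c = ch[i];
            if (c == '\t' || c == '\n' || c == '\r') {
                ch[i] = ' ';
            }
        }
    }

    /** Normalizes the whole range. */
    static void normalize(char[] ch, int offset, int length) {
        normalize(ch, offset, length, 0);
    }

    public static void main(String[] args) {
        char[] text = "a\tb\nc".toCharArray();
        normalize(text, 0, text.length);
        System.out.println(new String(text));
    }
}
```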

reportFatalError

protected void reportFatalError(String msgId,
                                Object[] args)
            throws org.apache.xerces.xni.XNIException
Convenience function used in all XML scanners.

reset

protected void reset()

reset

public void reset(org.apache.xerces.xni.parser.XMLComponentManager componentManager)
            throws org.apache.xerces.xni.parser.XMLConfigurationException
Specified by:
reset in interface org.apache.xerces.xni.parser.XMLComponent
Parameters:
componentManager - The component manager.

scanAttributeValue

protected boolean scanAttributeValue(org.apache.xerces.xni.XMLString value,
                                     org.apache.xerces.xni.XMLString nonNormalizedValue,
                                     String atName,
                                     boolean checkEntities,
                                     String eleName)
            throws IOException,
                   org.apache.xerces.xni.XNIException
Scans an attribute value and normalizes whitespace converting all whitespace characters to space characters.

 [10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
Parameters:
value - The XMLString to fill in with the value.
nonNormalizedValue - The XMLString to fill in with the non-normalized value.
atName - The name of the attribute being parsed (for error msgs).
checkEntities - true if undeclared entities should be reported as VC violation, false if undeclared entities should be reported as WFC violation.
eleName - The name of element to which this attribute belongs.
Returns:
true if the non-normalized and normalized value are the same
Note: This method uses fStringBuffer2, anything in it at the time of calling is lost.

scanCharReferenceValue

protected int scanCharReferenceValue(XMLStringBuffer buf,
                                     XMLStringBuffer buf2)
            throws IOException,
                   org.apache.xerces.xni.XNIException
Scans a character reference and appends the corresponding chars to the specified buffer.

 [66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
 
Note: This method uses fStringBuffer, anything in it at the time of calling is lost.
Parameters:
buf - the character buffer to append chars to
buf2 - the character buffer to append non-normalized chars to
Returns:
the character value or (-1) on conversion failure
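
The CharRef production and the -1 failure value can be illustrated on a complete reference held in a string. This is a minimal sketch under stated assumptions: `CharRefParser` is a hypothetical class, it works on a `String` rather than an entity scanner, and it only checks that the value is a valid Unicode code point, not that it is a legal XML character.

```java
/** Hypothetical standalone sketch; not the Xerces implementation. */
final class CharRefParser {

    /** Parses "&#65;" or "&#x41;" and returns the code point, or -1 on failure. */
    static int parse(String ref) {
        if (ref == null || !ref.startsWith("&#") || !ref.endsWith(";")) {
            return -1;
        }
        boolean hex = ref.startsWith("&#x");
        String digits = ref.substring(hex ? 3 : 2, ref.length() - 1);
        if (digits.isEmpty()) {
            return -1;
        }
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            boolean decimalDigit = c >= '0' && c <= '9';
            boolean hexLetter = (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!decimalDigit && !(hex && hexLetter)) {
                return -1;  // character outside [0-9] or [0-9a-fA-F]
            }
        }
        try {
            int value = Integer.parseInt(digits, hex ? 16 : 10);
            return Character.isValidCodePoint(value) ? value : -1;
        } catch (NumberFormatException e) {
            return -1;      // too many digits to fit in an int
        }
    }

    public static void main(String[] args) {
        StringBuilder buf = new StringBuilder();
        int cp = parse("&#x41;");
        if (cp != -1) {
            buf.appendCodePoint(cp);  // append the corresponding chars
        }
        System.out.println(buf);
    }
}
```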

scanComment

protected void scanComment(XMLStringBuffer text)
            throws IOException,
                   org.apache.xerces.xni.XNIException
Scans a comment.

 [15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
 

Note: Called after scanning past '<!--'

Note: This method uses fString, anything in it at the time of calling is lost.

Parameters:
text - The buffer to fill in with the text.
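
Production [15] implies that the first "--" after the opening delimiter must be immediately followed by '>'. The sketch below shows that rule on an in-memory string; `CommentScanner` is a hypothetical class, the input is assumed to start just after '<!--', and errors are reported with `IllegalArgumentException` rather than through an error reporter.

```java
/** Hypothetical standalone sketch; not the Xerces implementation. */
final class CommentScanner {

    /** Returns the comment text, given input that starts right after "<!--". */
    static String scan(String input) {
        int dashes = input.indexOf("--");
        if (dashes < 0) {
            throw new IllegalArgumentException("unterminated comment");
        }
        // Per production [15], "--" may only appear as part of the closing "-->".
        if (dashes + 2 >= input.length() || input.charAt(dashes + 2) != '>') {
            throw new IllegalArgumentException("\"--\" is not allowed inside a comment");
        }
        return input.substring(0, dashes);
    }

    public static void main(String[] args) {
        System.out.println("[" + scan(" a - b --><root/>") + "]");
    }
}
```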

scanExternalID

protected void scanExternalID(String[] identifiers,
                              boolean optionalSystemId)
            throws IOException,
                   org.apache.xerces.xni.XNIException
Scans an external ID and returns the public and system IDs.
Note: This method uses fString and fStringBuffer, anything in them at the time of calling is lost.
Parameters:
identifiers - An array of size 2 to return the system id and public id (in that order).
optionalSystemId - Specifies whether the system id is optional.
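
The shape of the returned array can be illustrated with a regular-expression sketch. It assumes the XML 1.0 forms `SYSTEM SystemLiteral` and `PUBLIC PubidLiteral SystemLiteral`, with the system literal omitted only when it is optional. `ExternalIdSketch` is a hypothetical class, it does not validate the characters of the literals, and it returns a new array instead of filling a caller-supplied one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical standalone sketch; not the Xerces implementation. */
final class ExternalIdSketch {

    private static final String LITERAL = "(\"[^\"]*\"|'[^']*')";
    private static final Pattern EXTERNAL_ID = Pattern.compile(
        "\\s*(?:SYSTEM\\s+" + LITERAL + "|PUBLIC\\s+" + LITERAL
            + "(?:\\s+" + LITERAL + ")?)\\s*");

    /** Returns {systemId, publicId}; an absent identifier is null. */
    static String[] parse(String text, boolean optionalSystemId) {
        Matcher m = EXTERNAL_ID.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an external ID: " + text);
        }
        String[] identifiers = new String[2];
        if (m.group(1) != null) {
            identifiers[0] = unquote(m.group(1));      // SYSTEM form
        } else {
            identifiers[1] = unquote(m.group(2));      // PUBLIC form
            if (m.group(3) != null) {
                identifiers[0] = unquote(m.group(3));
            } else if (!optionalSystemId) {
                throw new IllegalArgumentException("system id is required");
            }
        }
        return identifiers;
    }

    private static String unquote(String literal) {
        return literal.substring(1, literal.length() - 1);
    }

    public static void main(String[] args) {
        String[] ids = parse("PUBLIC \"-//EXAMPLE//DTD Sample//EN\" 'sample.dtd'", false);
        System.out.println("system=" + ids[0] + ", public=" + ids[1]);
    }
}
```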

scanPI

protected void scanPI()
            throws IOException,
                   org.apache.xerces.xni.XNIException
Scans a processing instruction.

 [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
 [17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
 
Note: This method uses fString, anything in it at the time of calling is lost.
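
Production [17] excludes any case variant of "xml" as a target, while names that merely begin with "xml" (such as "xmlfoo") remain legal. The sketch below splits the text between '<?' and '?>' into target and data and applies that exclusion. `PISketch` is a hypothetical class, and it does not check that the target is a valid XML Name.

```java
/** Hypothetical standalone sketch; not the Xerces implementation. */
final class PISketch {

    private static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    /** Splits the text between "<?" and "?>" into {target, data}. */
    static String[] split(String body) {
        int i = 0;
        while (i < body.length() && !isXmlSpace(body.charAt(i))) {
            i++;
        }
        String target = body.substring(0, i);
        if (target.isEmpty()) {
            throw new IllegalArgumentException("missing PI target");
        }
        if ("xml".equalsIgnoreCase(target)) {
            throw new IllegalArgumentException("reserved PI target: " + target);
        }
        while (i < body.length() && isXmlSpace(body.charAt(i))) {
            i++;  // skip the whitespace separating target and data
        }
        return new String[] { target, body.substring(i) };
    }

    public static void main(String[] args) {
        String[] pi = split("xml-stylesheet href=\"a.xsl\"");
        System.out.println("target=" + pi[0] + ", data=" + pi[1]);
    }
}
```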

scanPIData

protected void scanPIData(String target,
                          org.apache.xerces.xni.XMLString data)
            throws IOException,
                   org.apache.xerces.xni.XNIException
Scans processing instruction data. This is needed to handle the situation where a document starts with a processing instruction whose target name starts with "xml" (e.g. xmlfoo).
Note: This method uses fStringBuffer, anything in it at the time of calling is lost.
Parameters:
target - The PI target
data - The string to fill in with the data

scanPseudoAttribute

public String scanPseudoAttribute(boolean scanningTextDecl,
                                  org.apache.xerces.xni.XMLString value)
            throws IOException,
                   org.apache.xerces.xni.XNIException
Scans a pseudo attribute.
Parameters:
scanningTextDecl - True if scanning this pseudo-attribute for a TextDecl; false if scanning XMLDecl. This flag is needed to report the correct type of error.
value - The string to fill in with the attribute value.
Returns:
The name of the attribute
Note: This method uses fStringBuffer2, anything in it at the time of calling is lost.

scanPubidLiteral

protected boolean scanPubidLiteral(org.apache.xerces.xni.XMLString literal)
            throws IOException,
                   org.apache.xerces.xni.XNIException
Scans public ID literal.

 [12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
 [13] PubidChar::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

The returned string is normalized according to the following rule, from http://www.w3.org/TR/REC-xml#dt-pubid: Before a match is attempted, all strings of white space in the public identifier must be normalized to single space characters (#x20), and leading and trailing white space must be removed.
Parameters:
literal - The string to fill in with the public ID literal.
Returns:
True on success.
Note: This method uses fStringBuffer, anything in it at the time of calling is lost.
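
The normalization rule quoted above, together with the PubidChar production, can be sketched as follows. `PubidNormalizer` is a hypothetical class that operates on a plain `String` with the quotes already removed; it is an illustration of the rule, not the Xerces code.

```java
/** Hypothetical standalone sketch; not the Xerces implementation. */
final class PubidNormalizer {

    /** Tests a character against production [13] PubidChar. */
    static boolean isPubidChar(char c) {
        return c == 0x20 || c == 0xD || c == 0xA
            || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || "-'()+,./:=?;!*#@$_%".indexOf(c) >= 0;
    }

    /** Collapses runs of white space to one space and trims both ends. */
    static String normalize(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        boolean pendingSpace = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == 0x20 || c == 0xD || c == 0xA) {
                // Remember the gap, but never emit leading white space.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        return out.toString();  // a trailing gap is never emitted
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  -//W3C//DTD  XHTML\n1.0//EN ") + "]");
    }
}
```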

scanSurrogates

protected boolean scanSurrogates(XMLStringBuffer buf)
            throws IOException,
                   org.apache.xerces.xni.XNIException
Scans surrogates and appends them to the specified buffer.

Note: This assumes the current char has already been identified as a high surrogate.

Parameters:
buf - The XMLStringBuffer to append the read surrogates to.
Returns:
True if it succeeded.
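
The idea of accepting a high surrogate only when a low surrogate follows can be shown with the standard `java.lang.Character` helpers. This is a minimal sketch: `SurrogatePairs` is a hypothetical class, it uses a `StringBuilder` in place of an XMLStringBuffer, and it does not check whether the combined code point is a legal XML character.

```java
/** Hypothetical standalone sketch; not the Xerces implementation. */
final class SurrogatePairs {

    /**
     * Appends the pair to buf and returns true if high/low form a valid
     * surrogate pair; otherwise leaves buf untouched and returns false.
     */
    static boolean append(char high, char low, StringBuilder buf) {
        if (!Character.isHighSurrogate(high) || !Character.isLowSurrogate(low)) {
            return false;
        }
        buf.append(high).append(low);
        return true;
    }

    public static void main(String[] args) {
        StringBuilder buf = new StringBuilder();
        boolean ok = append('\uD83D', '\uDE00', buf);
        System.out.println(ok + " U+" + Integer.toHexString(buf.codePointAt(0)).toUpperCase());
    }
}
```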

scanXMLDeclOrTextDecl

protected void scanXMLDeclOrTextDecl(boolean scanningTextDecl,
                                     String[] pseudoAttributeValues)
            throws IOException,
                   org.apache.xerces.xni.XNIException
Scans an XML or text declaration.

 [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
 [24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
 [80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' |  "'" EncName "'" )
 [81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
 [32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'")
                 | ('"' ('yes' | 'no') '"'))

 [77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
 
Parameters:
scanningTextDecl - True if a text declaration is to be scanned instead of an XML declaration.
pseudoAttributeValues - An array of size 3 to return the version, encoding and standalone pseudo attribute values (in that order).
Note: This method uses fString, anything in it at the time of calling is lost.
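
The size-3 result array can be illustrated with a regular-expression sketch over a complete declaration string. `XmlDeclSketch` is a hypothetical class; unlike a real scanner it does not enforce the order of the pseudo-attributes, does not require the version in an XML declaration or the encoding in a text declaration, and does not validate the values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical standalone sketch; not the Xerces implementation. */
final class XmlDeclSketch {

    private static final Pattern PSEUDO_ATTR = Pattern.compile(
        "\\s+(version|encoding|standalone)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    /** Returns {version, encoding, standalone}; absent values are null. */
    static String[] parse(String decl) {
        if (!decl.startsWith("<?xml") || !decl.endsWith("?>")) {
            throw new IllegalArgumentException("not an XML or text declaration");
        }
        String body = decl.substring(5, decl.length() - 2);
        String[] values = new String[3];
        Matcher m = PSEUDO_ATTR.matcher(body);
        while (m.find()) {
            // Group 3 holds a double-quoted value, group 4 a single-quoted one.
            String value = m.group(3) != null ? m.group(3) : m.group(4);
            switch (m.group(1)) {
                case "version":  values[0] = value; break;
                case "encoding": values[1] = value; break;
                default:         values[2] = value; break;  // standalone
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String[] v = parse("<?xml version=\"1.0\" encoding='UTF-8' standalone=\"yes\"?>");
        System.out.println(v[0] + ", " + v[1] + ", " + v[2]);
    }
}
```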

setFeature

public void setFeature(String featureId,
                       boolean value)
            throws org.apache.xerces.xni.parser.XMLConfigurationException
Specified by:
setFeature in interface org.apache.xerces.xni.parser.XMLComponent

setProperty

public void setProperty(String propertyId,
                        Object value)
            throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets the value of a property during parsing.
Specified by:
setProperty in interface org.apache.xerces.xni.parser.XMLComponent
Parameters:
propertyId -
value -

startEntity

public void startEntity(String name,
                        org.apache.xerces.xni.XMLResourceIdentifier identifier,
                        String encoding,
                        org.apache.xerces.xni.Augmentations augs)
            throws org.apache.xerces.xni.XNIException
This method notifies of the start of an entity. The document entity has the pseudo-name "[xml]"; the DTD has the pseudo-name "[dtd]"; parameter entity names start with '%'; and general entities are specified by their name alone.
Parameters:
name - The name of the entity.
identifier - The resource identifier.
encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
augs - Additional information that may include infoset augmentations
Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.

versionSupported

protected boolean versionSupported(String version)

Copyright © 1999-2005 Apache XML Project. All Rights Reserved.