net.htmlparser.jericho
Class StartTagType

java.lang.Object
  extended by TagType
      extended by StartTagType
Direct Known Subclasses:
StartTagTypeGenericImplementation

public abstract class StartTagType
extends TagType

Defines the syntax for a start tag type.

A start tag type is any TagType that starts with the character '<' (as with all tag types), but whose second character is not '/'.

This includes types for many tags which stand alone, without a corresponding end tag, and would not intuitively be categorised as a "start tag". For example, an HTML comment in a document is represented as a single start tag that spans the whole comment, and does not have an end tag at all.

The singleton instances of all the standard start tag types are available in this class as static fields.

Because all StartTagType instances must be singletons, the '==' operator can be used to test for a particular tag type instead of the equals(Object) method.
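
The identity guarantee can be illustrated with a minimal, self-contained sketch. The SketchTagType class and its fields are hypothetical stand-ins, not part of the library; they only mirror the pattern of a non-public constructor combined with static final instances.

```java
// Hypothetical stand-in for a singleton-style type hierarchy; not library code.
abstract class SketchTagType {
    // One shared instance per type, exposed as a static field.
    static final SketchTagType COMMENT = new SketchTagType("comment") {};
    static final SketchTagType NORMAL = new SketchTagType("normal") {};

    private final String description;

    // Non-public constructor: instances are only created by subclasses.
    protected SketchTagType(String description) {
        this.description = description;
    }

    String getDescription() {
        return description;
    }

    // Each type exists exactly once, so reference comparison is sufficient.
    static boolean isComment(SketchTagType type) {
        return type == COMMENT;
    }
}
```

With the library itself, the corresponding test would be an expression of the form startTag.getStartTagType()==StartTagType.COMMENT, where startTag is assumed to be a StartTag obtained from a parsed document.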

See Also:
EndTagType

Field Summary
static StartTagType CDATA_SECTION
          The tag type given to a CDATA section (<![CDATA[ ... ]]>).
static StartTagType COMMENT
          The tag type given to an HTML comment (<!-- ... -->).
static StartTagType DOCTYPE_DECLARATION
          The tag type given to a document type declaration (<!DOCTYPE ... >).
static StartTagType MARKUP_DECLARATION
          The tag type given to a markup declaration (<!ELEMENT ... > | <!ATTLIST ... > | <!ENTITY ... > | <!NOTATION ... >).
static StartTagType NORMAL
          The tag type given to a normal HTML or XML start tag (<name ... >).
static StartTagType SERVER_COMMON
          The tag type given to a common server tag (<% ... %>).
static StartTagType SERVER_COMMON_ESCAPED
          The tag type given to an escaped common server tag (<\% ... %>).
static StartTagType UNREGISTERED
          The tag type given to an unregistered start tag (< ... >).
static StartTagType XML_DECLARATION
          The tag type given to an XML declaration (<?xml ... ?>).
static StartTagType XML_PROCESSING_INSTRUCTION
          The tag type given to an XML processing instruction (<?PITarget ... ?>).
 
Constructor Summary
protected StartTagType(java.lang.String description, java.lang.String startDelimiter, java.lang.String closingDelimiter, EndTagType correspondingEndTagType, boolean isServerTag, boolean hasAttributes, boolean isNameAfterPrefixRequired)
          Constructs a new StartTagType object with the specified properties.
 
Method Summary
 boolean atEndOfAttributes(Source source, int pos, boolean isClosingSlashIgnored)
          Indicates whether the specified source document position is at the end of a tag's attributes.
protected  StartTag constructStartTag(Source source, int begin, int end, java.lang.String name, Attributes attributes)
          Internal method for the construction of a StartTag object of this type.
 EndTagType getCorrespondingEndTagType()
          Returns the type of end tag required to pair with a start tag of this type to form an element.
 boolean hasAttributes()
          Indicates whether a start tag of this type contains attributes.
 boolean isNameAfterPrefixRequired()
          Indicates whether a valid XML tag name is required directly after the prefix.
protected  Attributes parseAttributes(Source source, int startTagBegin, java.lang.String tagName)
          Internal method for the parsing of Attributes.
 
Methods inherited from class TagType
constructTagAt, deregister, getClosingDelimiter, getDescription, getNamePrefix, getRegisteredTagTypes, getStartDelimiter, getTagTypesIgnoringEnclosedMarkup, isServerTag, isValidPosition, register, setTagTypesIgnoringEnclosedMarkup, tagEncloses, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

UNREGISTERED

public static final StartTagType UNREGISTERED
The tag type given to an unregistered start tag (< ... >).

See the documentation of the Tag.isUnregistered() method for details.

Properties:
Property                    Value
Description                 unregistered
StartDelimiter              <
ClosingDelimiter            >
IsServerTag                 false
NamePrefix                  (empty string)
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   false
Example:
<"This is not recognised as any of the predefined tag types in this library">

See Also:
EndTagType.UNREGISTERED

NORMAL

public static final StartTagType NORMAL
The tag type given to a normal HTML or XML start tag (<name ... >).

Properties:
Property                    Value
Description                 normal
StartDelimiter              <
ClosingDelimiter            >
IsServerTag                 false
NamePrefix                  (empty string)
CorrespondingEndTagType     EndTagType.NORMAL
HasAttributes               true
IsNameAfterPrefixRequired   true
Example:
<div class="NormalDivTag">


COMMENT

public static final StartTagType COMMENT
The tag type given to an HTML comment (<!-- ... -->).

An HTML comment is an area of the source document enclosed by the delimiters <!-- on the left and --> on the right.

The HTML 4.01 specification section 3.2.4 states that the end of comment delimiter may contain white space between the "--" and ">" characters, but this library does not recognise end of comment delimiters containing white space.

In the default configuration, any non-server tag appearing within an HTML comment is ignored by the parser. See the documentation of the tag parsing process for more information.

Properties:
Property                    Value
Description                 comment
StartDelimiter              <!--
ClosingDelimiter            -->
IsServerTag                 false
NamePrefix                  !--
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   false
Example:
<!-- This is a comment -->


XML_DECLARATION

public static final StartTagType XML_DECLARATION
The tag type given to an XML declaration (<?xml ... ?>).

An XML declaration is often referred to in texts as a special type of processing instruction with the reserved PITarget name of "xml". Technically it is not an XML processing instruction at all, but is still a type of SGML processing instruction.

According to section 2.8 of the XML 1.0 specification, a valid XML declaration can specify only "version", "encoding" and "standalone" attributes in that order. This library parses the attributes of an XML declaration in the same way as those of a normal tag, without checking that they conform to the specification.

Properties:
Property                    Value
Description                 XML declaration
StartDelimiter              <?xml
ClosingDelimiter            ?>
IsServerTag                 false
NamePrefix                  ?xml
CorrespondingEndTagType     null
HasAttributes               true
IsNameAfterPrefixRequired   false
Example:
<?xml version="1.0" encoding="UTF-8"?>


XML_PROCESSING_INSTRUCTION

public static final StartTagType XML_PROCESSING_INSTRUCTION
The tag type given to an XML processing instruction (<?PITarget ... ?>).

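An XML processing instruction is a specific form of SGML processing instruction with the following two additional constraints:
- it must be closed with '?>' instead of just a single '>' character.
- it requires a PITarget (essentially a name following the '<?' start delimiter).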

This library does not include a predefined generic tag type for SGML processing instructions as the only forms in which they are found in HTML documents are the more specific XML processing instruction and the XML declaration, both of which have their own dedicated predefined tag type.

There is no restriction on the contents of an XML processing instruction. In particular, it cannot be assumed that the processing instruction contains attributes, in contrast to the XML declaration.

Note that registering the PHPTagTypes.PHP_SHORT tag type overrides this tag type. This is because they both have the same start delimiter, so the one registered latest takes precedence over the other. See the documentation of the PHPTagTypes class for more information.
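
The precedence rule described above can be modelled with a small sketch. SketchRegistry is a hypothetical class, and the description strings are only illustrative labels; the library's real registry is more elaborate, and this models nothing beyond the stated rule that the latest registration for a given start delimiter wins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of start delimiter precedence; not the library's registry.
final class SketchRegistry {
    // Maps a start delimiter to the label of the tag type currently holding it.
    private final Map<String, String> byStartDelimiter = new HashMap<>();

    // Registering a type whose start delimiter is already taken replaces the earlier one.
    void register(String startDelimiter, String label) {
        byStartDelimiter.put(startDelimiter, label);
    }

    // Returns the label of the type that wins for the given start delimiter, or null.
    String lookup(String startDelimiter) {
        return byStartDelimiter.get(startDelimiter);
    }
}
```

In this model, registering a second type for "<?" after the XML processing instruction type makes lookup("<?") return the later label, which mirrors the overriding behaviour described for PHPTagTypes.PHP_SHORT.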

Properties:
Property                    Value
Description                 XML processing instruction
StartDelimiter              <?
ClosingDelimiter            ?>
IsServerTag                 false
NamePrefix                  ?
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   true
Example:
<?xml-stylesheet href="standardstyle.css" type="text/css"?>


DOCTYPE_DECLARATION

public static final StartTagType DOCTYPE_DECLARATION
The tag type given to a document type declaration (<!DOCTYPE ... >).

Information about the document type declaration can be found in the HTML 4.01 specification section 7.2, and the XML 1.0 specification section 2.8.

The "!DOCTYPE" tag name is required to be in upper case in the source document, but all tag properties are stored in lower case because this library performs all parsing in lower case.

Properties:
Property                    Value
Description                 document type declaration
StartDelimiter              <!doctype
ClosingDelimiter            >
IsServerTag                 false
NamePrefix                  !doctype
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   false
Example:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">


MARKUP_DECLARATION

public static final StartTagType MARKUP_DECLARATION
The tag type given to a markup declaration (<!ELEMENT ... > | <!ATTLIST ... > | <!ENTITY ... > | <!NOTATION ... >).

The name of a markup declaration tag must be one of "!element", "!attlist", "!entity" or "!notation". These tag names are required to be in upper case in the source document, but all tag properties are stored in lower case because this library performs all parsing in lower case.

Markup declarations usually appear inside a document type definition (DTD), which is usually a document external to the HTML or XML document. They can also appear directly within the document type declaration, which is why they must be recognised by the parser.

Properties:
Property                    Value
Description                 markup declaration
StartDelimiter              <!
ClosingDelimiter            >
IsServerTag                 false
NamePrefix                  !
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   true
Example:
<!ELEMENT BODY O O (%flow;)* +(INS|DEL) -- document body -->


CDATA_SECTION

public static final StartTagType CDATA_SECTION
The tag type given to a CDATA section (<![CDATA[ ... ]]>).

A CDATA section is a specific form of a marked section. This library does not include a predefined generic tag type for marked sections, as the only type of marked sections found in HTML documents are CDATA sections.

The HTML 4.01 specification section B.3.5 and the XML 1.0 specification section 2.7 contain definitions for a CDATA section.

There is inconsistency between the SGML and HTML/XML specifications in the definition of a marked section. SGML requires the presence of a space between the "<![" prefix and the keyword, and allows a space after the keyword. The XML specification forbids these spaces, and the examples given in the HTML specification do not include them either. This library only recognises CDATA sections that do not include the spaces.

The "![CDATA[" tag name is required to be in upper case in the source document according to the HTML/XML specifications, but all tag properties are stored in lower case because this makes it more efficient for the library to perform case-insensitive parsing of all tags.

In the default configuration, any non-server tag appearing within a CDATA section is ignored by the parser. See the documentation of the tag parsing process for more information.

Properties:
Property                    Value
Description                 CDATA section
StartDelimiter              <![cdata[
ClosingDelimiter            ]]>
IsServerTag                 false
NamePrefix                  ![cdata[
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   false
Example:
This example shows the recommended practice of enclosing scripts inside a CDATA section:
<script type="text/javascript">
//<![CDATA[
function min(a,b) {return a<b ? a : b;}
//]]>
</script>


SERVER_COMMON

public static final StartTagType SERVER_COMMON
The tag type given to a common server tag (<% ... %>).

Common server tags include ASP, JSP, PSP, ASP-style PHP, eRuby, and Mason substitution tags.

This tag and the escaped common server tag are the only standard tag types that define server tags. They are included as standard tag types because of the common server tag's widespread use in many platforms, including those listed above.

Properties:
Property                    Value
Description                 common server tag
StartDelimiter              <%
ClosingDelimiter            %>
IsServerTag                 true
NamePrefix                  %
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   false
Example:
<%@ include file="header.html" %>


SERVER_COMMON_ESCAPED

public static final StartTagType SERVER_COMMON_ESCAPED
The tag type given to an escaped common server tag (<\% ... %>).

Some of the platforms that support the common server tag also support a mechanism to escape that tag by adding a backslash (\) before the percent (%) character. Although rarely used, this tag type allows the parser to recognise these escaped tags in addition to the common server tag itself.

Properties:
Property                    Value
Description                 escaped common server tag
StartDelimiter              <\%
ClosingDelimiter            %>
IsServerTag                 true
NamePrefix                  \%
CorrespondingEndTagType     null
HasAttributes               false
IsNameAfterPrefixRequired   false
Example:
<\%@ include file="header.html" %>

Constructor Detail

StartTagType

protected StartTagType(java.lang.String description,
                       java.lang.String startDelimiter,
                       java.lang.String closingDelimiter,
                       EndTagType correspondingEndTagType,
                       boolean isServerTag,
                       boolean hasAttributes,
                       boolean isNameAfterPrefixRequired)
Constructs a new StartTagType object with the specified properties.
(implementation assistance method)

As StartTagType is an abstract class, this constructor is only called from sub-class constructors.

Parameters:
description - a description of the new start tag type useful for debugging purposes.
startDelimiter - the start delimiter of the new start tag type.
closingDelimiter - the closing delimiter of the new start tag type.
correspondingEndTagType - the corresponding end tag type of the new start tag type.
isServerTag - indicates whether the new start tag type is a server tag.
hasAttributes - indicates whether the new start tag type has attributes.
isNameAfterPrefixRequired - indicates whether a name is required after the prefix.
Method Detail

getCorrespondingEndTagType

public final EndTagType getCorrespondingEndTagType()
Returns the type of end tag required to pair with a start tag of this type to form an element.
(property method)

This can be represented by the following expression that is always true given an arbitrary element that has an end tag:

element.getStartTag().getStartTagType().getCorrespondingEndTagType()==element.getEndTag().getEndTagType()

Standard Tag Type Values:
Start Tag Type              Corresponding End Tag Type
UNREGISTERED                null
NORMAL                      EndTagType.NORMAL
COMMENT                     null
XML_DECLARATION             null
XML_PROCESSING_INSTRUCTION  null
DOCTYPE_DECLARATION         null
MARKUP_DECLARATION          null
CDATA_SECTION               null
SERVER_COMMON               null
SERVER_COMMON_ESCAPED       null
Extended Tag Type Values:
Start Tag Type                                            Corresponding End Tag Type
MicrosoftTagTypes.DOWNLEVEL_REVEALED_CONDITIONAL_COMMENT  null
PHPTagTypes.PHP_SCRIPT                                    EndTagType.NORMAL
PHPTagTypes.PHP_SHORT                                     null
PHPTagTypes.PHP_STANDARD                                  null
MasonTagTypes.MASON_COMPONENT_CALL                        null
MasonTagTypes.MASON_COMPONENT_CALLED_WITH_CONTENT         MasonTagTypes.MASON_COMPONENT_CALLED_WITH_CONTENT_END
MasonTagTypes.MASON_NAMED_BLOCK                           MasonTagTypes.MASON_NAMED_BLOCK_END

Returns:
the type of end tag required to pair with a start tag of this type to form an Element.
See Also:
EndTagType.getCorrespondingStartTagType()

hasAttributes

public final boolean hasAttributes()
Indicates whether a start tag of this type contains attributes.
(property method)

The attributes start at the end of the name and continue until the closing delimiter is encountered. If the character sequence representing the closing delimiter occurs within a quoted attribute value it is not recognised as the end of the tag.

The atEndOfAttributes(Source, int pos, boolean isClosingSlashIgnored) method can be overridden to provide more control over where the attributes end.
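
The rule that a closing delimiter inside a quoted attribute value does not end the tag can be sketched as follows. SketchAttributeScan is a hypothetical helper, not the library's attribute parser, and it simplifies matters by treating any single or double quote outside a value as the start of a quoted value.

```java
// Hypothetical sketch of the scanning rule; not the library's attribute parser.
final class SketchAttributeScan {
    // Returns the index of the first closing delimiter at or after 'from' that is
    // not inside a quoted value, or -1 if none is found.
    static int findClosingDelimiter(String text, int from, String closingDelimiter) {
        char quote = 0; // 0 means "not currently inside a quoted value"
        for (int pos = from; pos < text.length(); pos++) {
            char c = text.charAt(pos);
            if (quote != 0) {
                if (c == quote) quote = 0; // matching quote ends the value
            } else if (c == '"' || c == '\'') {
                quote = c; // opening quote starts a value
            } else if (text.startsWith(closingDelimiter, pos)) {
                return pos;
            }
        }
        return -1;
    }
}
```

For the text <a title="1 > 0" href="x">, scanning from position 2 with the closing delimiter ">" skips the '>' inside the quoted title value and stops at the final '>'.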

Standard Tag Type Values:
Start Tag Type              Has Attributes
UNREGISTERED                false
NORMAL                      true
COMMENT                     false
XML_DECLARATION             true
XML_PROCESSING_INSTRUCTION  false
DOCTYPE_DECLARATION         false
MARKUP_DECLARATION          false
CDATA_SECTION               false
SERVER_COMMON               false
SERVER_COMMON_ESCAPED       false
Extended Tag Type Values:
Start Tag Type                                            Has Attributes
MicrosoftTagTypes.DOWNLEVEL_REVEALED_CONDITIONAL_COMMENT  false
PHPTagTypes.PHP_SCRIPT                                    true
PHPTagTypes.PHP_SHORT                                     false
PHPTagTypes.PHP_STANDARD                                  false
MasonTagTypes.MASON_COMPONENT_CALL                        false
MasonTagTypes.MASON_COMPONENT_CALLED_WITH_CONTENT         false
MasonTagTypes.MASON_NAMED_BLOCK                           false

Returns:
true if a start tag of this type contains attributes, otherwise false.

isNameAfterPrefixRequired

public final boolean isNameAfterPrefixRequired()
Indicates whether a valid XML tag name is required directly after the prefix.
(property method)

If this property is true, the name of the tag consists of the prefix followed by an XML tag name.

If this property is false, the name of the tag consists of only the prefix.
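
The two cases above can be sketched in a few lines. SketchTagName is a hypothetical helper; its name-character test is only a rough approximation of an XML name, and the lower-case normalisation that the library applies to tag names is left out.

```java
// Hypothetical sketch; the name-character test only approximates an XML name.
final class SketchTagName {
    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == '.' || c == ':';
    }

    // Computes the tag name for a tag beginning at 'begin' (the position of '<').
    // Returns null if the prefix does not match, or if a name is required after
    // the prefix but none is present.
    static String tagName(String text, int begin, String prefix, boolean nameAfterPrefixRequired) {
        if (!text.startsWith(prefix, begin + 1)) return null;
        if (!nameAfterPrefixRequired) return prefix; // name is the prefix alone
        int afterPrefix = begin + 1 + prefix.length();
        int end = afterPrefix;
        while (end < text.length() && isNameChar(text.charAt(end))) end++;
        if (end == afterPrefix) return null; // required name is missing
        return prefix + text.substring(afterPrefix, end); // prefix followed by the name
    }
}
```

Under these assumptions, the prefix "?" with a required name yields "?xml-stylesheet" for the text <?xml-stylesheet href="standardstyle.css" type="text/css"?>, while the prefix "!--" with no required name yields "!--" for any comment.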

Standard Tag Type Values:
Start Tag Type              Name After Prefix Required
UNREGISTERED                false
NORMAL                      true
COMMENT                     false
XML_DECLARATION             false
XML_PROCESSING_INSTRUCTION  true
DOCTYPE_DECLARATION         false
MARKUP_DECLARATION          true
CDATA_SECTION               false
SERVER_COMMON               false
SERVER_COMMON_ESCAPED       false
Extended Tag Type Values:
Start Tag Type                                            Name After Prefix Required
MicrosoftTagTypes.DOWNLEVEL_REVEALED_CONDITIONAL_COMMENT  true
PHPTagTypes.PHP_SCRIPT                                    false
PHPTagTypes.PHP_SHORT                                     false
PHPTagTypes.PHP_STANDARD                                  false
MasonTagTypes.MASON_COMPONENT_CALL                        false
MasonTagTypes.MASON_COMPONENT_CALLED_WITH_CONTENT         false
MasonTagTypes.MASON_NAMED_BLOCK                           true

Returns:
true if a valid XML tag name is required directly after the prefix, otherwise false.

atEndOfAttributes

public boolean atEndOfAttributes(Source source,
                                 int pos,
                                 boolean isClosingSlashIgnored)
Indicates whether the specified source document position is at the end of a tag's attributes.
(default implementation method)

This method is called internally while parsing attributes to detect where they should end.

It can be assumed that the specified position is not inside a quoted attribute value.

The default implementation simply compares the parse text at the specified position with the closing delimiter, and is equivalent to:
source.getParseText().containsAt(getClosingDelimiter(),pos)

The isClosingSlashIgnored parameter is only relevant in the NORMAL start tag type, which makes use of it to cater for the '/' character that can occur before the closing delimiter in empty-element tags. Its value is always false when passed to other start tag types.
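
The following sketch restates the default check over a plain String and shows one way the closing slash handling for a normal tag might look. SketchEndOfAttributes and both of its methods are hypothetical; in particular, atEndNormal is an assumption about how such handling could be written, not the library's actual implementation.

```java
// Hypothetical sketch over plain strings; not the library's implementation.
final class SketchEndOfAttributes {
    // Default rule: the closing delimiter starts exactly at 'pos'.
    static boolean atEnd(String parseText, int pos, String closingDelimiter) {
        return parseText.startsWith(closingDelimiter, pos);
    }

    // Assumed variant for a normal tag: "/>" also counts as the end of the
    // attributes unless the closing slash is to be ignored.
    static boolean atEndNormal(String parseText, int pos, boolean isClosingSlashIgnored) {
        return parseText.startsWith(">", pos)
            || (!isClosingSlashIgnored && parseText.startsWith("/>", pos));
    }
}
```

For the text <br />, position 4 (the '/') is treated as the end of the attributes only when isClosingSlashIgnored is false, whereas position 5 (the '>') is the end in either case.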

Parameters:
source - the Source document.
pos - the character position in the source document.
isClosingSlashIgnored - indicates whether the name of the start tag being tested is incompatible with an empty-element tag.
Returns:
true if the specified source document position is at the end of a tag's attributes, otherwise false.

constructStartTag

protected final StartTag constructStartTag(Source source,
                                           int begin,
                                           int end,
                                           java.lang.String name,
                                           Attributes attributes)
Internal method for the construction of a StartTag object of this type.
(implementation assistance method)

Intended for use from within the constructTagAt(Source, int pos) method.

Parameters:
source - the Source document.
begin - the character position in the source document where the tag begins.
end - the character position in the source document where the tag ends.
name - the name of the tag.
attributes - the attributes of the tag.
Returns:
the new StartTag object.

parseAttributes

protected final Attributes parseAttributes(Source source,
                                           int startTagBegin,
                                           java.lang.String tagName)
Internal method for the parsing of Attributes.
(implementation assistance method)

Intended for use from within the constructTagAt(Source, int pos) method.

The returned Attributes segment begins at startTagBegin+1+tagName.length(), and ends straight after the last attribute found before the tag's closing delimiter.

Returns null only if the segment contains a major syntactical error or more than the default maximum number of minor syntactical errors.
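
The begin position of the returned segment is simple arithmetic, shown here as a worked sketch. SketchAttributesBegin is a hypothetical helper that only restates the formula startTagBegin+1+tagName.length() given above.

```java
// Hypothetical helper restating the formula from the text; not library code.
final class SketchAttributesBegin {
    // Skips the '<' character (1) and the tag name to reach the attributes segment.
    static int attributesBegin(int startTagBegin, String tagName) {
        return startTagBegin + 1 + tagName.length();
    }
}
```

For the text <p><div class="NormalDivTag">, the div start tag begins at position 3 and its name has length 3, so the attributes segment begins at position 7, which is the space directly after the name.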

Parameters:
source - the Source document.
startTagBegin - the position in the source document at which the start tag is to begin.
tagName - the name of the start tag to be constructed.
Returns:
the Attributes of the start tag to be constructed, or null if too many errors occur while parsing.