Class TextFormat.Tokenizer

  • Enclosing class:
    TextFormat

    private static final class TextFormat.Tokenizer
    extends java.lang.Object
    Represents a stream of tokens parsed from a CharSequence.

    The Java standard library provides several classes that look useful for implementing this but turn out not to be. For example:

    • java.io.StreamTokenizer: This comes close to what we want, except for one fatal flaw: it automatically un-escapes strings using Java escape sequences, which do not include all the escape sequences we need to support (e.g. '\x').
    • java.util.Scanner: This seems like a good way to match regular expressions against a stream (so we wouldn't have to load the entire input into a single string before parsing). Sadly, Scanner requires that tokens be separated by a delimiter. Thus, although the text "foo:" should parse to two tokens ("foo" and ":"), Scanner would recognize it only as a single token. Furthermore, Scanner provides no way to inspect the contents of delimiters, making it impossible to keep track of line and column numbers.
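
    The Scanner limitation described above can be reproduced with a few lines. This is only an illustrative sketch, not part of TextFormat: the class name ScannerDelimiterSketch and the method scannerTokens are hypothetical, and the Scanner is left on its default whitespace delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; it is not part of TextFormat.
final class ScannerDelimiterSketch {

    // Collects every token that Scanner reports when using its default
    // (whitespace) delimiter.
    static List<String> scannerTokens(String input) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "foo" and ":" are not separated by a delimiter, so Scanner
        // returns them together as the single token "foo:".
        System.out.println(scannerTokens("foo: 42")); // prints [foo:, 42]
    }
}
```

    The whitespace that Scanner consumes between "foo:" and "42" is never handed back to the caller, which is why line and column positions cannot be reconstructed from its output.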

    Luckily, Java's regular expression support is useful to us, though only barely: we need Matcher.usePattern(), which was added in Java 1.5. The drawback is that the entire input must be available as one contiguous string.
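
    One way a single Matcher can alternate between patterns with usePattern() is sketched below. This is a minimal illustration under stated assumptions, not the actual Tokenizer implementation: the class RegexTokenSketch, the method tokenize, and the patterns SPACE and WORD_OR_SYMBOL are hypothetical stand-ins, since the real WHITESPACE and TOKEN patterns are not shown in this documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it is not the real TextFormat.Tokenizer.
final class RegexTokenSketch {

    // Assumed patterns, chosen for illustration only.
    private static final Pattern SPACE = Pattern.compile("\\s*");
    private static final Pattern WORD_OR_SYMBOL =
        Pattern.compile("[A-Za-z0-9_]+|\\S");

    // Splits the text into tokens using one Matcher whose pattern is
    // switched between whitespace skipping and token matching.
    static List<String> tokenize(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = SPACE.matcher(text);
        int pos = 0;
        while (true) {
            // Skip any whitespace that starts at the current position.
            matcher.usePattern(SPACE);
            matcher.region(pos, text.length());
            if (matcher.lookingAt()) {
                pos = matcher.end();
            }
            if (pos == text.length()) {
                break;
            }
            // Match exactly one token starting at the current position.
            matcher.usePattern(WORD_OR_SYMBOL);
            matcher.region(pos, text.length());
            if (!matcher.lookingAt()) {
                throw new IllegalStateException("No token at offset " + pos);
            }
            tokens.add(matcher.group());
            pos = matcher.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo: 42")); // prints [foo, :, 42]
    }
}
```

    Because the matcher operates on a region of one CharSequence, the whole input has to be in memory, which is the trade-off noted above. In exchange, the skipped whitespace stays accessible by offset, so a caller could count newlines in it to maintain line and column numbers.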

    • Field Detail

      • text

        private final java.lang.CharSequence text
      • matcher

        private final java.util.regex.Matcher matcher
      • currentToken

        private java.lang.String currentToken
      • pos

        private int pos
      • line

        private int line
      • column

        private int column
      • previousLine

        private int previousLine
      • previousColumn

        private int previousColumn
      • WHITESPACE

        private static final java.util.regex.Pattern WHITESPACE
      • TOKEN

        private static final java.util.regex.Pattern TOKEN
      • DOUBLE_INFINITY

        private static final java.util.regex.Pattern DOUBLE_INFINITY
      • FLOAT_INFINITY

        private static final java.util.regex.Pattern FLOAT_INFINITY
      • FLOAT_NAN

        private static final java.util.regex.Pattern FLOAT_NAN
    • Constructor Detail

      • Tokenizer

        private Tokenizer(java.lang.CharSequence text)
        Constructs a tokenizer that parses tokens from the given text.