Class: CodeRay::Tokens
In: lib/coderay/token_classes.rb, lib/coderay/tokens.rb
Parent: Object
The Tokens class represents a list of tokens returned from a Scanner. A token is not a special object, just a two-element Array consisting of a token kind (a Symbol such as :comment or :float) and the token text (a String).
A token looks like this:
  [:comment, '# It looks like this']
  [:float, '3.1415926']
  [:error, 'äöü']
Some scanners also yield sub-tokens, represented by special token texts, namely :open and :close.
The Ruby scanner, for example, splits "a string" into:
[ [:open, :string], [:delimiter, '"'], [:content, 'a string'], [:delimiter, '"'], [:close, :string] ]
Tokens is also the interface between Scanners and Encoders: The input is split and saved into a Tokens object. The Encoder then builds the output from this object.
Thus, the syntax below becomes clear:
CodeRay.scan('price = 2.59', :ruby).html # the Tokens object is here -------^
See how small it is? ;)
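The Scanner → Tokens → Encoder pipeline can be sketched in plain Ruby without CodeRay; `toy_scan` and `toy_encode` below are hypothetical stand-ins for a Scanner and an Encoder, not CodeRay's real API:

```ruby
# Toy Scanner -> Tokens -> Encoder pipeline (illustrative only).

# "Scanner": split source into [text, kind] pairs.
def toy_scan(code)
  code.scan(/\d+\.\d+|\w+|\s+|\S/).map do |text|
    kind = case text
           when /\A\d+\.\d+\z/ then :float
           when /\A\w+\z/      then :ident
           when /\A\s+\z/      then :space
           else                     :operator
           end
    [text, kind]
  end
end

# "Encoder": build output from the token list.
def toy_encode(tokens)
  tokens.map { |text, kind|
    kind == :space ? text : %(<span class="#{kind}">#{text}</span>)
  }.join
end

puts toy_encode(toy_scan('price = 2.59'))
```

The token list in the middle is the interface: any encoder that understands [text, kind] pairs can render it.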
Tokens gives you the power to handle pre-scanned code very easily: you can convert it to a webpage, a YAML file, or dump it into a gzipped string that you put in your database.
Tokens’ subclass TokenStream allows streaming to save memory.
  ClassOfKind = Hash.new { |h, k| h[k] = k.to_s }
Escapes a string for use in write_token.
  # File lib/coderay/tokens.rb, line 79
  def escape text
    text.gsub(/[\n\\]/, '\\\\\&')
  end
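Extracted as a standalone sketch, the escaping rule (prefix every newline and backslash with a backslash) behaves like this:

```ruby
# Standalone copy of the escape rule shown above: put a backslash
# before every newline and every backslash in the text.
def escape(text)
  text.gsub(/[\n\\]/, '\\\\\&')
end

escape("a\\b")  # => "a\\\\b" (the backslash is doubled)
escape("x\ny")  # => "x\\\ny" (a backslash is inserted before the newline)
```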
Undump the object using Marshal.load, then unzip it using GZip.gunzip.
The result is commonly a Tokens object, but this is not guaranteed.
  # File lib/coderay/tokens.rb, line 304
  def Tokens.load dump
    require 'coderay/helpers/gzip_simple'
    dump = dump.gunzip
    @dump = Marshal.load dump
  end
Read a token from the string.
Inversion of write_token.
TODO Test this!
  # File lib/coderay/tokens.rb, line 69
  def read_token token
    type, text = token.split("\t", 2)
    if type[0] == ?:
      [text.to_sym, type[1..-1].to_sym]
    else
      [type.to_sym, unescape(text)]
    end
  end
Convert the token to a string.
This format is used by Encoders::Tokens. It can be reverted using read_token.
  # File lib/coderay/tokens.rb, line 56
  def write_token text, type
    if text.is_a? String
      "#{type}\t#{escape(text)}\n"
    else
      ":#{text}\t#{type}\t\n"
    end
  end
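The write/read pair can be exercised standalone. Note that `unescape` is not shown in this documentation; the version below (simply undoing `escape`) is an assumption, and only the String-token path is demonstrated (the documentation itself marks read_token as untested):

```ruby
# Round trip for the plain-text token format. `unescape` is an assumed
# inverse of escape; it is not part of the documented source.
def escape(text)
  text.gsub(/[\n\\]/, '\\\\\&')
end

def unescape(text)
  text.gsub(/\\([\n\\])/, '\1')
end

def write_token(text, type)
  text.is_a?(String) ? "#{type}\t#{escape(text)}\n" : ":#{text}\t#{type}\t\n"
end

def read_token(token)
  type, text = token.split("\t", 2)
  if type[0] == ?:
    [text.to_sym, type[1..-1].to_sym]
  else
    [type.to_sym, unescape(text)]
  end
end

line = write_token('3.1415926', :float)  # "float\t3.1415926\n"
read_token(line.chomp)                   # => [:float, "3.1415926"]
```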
Dumps the object into a String that can be saved in files or databases.
The dump is created with Marshal.dump; in addition, it is gzipped using GZip.gzip.
The returned String object includes Undumping so it has an undump method. See Tokens.load.
You can configure the level of compression, but the default value 7 should be what you want in most cases as it is a good compromise between speed and compression rate.
See GZip module.
  # File lib/coderay/tokens.rb, line 263
  def dump gzip_level = 7
    require 'coderay/helpers/gzip_simple'
    dump = Marshal.dump self
    dump = dump.gzip gzip_level
    dump.extend Undumping
  end
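The same marshal-and-compress round trip can be sketched with Ruby's stdlib Zlib in place of CodeRay's GZip helper (gzip_simple), whose API differs:

```ruby
require 'zlib'

# Marshal + zlib round trip, mirroring dump (compression level 7) and load.
# Zlib stands in for CodeRay's gzip_simple helper here.
tokens = [['price', :ident], [' ', :space], ['2.59', :float]]

dump   = Zlib::Deflate.deflate(Marshal.dump(tokens), 7)
loaded = Marshal.load(Zlib::Inflate.inflate(dump))
loaded == tokens  # => true
```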
Iterates over all tokens.
If a filter is given, only tokens of that kind are yielded.
  # File lib/coderay/tokens.rb, line 100
  def each kind_filter = nil, &block
    unless kind_filter
      super(&block)
    else
      super() do |text, kind|
        next unless kind == kind_filter
        yield text, kind
      end
    end
  end
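The filter pattern can be shown on a minimal Array subclass; `ToyTokens` is a hypothetical stand-in for Tokens:

```ruby
# Minimal sketch of the kind filter in Tokens#each, on an Array subclass.
class ToyTokens < Array
  def each(kind_filter = nil, &block)
    return super(&block) unless kind_filter
    super() do |text, kind|
      next unless kind == kind_filter
      yield text, kind
    end
  end
end

t = ToyTokens[['# hi', :comment], ['x', :ident], ['# bye', :comment]]
comments = []
t.each(:comment) { |text, _| comments << text }
comments  # => ["# hi", "# bye"]
```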
Iterates over all text tokens. Range tokens like [:open, :string] are left out.
Example:
tokens.each_text_token { |text, kind| text.replace html_escape(text) }
  # File lib/coderay/tokens.rb, line 116
  def each_text_token
    each do |text, kind|
      next unless text.is_a? ::String
      yield text, kind
    end
  end
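The skipping rule is easy to see on a plain Array of pairs: only String texts count, while the Symbol "texts" of range tokens are passed over.

```ruby
# Only String texts are text tokens; the Symbol texts of range tokens
# like [:open, :string] are skipped.
tokens = [[:open, :string], ['"', :delimiter], ['a string', :content],
          ['"', :delimiter], [:close, :string]]

text_size = 0
tokens.each { |text, _| text_size += text.size if text.is_a?(String) }
text_size  # => 10
```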
Encode the tokens using encoder.
encoder can be a plugin name (a Symbol like :html, looked up in Encoders), an Encoder class, or an Encoder object.
options are passed to the encoder.
  # File lib/coderay/tokens.rb, line 131
  def encode encoder, options = {}
    unless encoder.is_a? Encoders::Encoder
      encoder_class = encoder.is_a?(Class) ? encoder : Encoders[encoder]
      encoder = encoder_class.new options
    end
    encoder.encode_tokens self, options
  end
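The argument normalization can be sketched with dummies; `ToyEncoder` and `ToyRegistry` are hypothetical stand-ins for Encoders::Encoder and the Encoders plugin list:

```ruby
# Sketch of #encode's argument handling: accept a lookup key, an encoder
# class, or a ready-made instance. All names here are illustrative.
class ToyEncoder
  def initialize(options = {})
    @options = options
  end

  def encode_tokens(tokens, options = {})
    tokens.map { |text, _| text }.join
  end
end

ToyRegistry = { text: ToyEncoder }

def toy_encode(tokens, encoder, options = {})
  unless encoder.is_a? ToyEncoder
    encoder_class = encoder.is_a?(Class) ? encoder : ToyRegistry[encoder]
    encoder = encoder_class.new(options)
  end
  encoder.encode_tokens(tokens, options)
end

toy_encode([['a', :ident], ['+', :operator]], :text)  # => "a+"
```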
Ensure that all :open tokens have a corresponding :close one.
TODO: Test this!
  # File lib/coderay/tokens.rb, line 202
  def fix
    tokens = self.class.new
    # Check token nesting using a stack of kinds.
    opened = []
    for type, kind in self
      case type
      when :open
        opened.push [:close, kind]
      when :begin_line
        opened.push [:end_line, kind]
      when :close, :end_line
        expected = opened.pop
        if [type, kind] != expected
          # Unexpected :close; decide what to do based on the kind:
          # - token was never opened: delete the :close (just skip it)
          next unless opened.rindex expected
          # - token was opened earlier: also close tokens in between
          tokens << token until (token = opened.pop) == expected
        end
      end
      tokens << [type, kind]
    end
    # Close remaining opened tokens
    tokens << token while token = opened.pop
    tokens
  end
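The balancing idea can be demonstrated standalone. This is a simplified take, not CodeRay's exact algorithm: drop :close tokens that were never opened, close intermediate tokens when an outer one closes, and close anything still open at the end.

```ruby
# Simplified sketch of the balancing idea behind #fix (not CodeRay's
# exact algorithm).
def balance(tokens)
  fixed, opened = [], []
  tokens.each do |type, kind|
    case type
    when :open
      opened.push kind
      fixed << [type, kind]
    when :close
      next unless opened.include?(kind)            # never opened: drop it
      fixed << [:close, opened.pop] while opened.last != kind
      opened.pop
      fixed << [type, kind]
    else
      fixed << [type, kind]
    end
  end
  fixed << [:close, opened.pop] until opened.empty?  # close leftovers
  fixed
end

balance([[:open, :string], ['"', :delimiter]])
# => [[:open, :string], ["\"", :delimiter], [:close, :string]]
```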
Redirects unknown methods to encoder calls.
For example, if you call +tokens.html+, the HTML encoder is used to highlight the tokens.
  # File lib/coderay/tokens.rb, line 154
  def method_missing meth, options = {}
    Encoders[meth].new(options).encode_tokens self
  end
Returns the tokens compressed by joining consecutive tokens of the same kind.
This cannot be undone, but should yield the same output in most Encoders. It basically makes the output smaller.
Combined with dump, it saves space for the cost of time.
If the scanner is written carefully, this is not required - for example, consecutive //-comment lines could already be joined in one comment token by the Scanner.
  # File lib/coderay/tokens.rb, line 169
  def optimize
    print ' Tokens#optimize: before: %d - ' % size if $DEBUG
    last_kind = last_text = nil
    new = self.class.new
    for text, kind in self
      if text.is_a? String
        if kind == last_kind
          last_text << text
        else
          new << [last_text, last_kind] if last_kind
          last_text = text
          last_kind = kind
        end
      else
        new << [last_text, last_kind] if last_kind
        last_kind = last_text = nil
        new << [text, kind]
      end
    end
    new << [last_text, last_kind] if last_kind
    print 'after: %d (%d saved = %2.0f%%)' %
      [new.size, size - new.size, 100.0 * (size - new.size) / size] if $DEBUG
    new
  end
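The joining idea reduces to a short standalone function: merge consecutive text tokens that share a kind, while Symbol "texts" (range tokens) break a run.

```ruby
# Sketch of the merging idea in #optimize: join the texts of consecutive
# tokens with the same kind; non-String texts break a run.
def merge_consecutive(tokens)
  merged = []
  tokens.each do |text, kind|
    last = merged.last
    if text.is_a?(String) && last && last[1] == kind && last[0].is_a?(String)
      last[0] += text   # extend the previous token's text
    else
      merged << [text, kind]
    end
  end
  merged
end

merge_consecutive([['// a', :comment], ["\n", :comment], ['// b', :comment]])
# => [["// a\n// b", :comment]]
```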
Makes sure that:
- newlines are single tokens (so all other tokens are single-line)
- there are no open tokens at the end of a line
This makes it simple for encoders that work line-oriented, like HTML with list-style numeration.
  # File lib/coderay/tokens.rb, line 240
  def split_into_lines
    raise NotImplementedError
  end
  # File lib/coderay/tokens.rb, line 244
  def split_into_lines!
    replace split_into_lines
  end
Whether the object is a TokenStream.
Returns false.
  # File lib/coderay/tokens.rb, line 93
  def stream?
    false
  end
The text of all text tokens, joined into a single String. Should be equal to the input before scanning.
  # File lib/coderay/tokens.rb, line 284
  def text
    map { |t, k| t if t.is_a? ::String }.join
  end
The total size of the tokens' texts. Should be equal to the size of the input before scanning.
  # File lib/coderay/tokens.rb, line 273
  def text_size
    size = 0
    each_text_token do |t, k|
      size += t.size
    end
    size
  end
Turn into a string using Encoders::Text.
options are passed to the encoder if given.
  # File lib/coderay/tokens.rb, line 145
  def to_s options = {}
    encode :text, options
  end