2 \page data-structures-core-structures-and-ast Data structures: core structures and AST
4 \author Martin Brain, Peter Schrammel, Owen Jones
6 ## Strings: dstringt, the string_container and the ID_* ##
7 Within cbmc, strings are represented using `irep_idt`. By default this is
8 typedefed to \ref dstringt, which stores a string as an index into a large
9 static table of strings. This makes it easy to compare if two `irep_idt`s
10 are equal (just compare the index) and it makes it efficient to store
11 many copies of the same string. The static list of strings is initially populated
12 from `irep_ids.def`, so for example the fourth entry in `irep_ids.def` is
13 “IREP_ID_ONE(type)”, so the string “type” has index 3. You can refer to
14 this `irep_idt` as `ID_type`. The other kind of line you see is
15 “IREP_ID_TWO(C_source_location, #source_location)”, which means the
16 `irep_idt` for the string “#source_location” can be referred to as
17 `ID_C_source_location`. The “C” is for comment, which will be explained
18 in the next section. Any strings that need to be stored as `irep_id`s which
19 aren't in `irep_ids.def` are added to the end of the table when they are
20 first encountered, and the same index is used for all instances.
22 See documentation at \ref dstringt.
24 ## irept: a 4-triple (data, named-sub, comments, sub) ##
25 See documentation at \ref irept.
27 As that documentation says, `irept`s are generic tree nodes. You should
28 think of them as having a single string (data, actually an `irep_idt`) and
29 lots of child nodes, some of which are numbered (sub) and some of which are
30 labelled, and the label can either start with a “#” (comments-sub) or without
31 one (named-sub). The meaning of the “#” is that this child should not be
32 considered important, for example it shouldn’t be counted when comparing two
33 `irept`s for equality.
44 \ref exprt is the class to represent an expression. It inherits from \ref irept,
45 and the only things it adds to it are that every \ref exprt has a named sub
46 containing its type and everything in the sub of an \ref exprt is again an
47 \ref exprt, not just an \ref irept. You can think of \ref exprt as a node in the
48 abstract syntax tree for an expression. There are many classes that
49 inherit from \ref exprt and which represent more specific things. For
50 example, there is \ref minus_exprt, which has a sub of size 2 (for the two
53 Recall that every \ref irept has one piece of data of its own, i.e. its
54 `id()`, and all other information is in its `named_sub`, `comments` or
55 `sub`. For `exprt`s, the `id()` is used to specify why kind of \ref exprt this is,
56 so a \ref minus_exprt has `ID_minus` as its `id()`. This means that a
57 \ref minus_exprt can be passed wherever an \ref exprt is expected, and if you
58 want to check if the expression you are looking at is a minus
59 expression then you have to check its `id()` (or use
60 `can_cast_expr<minus_exprt>`).
63 \ref exprt represents expressions and \ref codet represents statements. \ref codet
64 inherits from \ref exprt, so all `codet`s are `exprt`s, with `id()` `ID_code`.
65 Many different kinds of statements inherit from \ref codet, and they are
66 distinguished by their `statement()`. For example, a while loop would be
67 represented by a \ref code_whilet, which has `statement()` `ID_while`.
68 \ref code_whilet has two operands in its sub, and helper functions to make
69 it easier to use: `cond()` returns the first subexpression, which
70 is the condition for the while loop, and `body()` returns the second
71 subexpression, which is the body of the while loop - this has to be
72 a \ref codet, because the body of a while loop is a statement.
74 ## symbolt, symbol_table, and namespacet ##
78 ### Symbol lifetimes, symbol modes, name, base-name, pretty-name; semantic of lifetimes for symex? ###
82 ### Storing symbols and hiding symbols (namespacet) ###
90 ## Examples: how to represent the AST of statements, in C and in java ##
96 ### if (x > 10) { y = 2 } else { v[3] = 4 } ###
100 ### Java arrays: struct Array { int length, int *data }; ###