cprover
|
This directory contains a symbolic evaluation system for goto-programs. This takes a goto-program and translates it to an equation system by traversing the program, branching and merging and unwinding loops as needed. Each reverse goto has a separate counter (the actual counting is handled by cbmc
, see the –unwind
and –unwind-set
options). When a counter limit is reach, an assertion can be added to explicitly show when analysis is incomplete. The symbolic execution includes constant folding so loops that have a constant number of iterations will be handled completely (assuming the unwinding limit is sufficient).
The output of the symbolic execution is a system of equations; an object containing a list of symex_target_elements
, each of which are equalities between expr
expressions. See symex_target_equation.h
. The output is in static, single assignment (SSA) form, which is not the case for goto-programs.
In the goto-symex directory.
Key classes:
The bmct class gets a reference to an equation (initially, an empty list of single-static assignment steps) and a goto-program from the frontend. The bmct creates a goto_symext to symbolically execute the goto-program, thereby filling the equation, which can then be passed along to the SAT solver.
The symbolic execution state is maintained by goto_symex_statet. This is a memory-heavy data structure, so goto_symext creates it on-demand and lets it go out-of-scope as soon as possible. However, the process of symbolic execution fills out an additional symbol table of dynamically-created objects; this symbol table is needed when solving the equation. This symbol table must thus be exported out of the state before it is torn down; this is done through the parameter "`new_symbol_table`" passed as a reference to the various functions in goto_symext.
The main symbolic execution loop code is goto_symext::symex_step(). This function case-switches over the type of the instruction that we're currently executing, and calls various other functions appropriate to the instruction type, i.e. goto_symext::symex_function_call() if the current instruction is a function call, goto_symext::symex_goto() if the current instruction is a goto, etc.
By default, CBMC symbolically executes the entire program, building up an equation representing all instructions, which it then passes to the solver. Notably, it merges paths that diverge due to a goto instruction, forming a constraint that represents having taken either of the two paths; the code for doing this is goto_symext::merge_goto().
CBMC can operate in a different mode when the --paths
flag is passed on the command line. This disables path merging; instead, CBMC executes a single path at a time, calling the solver with the equation representing that path, then continues to execute other paths. The 'other paths' that would have been taken when the program branches are saved onto a workqueue so that CBMC can continue to execute the current path, and then later retrieve saved paths from the workqueue. Implementation-wise, bmct passes a worklist to goto_symext in bmct::do_language_agnostic_bmc(). If path exploration is enabled, goto_symext will fill up the worklist whenever it encounters a branch, instead of merging the paths on the branch. After the initial symbolic execution (i.e. the first call to bmct::run() in bmct::do_language_agnostic_bmc()), bmct continues popping the worklist and executing untaken paths until the worklist empties. Note that this means that the default model-checking behaviour is a special case of path exploration: when model-checking, the initial symbolic execution run does not add any paths to the workqueue but rather merges all the paths together, so the additional path-exploration loop is skipped over.
In goto-programs, variable names get a prefix to indicate their scope (like main::1::foo
or whatever). At symbolic execution level, variables also get a suffix because we’re doing single-static assignment. There are three “levels” of renaming. At Level 2, variables are renamed every time they are encountered in a function. At Level 1, variables are renamed every time the functions that contain those variables are called. At Level 0, variables are renamed every time a new thread containing those variables are spawned. We can inspect the renamed variable names with the –show-vcc flag, which prints a string with the following format: %s!%d@%d#%d
. The string field is the variable name, and the numbers after the !, @, and # are the L0, L1, and L2 suffixes respectively. The following examples illustrate Level 1 and 2 renaming:
$ cat l1.c int main() { int x=7; x=8; assert(0); } $ cbmc --show-vcc l1.c ... {-12} x!0@1#2 == 7 {-13} x!0@1#3 == 8
That is, the L0 names for both xs are 0; the L1 names for both xs are 1; but each occurrence of x within main() gets a fresh L2 suffix (2 and 3, respectively).
$ cat l2.c void foo(){ int x=7; x=8; x=9; } int main(){ foo(); foo(); assert(0); } $ cbmc --show-vcc l2.c ... {-12} x!0@1#2 == 7 {-13} x!0@1#3 == 8 {-14} x!0@1#4 == 9 {-15} x!0@2#2 == 7 {-16} x!0@2#3 == 8 {-17} x!0@2#4 == 9
That is, every time we enter function foo, the L1 counter of x is incremented (from 1 to 2) and the L0 counter is reset (back to 2, after having been incremented up to 4). The L0 counter then increases every time we access x as we walk through the function.
In the goto-symex directory.
Key classes: