7July_19_2008
Verifying Web Applications Using Bounded Model Checking--revisited
BMC: sections of code patched with runtime guards
3 benefits of MC:
1. counterexamples provided
2. more precise models
3. sound and complete verification (compared to conventional mc's)
more practical approach, requires fixed program diameters to be complete.
-secure flow information prob w/ fixed prog. diameter
counter -examples grouped
1. more descriptive and precise error reports
2. automated patching at locations where errors are initially introduced rather
than where symptoms occur
1. INTRODUCTION
deployment of mechanism for assuring web App
-apply to existing web app w/ fault tolerance
-gateways/filtering at app level
1. immediate assurance, but blind protection
previous models:
-user-experience model
-user-behaviour model
both black box tests, assessment over blind assurance
do not proved immediate quality assurance and can't guarantee id of all flaws
static runtime WebSSARI, using static and runtime techniques (Web App Security
via static analysis and runtime inspection)
a)statically verify existing web app code w/o annotation effort
b)automatically protect potentially defective sections of code after
verification
TS based on Strom and Yemini typestate
-compile-time technique verify program reliability
Formal verification algorithms for reliability and security ->
similar/identical sets, program characteristics
detect at compile-time syntactically legal --semantics undefined that cause
unpredictable behaviour
1. reading vars before init
2. dereferencing ptrs after deallocation
null-ptr ref=>d.o.s.
strom and Yemini, flow sensitive tracking of program var states
each appearance of a var at a pt determines subset of allowable operations in
that specific context.
verification on BMC, precise compile-time estimation of runtime state, better
than TS
verify security vulnerabilities not reliability attributes
WebSSARI runtime guards in potentially insecure spots of code (no programmer
intervention)
TS identified pts violating safety policy drawbacks:
1. looked at symptom spots, not necessarily causes
one problem could snowball into many symptoms of
violations, exponentially
then one instance of sanitized variable per variable
used needed
better soln: use one counter-example to get to root
problem
2. difficult for programmer to validate error, can manifest in many
calls
(considerable drawback for practical use)
simple strategy fro counter-example generation to use model checking, but they
are expensive
so BMC uses zChaff (practical, industry-used SAT solver)
counterexample allow for more precise identification of both the error
report and runtime guard process
2. WEB APP VULNERABILITIES
arise from use of unsanitized data
auto patch - insert sanitization routines
use counter-example traces to reduce the number of insertions needed.
2.1 CROSS SITE SCRIPTING (XSS)
uploading date by user, stored, used later
server side script retrieves date from back-end db and constructs HTML
output
ex.
2.2 SQL INJECTION
more severe than XSS, untrusted values used to construct SQL query
commands, arbitrary commands
ex.
2.3 SPECIFYING WEB APPLICATION RELIABILITY AND SECURITY
POLICIES
compromises in integrity => comprises in confidentiality and availability
need a mechanism that specifying and enforcing info flow policies w/in web apps
use a "state" that represents var's current trust level
design a compile-time algorithm to predict var runtime states at each prog
pt. (similar to enforcing policies)
*TIE TO SQL POLICIES?
uninit vars and illegal ptr deref --compile-time tracking of var
states.
*POLICIES BY USER?
3. VERIFICATION
takes advantage of chars script langs such as PHP, ASP, Perl and
Python ... imperative, deterministic and sequental
info flow model --each prog var x w/ safety type tx = current safety of level
of x.
1st abstract interpretation (AI) of prog, info flow properties
2nd use bounded model checking to verify correctness of all possible states of
AI
3.1 INFORMATION FLOW MODEL
follows Denning's model plus assumptions:
1. Each variable is associated w/ safety type
2. T = {t1,t2,...,tn} t-finite set of safety types
3. T is partially ordered reflexive, transitive, antisymmetric
4. ordering forms complete lattice w/ lower bound and upper bound
greatest lower bound (glb) and least upper bound (lub)
types determined using upper and lower bound operators
3.2 ABSTRACT INTERTPRETATION
p-program F(p)-filter command sequences according to following
rules:
x is var, n is constant, ~ binary op
preserving only assignments, function calls & conditions, F(p) unfolds
function calls keeps only info flow.
data from external sources untrusted input channels (UIC) fi(X) (ex: PHP
GET_HTTP_VARS())
WebSSARI, UIC given predefined postconditions consisting of command sets
that match designated safety levels of retrieved data
calling functions that manipulate sys resources or affect sys integrity
in PHP, exec() echo(), etc... sensitive output channels (SOC) fo(X) require
trusted args
stop: terminates prog execution
not on e's evaluation, bur on each branched path behaves correctly (all
conditions as nondeterministic conds)
info flow is considered, so loops become selection
given safety type lattice T, pre- and post- conditions, F(p) to AI :
if-instructions, type assignments and assertions
AI(F(p)) in fig. 4
expression e to var x: type assignment e's safety type
function preconditions asserts -- all vars in x lower (safer) than
tr.
*PRECONDITION SAFETY
postconditions describe safety of each retrieved
data
*POSTCONDITION SAFETY (POST SQL??)
pre and pos-conditions stored in loadable files (see sect.
4)
*STATIC? STORED.
3.3 FORMAL VERIFICATION
using AI def:
check that AI(F(p)) is consistent w/ assertions
1. loop-free flowchart-> DAG => fixed program diameter
2. single sequential process with large number of vars and branches
BMC (not BDD's)
sound and complete
finds bugs efficiently
handles large # of vars
BMC transition relations unfolded w/ bounded steps, put w/ initial and risk
conditions to form CNF formula, solved using SAT solver
3.3.1 Encoding using an auxiliary variable
straightforward sol'n --add aux var l to record program lines
(statements)
program p, let X = l U {tx | x dom(AI (F( p)))}
construct a control-flow graph CFG(X,p), where transition relations encoded as
CNF formula, T(s,s’),
By rolling T(si, si+1) for bounded k steps (length of longest path in
CFG)
I(s0) is the initial condition and R(si,...,sk) specifies
risk condition (assetion negations) w/in ith and kth states.
incorporated this idea into first BMC version, xBMC0.1
initial experiments revealed frequent system breakdowns, due to
inefficient encoding of each assignment ( 2|X| vars)
3.3.2 Encoding using var renaming
automated memory overflow and assertion consistency test for C and
verilog w/ CBMC (clarke, et. al.)
checker unwinds C or Verilog programs and converts to Boolean formula,
then checked for consistency
CBMC uses variable renaming to create SSA Static Single Assignment w/o
phi-condition
(SSA form is an intermediate representation (IR) in which every variable
is assigned exactly once--existing vars in original IR split into versions, new
vars typically w/ subscripts of original name...formally equivalent to CPS
continuation passing style... the phi-function comes into play
a node can strictly dominate (dominator) a different node in control flow
graph if reaching the second node depends completely on the first. dominance
frontier occurs when the first node partially affect the second. the
phi-function determines where flow came from because it affects the value at
that node, if more than one possibility the phi-function(created due to
dominance frontiers) determines it)
each assigned 2 vars can't have loops, so loop becomes selection, allowed in
information flow model.
clarke's alg can be used.
Alg:
AI var renamed s.t. each renamed var is assigned only once, w/
subscripts denoting number of assignments made to var prior(procedure p)
xBMC1.0 clarke's alg to encode AI: given command c, C(c,g) as below:
check one assertion at a time and generate all counterexamples per each
assertion, gives us formula B
check B for satisfiability.
if sat(B), counterexample and make B more restrictive by negating counter-ex
and iterating until B is unsat (all counter-ex's)
c;asserti (view assert and all preceeding commands)
Bi constructed w/ negation of asserti Bi :=C(c,g) ^ !C(asserti,g)
example:
CNF(Bi) transforms Bi into CNF formula solved using
efficient SAT solver zChaff
if CNF(Bi) is sat, ZChaff proposes a truth assignment ai violating asserti
BN is set of all nondeterministic boolean var in the AI
trace the AI according to BN values in ai, get a sequence of single assignments
= one counterexample trace
iteratively make Bi more restrictive until unsat to get all possible
counterexamples (negation clause of Ni^j of BN more restrictive formula Bi^j at
j+1st iteration is
once unsat, continue constraint generation procedure (fig
5) C(c,g):=C(c1,g) ^ C(asserti,g) until next assertion
AI loop free => each assertion one
time
*WHAT ABOUT LOOPS?
3.3.3 Counterexample Analysis
For any instance of unsafe code, insert a guard taht does runt-time inspections
as part of patching process
(inserts routines that sanitize untrusted input before used by sensitive
output channels SOCs)
alg. takes advantage of BMC-produced counter-ex's to id optimal patching
fix def: given an error trace r, a Fix(V) is effective if, after sanitizing all
var v in V, the error trace is removed (fixed)
set of vars that directly caused assertion violations = violating
variables
given error trace set R, find a minimal fixing set VR
surely fixing them all would work, but fixing the minimum is more
efficient
so, for each violating variable va in VR, a replacement set sva is built by
tracing error trace back from violation while recursively adding vars
that are unique r-values of single assignments
if va is tainted, then tainted path flow, so tracing back
sva is expanded w/ var that can be sanitized instead of va, but still fixing
problem
Lemmas and proofs that this effective fix....
s.t. ![]()
offer formal def that is NP-complete (heuristic search)
MINIMUM-INTERSECTING SET problem find a min set reduced to SET-COVER so use
greedy heuristic: from paper:

Code walker has
lexer, parser AST (abstract syntax tree) maker, prog
abstractor
*AST
prog abstractor asks AST maker to generate a full representation of
prog's AST
AST Maker uses p=lexer and parser to perform this task, handle external
files as it does
traversing the AST, prog abstractor creates AI
BMC engine performs verification of AI
for each var of insecure statement, inserts a statement that will sanitize w/
routine
sanitization routines stored in prelude, users can supply own
routines
*USER INPUT
process in fig 9.
instrumentor = fixing/patching ?
5. EXPERIMENTAL RESULTS
230 projects (PHP)
69 w/ vulnerabilities (30%)
added to GUI: a) navigate bet. source files
b) id particular vars(highlighting)
c) search for vars text patterns
added tool PHPXREF to generate cross-referenced HTML docs of source code
even so 2 people 4 days to validate 515 files that were vulnerable
6. DISCUSSION
BMC-provided counter-examples w/ info flow so
localizing needed a MINIMUM-INTERSECTING SET to get all counter-examples
(SAT-based)
Ball, Naik, Rajamni: SLAM model checker localize errors (report symptoms not
causes) (BDD-based)
they used C-device drivers locking bugs as test
(one-to-one mapping symptom to cause)
State-of-the-art model checkers today reoport only single error trace per run
7. CONCLUSION
practical approach for formal verification of Web app reliability and
security.
TS (breadth first search--trades space for time) no counter-example traces
depth first search for WebSSARI would have been too time costly so...
BMC w/ zChaff (mature SAT solver)
2 benefits of traces
a. allow for more info on error reports
b. can be used to identify many errors (symptoms) w/ same root cause
=> greater report accuracy and reduces # of inserted guards.
find min. errors is NP-complete, there is greedy heuristic-based strategy.