7July_03_2008 Eliminating SQL Injection Attacks - A Transparent Defense Mechanism

Combines static information with runtime comparison.
Static info: certain locations in the target program are identified as hotspots (points in the application code that issue SQL to the database).
Each hotspot has a model built to represent all the possible SQL queries it can issue.
String analysis uses the SOOT Framework.
For hotspots in a Java program, String, StringBuffer and multidimensional String arrays are tracked to construct a flow graph.
Nodes in the flow graph represent expressions; edges represent directed def-use relationships.
String manipulation methods are represented as operators.
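
To make the flow-graph idea concrete, here is a minimal sketch of my own, not the paper's implementation. It models only constants, user inputs and concatenation, and it is a tree rather than a general def-use graph. The class names (Const, Input, Concat) and the template function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """A string constant appearing in the program text."""
    value: str


@dataclass(frozen=True)
class Input:
    """A user-controlled string; its value is unknown until runtime."""
    name: str


@dataclass(frozen=True)
class Concat:
    """A string operator node; its operands are the definitions it uses."""
    left: object
    right: object


def template(node):
    """Fold the graph into one query template, writing VAR for user inputs."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Input):
        return "VAR"
    if isinstance(node, Concat):
        return template(node.left) + template(node.right)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Hypothetical hotspot: "SELECT ... login = " + login + " AND active = 1"
query = Concat(
    Concat(Const("SELECT * FROM users WHERE login = "), Input("login")),
    Const(" AND active = 1"),
)
print(template(query))
```

A real analysis would also have to handle branches, loops and the other string methods, which is why the paper's result is an automaton rather than a single string.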

"4.1.1 SQL Finite State Machine
The final result of the string analysis is a Non-Deterministic
Finite State Automaton (NDFA) that expresses all the possible
values a particular string can assume using single character
transitions in the automaton. We now create a SQL Finite
State Machine (SQL-FSM) by performing a depth first
traversal of the NDFA for that hotspot and grouping characters
as either SQL keywords, operators, or literal values, and
creating transitions that are annotated by the literal values
(tokens). Each SQL keyword is represented as is, while the
user input string variables are represented as VAR, indicating
that they can change at runtime. Fig. 2 shows the NDFA
and SQL-FSM for the first hotspot in the sample code. Note
that in the general case, both the NDFA and SQL-FSM can
have multiple non-looping branches, indicating possible execution
of multiple SQL queries at a single hotspot.
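
As a rough illustration of the grouping step, the sketch below turns one query string into a linear chain of annotated tokens. It is an assumption-laden simplification: it tokenizes a single concrete string instead of traversing an NDFA, the keyword set is a tiny hypothetical subset, and it adds an "identifier" class for table and column names, which the excerpt does not mention. The name sql_fsm is my own.

```python
import re

# Hypothetical, deliberately small keyword set; real SQL has many more.
SQL_KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR",
                "INSERT", "UPDATE", "DELETE", "DROP", "TABLE"}

TOKEN_RE = re.compile(r"""
      '(?:[^']|'')*'              # single-quoted string literal
    | \d+                         # integer literal
    | [A-Za-z_][A-Za-z_0-9]*      # keyword, identifier, or the VAR marker
    | --|<=|>=|<>|!=|[=<>*,;()]   # comment marker, operators, punctuation
""", re.VERBOSE)


def sql_fsm(query):
    """Return the chain of (kind, text) transitions for one query string.

    Characters that match no token class (e.g. whitespace) are skipped.
    """
    chain = []
    for tok in TOKEN_RE.findall(query):
        if tok == "VAR":
            chain.append(("var", tok))            # user input placeholder
        elif tok.upper() in SQL_KEYWORDS:
            chain.append(("keyword", tok.upper()))
        elif tok[0] == "'" or tok.isdigit():
            chain.append(("literal", tok))
        elif tok[0].isalpha() or tok[0] == "_":
            chain.append(("identifier", tok))
        else:
            chain.append(("operator", tok))
    return chain


for transition in sql_fsm("SELECT * FROM users WHERE login = VAR AND pass = VAR"):
    print(transition)
```

Each tuple corresponds to one transition in the chain from Start to End; a hotspot with several possible queries would need one chain per branch.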

4.1.2 SQL-graph Representation
We can thus construct a SQL-FSM for each of the hotspots
in the program. These data structures now capture the semantics
of the different SQL queries that are to be sent to
the database at runtime. Any user input would be compared
against this template and any change in the SQL-FSM structure
would indicate a possible SQLIA. We note that running
each and every query under the scanner at runtime could be
an expensive process. Given that the user input would realistically
consist of a few strings only but the number of SQL
queries that get executed in a program could be very large,
we now try to optimize the number of queries that need to be
put under the scanner during runtime to ensure the validity
of dynamically generated queries, using a SQL-graph.

Figure 2. NDFA and SQL-FSM for Hotspot

The SQL-graph in Fig. 3 represents 4 different SQL queries
in the program as nodes within a logical boundary, and 3
different user inputs as being outside the logical boundary.
If a particular user input (I) is used in a SQL query (Q), the
relationship (R) between the two nodes is indicated by an
undirected link between the 2 nodes. We now define dependencies
(D) in the SQL-graph as links that point from one
SQL query to another SQL query such that the user inputs
used by the former are a proper superset of the user inputs
used by the latter. For SQL queries that use the same set of
user inputs, one of them is chosen as a representative query
and is made to point to the others. We see the dependencies
represented as directed arrows in the SQL-graph. Drawing
equivalence to Code 1, Q1, Q2 and Q3 represent the 3
different SQL queries (also the 3 different hotspots in this
case), while I1 and I2 represent the user inputs login and
pass. Q4 and I3 could possibly correspond to some other
hotspot in the program not represented in the code snippet.

Figure 3. SQL-graph Representation

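The dependency rule described for the SQL-graph (proper superset, plus a representative for queries with identical input sets) can be sketched as below. This is my own reading of the text, not the paper's code. The mapping of queries to inputs is invented for illustration and does not reproduce Fig. 3, and picking the alphabetically first query as representative is an assumption, since the excerpt does not say how the representative is chosen.

```python
def dependencies(uses):
    """Compute directed dependency links between queries.

    uses maps a query name to the set of user inputs it uses.
    Queries that use no user input are left out of the graph.
    """
    uses = {q: frozenset(i) for q, i in uses.items() if i}
    names = sorted(uses)
    edges = set()

    # Rule 1: Q -> Q' when inputs(Q) is a proper superset of inputs(Q').
    for a in names:
        for b in names:
            if a != b and uses[a] > uses[b]:
                edges.add((a, b))

    # Rule 2: among queries with identical input sets, one representative
    # (here: the alphabetically first, an assumption) points to the others.
    groups = {}
    for q in names:
        groups.setdefault(uses[q], []).append(q)
    for members in groups.values():
        for other in members[1:]:
            edges.add((members[0], other))

    return edges


# Hypothetical usage data, not taken from the paper's figure.
example = {
    "Q1": {"I1", "I2"},
    "Q2": {"I1"},
    "Q3": {"I1"},
    "Q4": {"I3"},
    "Q5": set(),          # uses no user input, so it is excluded
}
print(sorted(dependencies(example)))
```
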
The concept of SQL-graph is used to reduce runtime scanning
overhead by restricting the number of queries that need
to be scanned along any execution path that is taken in the
program. SQL queries that do not use user inputs are not
included in the SQL-graph. Only the SQL queries that are
exposed to the user inputs in some form or the other (string


"

Fig. 4 shows the case where an SQLIA is
not caused and the query is passed through. Also, it shows
the second example where an SQLIA has been caused and
hence gets rejected as a potentially malicious query. The
literals along both the static SQL-FSM and the runtime
SQL-FSM, as one traverses from the Start node to the End
node, should be identical. The other check that can be enforced
is that the length of the SQL-FSM chain for a particular
instance is exactly the same for the static and runtime
SQL-FSMs. Thus SQLIAs employing tautologies and injecting
additional statements can be captured by this technique.
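
The two checks described here (identical literals along the chain, identical chain length) can be sketched as a token-by-token comparison. This is a minimal illustration under my own assumptions: chains are plain lists of token strings, VAR in the static chain matches exactly one runtime literal, and the function names and sample queries are hypothetical.

```python
VAR = "VAR"


def is_literal(tok):
    """True for an integer or a single-quoted string token."""
    quoted = len(tok) >= 2 and tok[0] == "'" and tok[-1] == "'"
    return tok.isdigit() or quoted


def matches(static_chain, runtime_chain):
    """Accept the runtime query only if it has the static query's structure."""
    # Check 1: same chain length from Start to End.
    if len(static_chain) != len(runtime_chain):
        return False
    # Check 2: every fixed token is identical; VAR must be filled by a literal.
    for expected, actual in zip(static_chain, runtime_chain):
        if expected == VAR:
            if not is_literal(actual):
                return False
        elif expected != actual:
            return False
    return True


static = ["SELECT", "*", "FROM", "users", "WHERE",
          "login", "=", VAR, "AND", "pass", "=", VAR]

benign = ["SELECT", "*", "FROM", "users", "WHERE",
          "login", "=", "'alice'", "AND", "pass", "=", "'s3cret'"]

# Tautology: the login input was  ' OR '1'='1
tautology = ["SELECT", "*", "FROM", "users", "WHERE",
             "login", "=", "''", "OR", "'1'", "=", "'1'",
             "AND", "pass", "=", "''"]

# Additional statement: the pass input was  '; DROP TABLE users --
piggyback = ["SELECT", "*", "FROM", "users", "WHERE",
             "login", "=", "'alice'", "AND", "pass", "=", "''",
             ";", "DROP", "TABLE", "users", "--"]

for name, chain in [("benign", benign), ("tautology", tautology),
                    ("piggyback", piggyback)]:
    print(name, "accepted" if matches(static, chain) else "rejected")
```

Both attack chains are longer than the static chain, so the length check alone rejects them; the per-token check additionally catches same-length changes, such as an identifier appearing where a literal was expected.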