7July_09_2009
Using parse tree Validation to prevent SQL injection attacks
http://portal.acm.org/citation.cfm?doid=1108473.1108496
Technique is based on run-time comparison of parse-tree of the SQL statement
before user input added and after input
adds 3 ms overhead (like SQLCHECK) to db query costs
implemented in simple static java class
1. web apps all over J2EE, Microsoft's ASP and ASP.NET, PHP, cgi.
TIERS
Presentation tier (web browser for i/o)
middle tier (business tier w/ business logic, requests
to db layer)
data tier (typically rdb, storage services)
user logins in webbrowswer, middle tier creates sql, db retrieves info,
middle tier creates session to send back to webbrowser (presentation tier)
SQL injection teching to break in. user input is parsed as SQL tokens, changing
the semantics of underlying query
this paper: novel runtime technique to eliminate SQL injection
all SQL injections alter the structure of the query intended by the programmer:
compare pre- and post- user input SQL query structure
it is more effective to measure result of input, not the simple input itself
simple SQL parser to eval all user input w/o call to db to lower overhead cost
aim to satisfy:
-eliminate possibility of attack
-min effort required by programmer
-min runtime overhead
description + J2EE implementation
2. Background
1.web server technology: accept user input on forms via
web pages
passed to server as name-value pairs
(strings)
passed as query string
passed as appended to
URL after '?' name-val&name-val& --user can free edit the
url
web forms have hidden
fields (can be easily detected by attacker)
2.SQL injection
malicious user input parsed as part
of SQL query
3. SQL injection techniques
1. Tuatologies
in the WHERE of SELECT or UPDATE statements ' OR 1=1; --
difficult to
detect, so many possibilities
n = n
'greg' LIKE
'%gr%'
'greg'=N'greg'
(based on LIKE and =N operators)
2. UNION queries
allows
2 queries to be joined: SELECT col1,col2,col3, from tab1 UNION SELECT col4,col5
from tab2
if
attacker knows the number adn types of cols in the first query, additional
query (SELECT body, results FROM reprots) can be appended.
sometimes
easy to guess
sometimes programmer does not consume all error msg, giving info to attacker
match # of
cols and get types
SELECT
* FROM users WHERE userid=22 UNION SELECT body,results,1 FROM reports
Syntax
error converting the varchar value 'txfrs' to a column of
data type int occurs. This
signals that
Then continue to
discover more
Additional STATEMENTS
when using
UNION often only first row of result is used by web page, so getting more rows
isn't useful for UNION attack
UNION 1st
query w/ second one returning no result (can provide disastrous consequences)
lead to xp_cmdshell(string) command which
can be unioned with any other query, so the cmd passed happens on the server
sp_execwebtask(sql,location)
exes query and saves result as webpage given the location location:
SELECT * FROM
users WHERE userid=22 UNION sp_makewebtask '\\IP
address\share\test.html','SELECT * FROM users'
will
create an HTML page of the entire users table at the server IP address.
3. Comments
to
eliminated password= and other ends of the where clause
SELECT * FROM
users WHERE username='greg' AND password='secret'
changed to
SELECT * FROM users WHERE username='admin' -- AND password=".
similarly,
SELECT * FROM users WHERE username='anything' OR 1=1 --AND
password='irrelevant'
gives the
attack access as the first record (often admin acct)
4. Mass SQL injection
can use a
web crawler instead of a webpage (a program to insert illegal chars into the
query string of URL or HTML form)
check for
error msgs. (for example Microsoft Internet Information server will display
ODBC error on unescaped single quote)
3. SQL Parse Tree Validation
-need a grammar of statement's
language
-parse two statements to see if
queries are equivalent
if SQL injection attack =>
parse tree of intended query != parse tree of user input added tree
leaf nodes
has empty leaves intended for user input and can represent only one node in the
resulting query. must be the value of a literal
1. Dynamic Queries
advantage over static methods:
evaluate exact structure of intended SQL query.
ex: search page: static method would
not work, structure of query relies on user input
email
search: WHERE subject LIKE '%input%'
ex: set sorting (tabs can set order
and change it)
easily
w/ ORDER BY col1, col2
can be
present on any # of cols, static analysis can't see query at compile time
Method of this paper establishes query at
runtime, verification is straightforward.
ex: online lib app
programmer :
front end app select pubs based on criteria
3
user input fields
list: author title, or publisher for search
list of ops: LIKE, >, =, etc...
free form input: text input
2 buttons:
append the clause as conjunction or disjunction
LHS col name, RHS user-supplied value using " "
could program in middleware ' AND accesslevel< 20 ' to limit selection
the user
impacts structure of SQL query, but not unknown nodes from user
if
programmer allowed free text from user to be parsed as SQL, then security is
compromised
so here
no 'injection' by def since SQL not parsed as user input text
does not
eliminate tautologies but eliminates SQL injection (common is the tautology 'OR
1=1; --' parsed as SQL
2. Comment Token inclusion
subtle case: user knows tables names
and structure of SQL query, could mimic the query:
greg' AND
password='secret' -- for username and tricky for password field => fig2
vulnerability is progreammer may assume the query parsed properly, then it's
not an injection
by including the comment as a token the
trees are different since first parse tree had no comment subtree
3. Implementation
Java: static class SQLGuard:
provides parsing and string building capabilities.
dynamically generate, using
concatenation, a string representing an SQL statement and incorporating user
input
SQLGuard.init()
give fresh key to avoid discovery of key
user string
s is pre- and post-pended w/ key using SQLGuard.wrap(s)
key 8
randomly generated ascii char seeded by clock val, id thread and app guid)
SQLGuard
private method verify()
removes
the key from beg of query,
uses it to id wrapped user input,
builds
2 parse trees.
unpopulated user tokens tree
structure vs. parsing the string w/nodes filled in w/ user input tree structure
ZQL (publicly available parser in
Java) used
4. correctness
4. case study
...
5. RELATED WORK
SQL injection 2003-2005 (this paper)
increased invention by the research community [1, 9, 8, 19].
industrial community programmers aware to minimize the problem [16, 15, 3, 22,
17].
Offutt and Xu [20] web testing and verification
BUT 75% of web attacks are at the application level
a test of 300 web sites showed 97% were vulnerable to web application
attacks [21, 10].
SQL injection is readily solvable.
CHALLENGE: verifying that the user has not altered the syntax of the query.
MUCH work casts the problem as one of user input validation, and focuses on
analyzing string inputs [2].
Several technologies exist to aid programmers with validating input.
Check for single quotes and dashes, and escape them manually-> easily beaten
[4], Moeller and Schwartzbach use a two-step process where strings = possible
ops in Java
1. employ flow graphs,
2. generate finite automata
representing the set of strings possible.
Several static analysis approaches create models for the potential SQL queries
[9, 8, 23].
Wasserman and Su [23] use the static analysis technique
from [4] to generate finite state automata for modeling the
set of valid SQL commands for each data access
drawbacks: can't handle many queries: w/ LIKE
problem with any static design, can't model dynamically generated queries well
(can't get final structure of the query)
"Finally, the technique to prevent tautology attacks is explicit, and thus
un-
necessarily complicated. It attempts to ascertain (via SQL
tokens) whether a boolean FSA is true. The number of cases
for this analysis is large; it appears they miss simple boolean
cases which do not use mathematical operators."
Halfond and Orso static analysis techniques with dynamic verification [10, 11].
similar to this paper
Huang et al. [12] also both static and dynamic techniques,
but all sensitive functions preconditions precisely defined in
annotations (trust policies).
Boyd and Keremytis [1, 13] developed SQLRand modifies tokens of the SQL
language: a prepended integer.
so SQL from user, such as OR 1=1, do not match augmented SQL tokens
=> error useful and works.
problem: database tier ---INTERFACE---middle tier (not trivial)
problem: new tokens are static. exported SQL errors expose new
tokens.
PHP5: Magic Quotes: escapes input from the Request Object automatically
a server setting, applications written assuming on/off => invalidated input
or double escaped input
removing programmer frustration( by providing an API to check at runtime) not
the best
Strong typing (J2EE has prepared statements, .NET has commands)
but prepared statements effectiveness depends on implementation (db drivers,
3rd party vendors)
also, each parameter cast a priori, set by individual lines of code
also, two db trips required to create model
also, many platforms no support for strong-typed technologies.
"In addition to eliminating SQL injection, we believe that
a viable solution must not inconvenience the programmer.
The main reason for writing queries as dynamically gener-
ated strings (as opposed to prepared statements) appears to
be ease-of-use. Our approach blends the strengths of both:
it has the convenience of dynamically generated strings and
the security of prepared statements."