7July_09_2009 Using parse tree Validation to prevent SQL injection attacks
http://portal.acm.org/citation.cfm?doid=1108473.1108496

Technique is based on run-time comparison of parse-tree of the SQL statement before user input added and after input
adds 3 ms overhead (like SQLCHECK) to db query costs
implemented in simple static java class

1. web apps all over J2EE, Microsoft's ASP and ASP.NET, PHP, cgi.
   TIERS
     Presentation tier (web browser for i/o)
     middle tier (business tier w/ business logic, requests to db layer)
     data tier (typically rdb, storage services)

  user logins in webbrowswer, middle tier creates sql, db retrieves info, middle tier creates session to send back to webbrowser (presentation tier)
SQL injection teching to break in. user input is parsed as SQL tokens, changing the semantics of underlying query

this paper: novel runtime technique to eliminate SQL injection

all SQL injections alter the structure of the query intended by the programmer:
compare pre- and post- user input SQL query structure
it is more effective to measure result of input, not the simple input itself
simple SQL parser to eval all user input w/o call to db to lower overhead cost
aim to satisfy:
  -eliminate possibility of attack
  -min effort required by programmer
  -min runtime overhead
description + J2EE implementation

2. Background
    1.web server technology: accept user input on forms via web pages
       passed to server as name-value pairs (strings)
       passed as query string
          passed as appended to URL after '?'  name-val&name-val&  --user can free edit the url
          web forms have hidden fields (can be easily detected by attacker)
    2.SQL injection
         malicious user input parsed as part of SQL query
    3. SQL injection techniques
        1. Tuatologies
              in the WHERE of SELECT or UPDATE statements  ' OR 1=1; --
            difficult to detect, so many possibilities
            n = n 
            'greg' LIKE '%gr%'
            'greg'=N'greg'             (based on LIKE and =N operators)
        2. UNION queries
             allows 2 queries to be joined: SELECT col1,col2,col3, from tab1 UNION SELECT col4,col5 from tab2
             if attacker knows the number adn types of cols in the first query, additional query (SELECT body, results FROM reprots) can be appended.
             sometimes easy to guess
              sometimes programmer does not consume all error msg, giving info to attacker
            match # of cols and get types
            SELECT * FROM users WHERE userid=22 UNION SELECT body,results,1 FROM reports
             
Syntax error converting the varchar value 'txfrs' to a column of data type int occurs. This signals that
           Then continue to discover more
        Additional STATEMENTS
            when using UNION often only first row of result is used by web page, so getting more rows isn't useful for UNION attack
            UNION 1st query w/ second one returning no result (can provide disastrous consequences)
            lead to xp_cmdshell(string) command which can be unioned with any other query, so the cmd passed happens on the server
            sp_execwebtask(sql,location)  exes query and saves result as webpage given the location location:
          
SELECT * FROM users WHERE userid=22 UNION sp_makewebtask '\\IP address\share\test.html','SELECT * FROM users'
            will create an HTML page of the entire users table at the server IP address.
        3. Comments
            to eliminated password= and other ends of the where clause
        
SELECT * FROM users WHERE username='greg' AND password='secret'
            changed to
             
SELECT * FROM users WHERE username='admin' -- AND password=".
            similarly,
            
SELECT * FROM users WHERE username='anything' OR 1=1 --AND password='irrelevant'
            gives the attack access as the first record (often admin acct)
        4. Mass SQL injection
            can use a web crawler instead of a webpage (a program to insert illegal chars into the query string of URL or HTML form)
            check for error msgs. (for example Microsoft Internet Information server will display ODBC error on unescaped single quote)
   3. SQL Parse Tree Validation
        -need a grammar of statement's language
        -parse two statements to see if queries are equivalent
        if SQL injection attack  => parse tree of intended query != parse tree of user input added tree
            leaf nodes has empty leaves intended for user input and can represent only one node in the resulting query. must be the value of a literal
            



 


 


 


        


    1. Dynamic Queries
        advantage over static  methods: evaluate exact structure of intended SQL query.   
        ex: search page: static method would not work, structure of query relies on user input
            email search: WHERE subject LIKE '%input%'
        ex: set sorting (tabs can set order and change it)
             easily w/ ORDER BY col1, col2
            can be present on any # of cols, static analysis can't see query at compile time
       Method of this paper establishes query at runtime, verification is straightforward.
       ex: online lib app
            programmer : front end app select pubs based on criteria
                    3 user input fields
                        list: author title, or publisher for search
                        list of ops: LIKE, >, =, etc...
                        free form input: text input
                    2 buttons:
                        append the clause as conjunction or disjunction
                        LHS col name, RHS user-supplied value using " "
                    could program in middleware ' AND accesslevel< 20 ' to limit selection
            the user impacts structure of SQL query, but not unknown nodes from user
            if programmer allowed free text from user to be parsed as SQL, then security is compromised
            so here no 'injection' by def since SQL not parsed as user input text
            does not eliminate tautologies but eliminates SQL injection (common is the tautology 'OR 1=1; --' parsed as SQL
                   
    2. Comment Token inclusion
        subtle case: user knows tables names and structure of SQL query, could mimic the query:
            greg' AND password='secret' -- for username and tricky for password field => fig2
            vulnerability is progreammer may assume the query parsed properly, then it's not an injection
       by including the comment as a token the trees are different since first parse tree had no comment subtree

    3. Implementation
        Java: static class SQLGuard: provides parsing and string building capabilities.
        dynamically generate, using concatenation, a string representing an SQL statement and incorporating user input
           SQLGuard.init() give fresh key to avoid discovery of key
            user string s is pre- and post-pended w/ key using SQLGuard.wrap(s) 
            key 8 randomly generated ascii char seeded by clock val, id thread and app guid)
        SQLGuard  private method verify()
                removes the key from beg  of query,
                uses it to id wrapped user input,
                builds 2 parse trees.
        unpopulated user tokens tree structure vs.  parsing the string w/nodes filled in w/ user input tree structure
        ZQL (publicly available parser in Java) used
    4. correctness

4. case study

...

5. RELATED WORK
SQL injection 2003-2005 (this paper)
increased invention by the research community [1, 9, 8, 19].
industrial community programmers aware to minimize the problem [16, 15, 3, 22, 17].
 Offutt and Xu [20] web testing and verification
 BUT 75% of web attacks are at the application level
 a test of 300 web sites showed 97% were vulnerable to web application attacks [21, 10].
 SQL injection is readily solvable.

CHALLENGE: verifying that the user has not altered the syntax of the query.
MUCH work casts the problem as one of user input validation, and focuses on analyzing string inputs [2].
Several technologies exist to aid programmers with validating input.
Check for single quotes and dashes, and escape them manually-> easily beaten
[4], Moeller and Schwartzbach use a two-step process where strings = possible ops in Java
        1. employ flow graphs,
        2. generate finite automata representing the set of strings possible.
Several static analysis approaches create models for the potential SQL queries [9, 8, 23].

Wasserman and Su [23] use the static analysis technique
from [4] to generate finite state automata for modeling the
set of valid SQL commands for each data access
drawbacks: can't handle many queries: w/ LIKE
problem with any static design, can't model dynamically generated queries well (can't get final structure of the query)
"Finally, the technique to prevent tautology attacks is explicit, and thus un-
necessarily complicated. It attempts to ascertain (via SQL
tokens) whether a boolean FSA is true. The number of cases
for this analysis is large; it appears they miss simple boolean
cases which do not use mathematical operators."

Halfond and Orso static analysis techniques with dynamic verification [10, 11].
  similar to this paper
Huang et al. [12] also both static and dynamic techniques,
  but all sensitive functions preconditions precisely defined in annotations (trust policies).
Boyd and Keremytis [1, 13] developed SQLRand modifies tokens of the SQL language: a prepended integer.
  so SQL from user, such as OR 1=1, do not match augmented SQL tokens => error useful and works.
  problem:   database tier ---INTERFACE---middle tier (not trivial)
  problem:  new tokens are static. exported SQL errors expose new tokens.
PHP5: Magic Quotes: escapes input from the Request Object automatically
a server setting, applications written assuming on/off => invalidated input or double escaped input
removing programmer frustration( by providing an API to check at runtime) not the best
Strong typing (J2EE has prepared statements, .NET has commands)
but prepared statements effectiveness depends on implementation (db drivers, 3rd party vendors)
also,  each parameter cast a priori, set by individual lines of code
also, two db trips required to create model
also, many platforms no support for strong-typed technologies.

"In addition to eliminating SQL injection, we believe that
a viable solution must not inconvenience the programmer.
The main reason for writing queries as dynamically gener-
ated strings (as opposed to prepared statements) appears to
be ease-of-use. Our approach blends the strengths of both:
it has the convenience of dynamically generated strings and
the security of prepared statements."