7July_13_2008
Preventing Injection Attacks with Syntax Embeddings
http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2007-003.pdf
use of syntax embedding to prevent injection vulnerabilities in
lang-independent ways
One lang that constructs sentences in another lang: SQL, XQuery, XPath, XML or
shell cmds
usually done using unhygienic string manipulation
injection attacks (one of the largest classes of security problems)
SQL-construction is more likely vulnerable than not
often in a host language that dynamically computes SQL queries (like PHP)
CGI scripts call unix shell cmds
unhygienically constructed HTML
cross-site scripting (XSS)
malicious javascript code.
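A minimal sketch of the unhygienic string construction described above, using Python's sqlite3 module in place of PHP. The table, the data and the function name are made up for illustration; they are not from the paper.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

def find_user_unhygienic(name: str) -> list:
    # The guest sentence (SQL) is built by plain string concatenation,
    # so the input can change the structure of the query.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

# Benign input: the WHERE clause compares against one string literal.
benign = find_user_unhygienic("alice")

# Crafted input: the quote closes the literal and adds an OR condition
# that is always true, so every row is returned.
injected = find_user_unhygienic("nobody' OR '1'='1")
```

With the crafted input the query text becomes `... WHERE name = 'nobody' OR '1'='1'`, so the input is no longer treated as data only.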
-Injections prevented by escaping external input.... but escaped input can still
lead to injection
(an attacker can escape the escape; many attacks play w/ escape chars)
-better sol'n (in this paper) is to use an API to build the sentence.
can ensure that injections are impossible by construction
string literals: the API takes care of escaping in code.
type system ensures well-formedness of the sentence
(this is unattractive due to the gap bet. what the programmer writes and the
syntax of the guest language, which is a
domain-specific language (DSL) )
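A rough sketch of what "use an API to build the sentence" can look like. The classes `Str`, `Col`, `Eq` and `Select` are hypothetical, not the API from the paper, and the escaping assumes standard SQL quoting where a single quote inside a literal is written as two single quotes. Python does not enforce the type annotations at run time; they only hint at what a statically typed host lang would check.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Str:
    """A SQL string literal; escaping happens when it is rendered."""
    value: str

    def render(self) -> str:
        # Assumption: standard SQL quoting, ' is escaped as ''.
        return "'" + self.value.replace("'", "''") + "'"

@dataclass(frozen=True)
class Col:
    """A column name fixed by the programmer, never taken from input."""
    name: str

    def render(self) -> str:
        return self.name

@dataclass(frozen=True)
class Eq:
    """An equality test between a column and a string literal."""
    left: Col
    right: Str

    def render(self) -> str:
        return f"{self.left.render()} = {self.right.render()}"

@dataclass(frozen=True)
class Select:
    """SELECT * FROM <table> WHERE <condition>."""
    table: str
    where: Eq

    def render(self) -> str:
        return f"SELECT * FROM {self.table} WHERE {self.where.render()}"

# External input can only ever end up inside a Str node.
query = Select("users", Eq(Col("name"), Str("nobody' OR '1'='1")))
sql = query.render()
```

The drawback noted above is visible here: the nested constructor calls look nothing like the SQL the programmer has in mind.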
this paper: combining security of using an API w/ the convenience of string
manipulation via embedding the syntax of the guest lang into syntax of host
lang. (pioneered by meta-programming)
ex: SQL-in-Java
preprocessor (assimilator) translates code in combined lang into Java code that
calls API generated from guest lang grammar.
embedding is not new idea (SQL-92 standard into C programming)
contributions:
-comprehensive sol'n to injection attacks via construction
-generic enough to be easily adapted to new host and guest langs
-generic in two ways: one via language embedding (modular, scannerless parsing),
the other through generating the underlying APIs from the context-free grammar
of the guest lang. the assimilator that translates guest lang to API calls can
be applied to any host lang and combos of guest langs (NO meta-programming
required)
-well-formedness of guest lang. sentences that are constructed can be ensured
at run-time (as well as statically)
-ambiguities are dealt with automatically instead of having the programmer
disambiguate such things as antiquotations
prototype: StringBorg (after MetaBorg) http://www.stringborg.org/
core prob underlying injection attacks -- the query that is parsed after
construction does not correspond to the intended grammatical structure.
structures not easily compared (accd to this author--but look at parse trees)
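To make the "structure differs from what was intended" point concrete, here is a crude sketch that compares token shapes rather than real parse trees. This is my own stand-in, not the paper's technique: the tokenizer is a made-up regex that only knows string literals, words and single symbols.

```python
import re

# String literal (with '' as an escaped quote), word, or single symbol.
TOKEN = re.compile(r"'(?:[^']|'')*'|\w+|[^\s\w]")

def shape(sql: str) -> list:
    """Return the token sequence with every string literal collapsed to STR."""
    tokens = []
    for tok in TOKEN.findall(sql):
        if len(tok) >= 2 and tok.startswith("'") and tok.endswith("'"):
            tokens.append("STR")
        else:
            tokens.append(tok.upper())
    return tokens

# What the programmer had in mind: one comparison against one literal.
intended = shape("SELECT * FROM users WHERE name = 'alice'")

# Unescaped input adds an OR clause, so the shape changes.
attacked = shape("SELECT * FROM users WHERE name = 'nobody' OR '1'='1'")

# Escaped input stays inside a single literal, so the shape is unchanged.
escaped = shape("SELECT * FROM users WHERE name = 'nobody'' OR ''1''=''1'")
```

Here `intended` and `escaped` have the same shape while `attacked` gains extra `OR`, `STR`, `=` and `STR` tokens. A real comparison would need the guest lang grammar.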
StringBorg handles this as preprocessing step, then constructs code
overview: syntax of guest lang is embedded in host lang, combined syntax is used
to write programs, assimilator parses source file and transforms embedded code
to invocations of the API (produced by the API generator)
thus preventing SQL injection attacks by ALWAYS checking lexical values
-this method is language independent
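A hedged sketch of what the preprocessing step could boil down to for one embedded query. The quotation notation in the comment and the helpers `str_lit` and `int_lit` are illustrative assumptions, not the generated StringBorg API. The point is only that every spliced-in value passes through a checked constructor for its lexical category.

```python
import re

class IllegalLexicalValue(ValueError):
    """Raised when a value cannot be represented as the requested token."""

def str_lit(value: str) -> str:
    # Assumption for this sketch: newlines are rejected rather than escaped.
    if "\n" in value or "\r" in value:
        raise IllegalLexicalValue("newline in string literal")
    return "'" + value.replace("'", "''") + "'"

def int_lit(value) -> str:
    text = str(value)
    # The value must match the lexical syntax of an integer literal.
    if not re.fullmatch(r"-?[0-9]+", text):
        raise IllegalLexicalValue(f"not an integer literal: {text!r}")
    return text

def users_older_than(name: str, age) -> str:
    # Hypothetical embedded form in the combined language:
    #   <| SELECT * FROM users WHERE name = ${name} AND age > ${age} |>
    # Roughly what a preprocessor could emit for it: fixed fragments come
    # from the source file, values go through checked constructors.
    return (
        "SELECT * FROM users WHERE name = " + str_lit(name)
        + " AND age > " + int_lit(age)
    )

ok = users_older_than("o'brien", 30)
```

Passing `"1 OR 1=1"` as `age` raises `IllegalLexicalValue` instead of producing a query with a different structure.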
discussion:
Static typechecking has two major disadvantages
(1) the programmer has to know all these syntactical categories and their
mapping to types of the host language and
(2) no ambiguities are allowed, which makes the syntax embedding more
difficult to use.
advantage: static checking gives more static guarantees (programming errors
caught earlier), but this is not a security advantage.
"both the statically and dynamically typed back-ends guarantee statically
that an injection attack cannot occur.
The dynamic or static typechecking only checks for programming errors, not for
problems with input provided by the user.
The generated APIs will never throw an ‘injection attack exception’; the
exceptions that can occur are either related to illegal characters in the input
(e.g. the newline in SQL) or are programming errors. The last category of
exceptions does not depend on particular inputs, but only on execution paths,
which are easier to detect using testing."
Prevented classes of injection attacks
attacks classified by injection mechanism or intent of the attack.
FROM paper directly----
• Injection through user input is the mechanism of using specially crafted user
input to construct a query that has a different parse tree then originally
intended. StringBorg prevents these attacks by checking the syntax of lexical
values and automatic escaping of all strings.
• Injection through cookies differs from injection through user input by
exploiting input from cookies, which are sometimes naively assumed to be
controlled by a web application. StringBorg checks and escapes all strings,
irrespective of their origin, thus disabling this injection mechanism.
• Injection through server variables, such as HTTP headers, employs yet another
origin of strings to perform an attack. Again, these attacks are prevented
since StringBorg escapes all strings.
• Second-order injection attacks indirectly perform the attack by first
introducing a malicious input in the system (e.g. database), which is used
later as the input of an affected query. Again, these attacks are prevented
since StringBorg checks and escapes all strings, whether they originate
directly from the user or not.
• Tautology-based attacks use an injection mechanism to craft a query where the
condition always evaluates to true. StringBorg prevents the mechanisms of
injection attacks from being applied, which implies that crafting tautologies
is impossible.
• Union query attacks are related to tautologies, but allow access to different
tables than the ones originally involved in the query. Similar to tautology
attacks, StringBorg prevents the mechanisms that are used.
• Piggy-backed queries are malicious queries added to be executed in addition to
the original query. Again, StringBorg prevents the mechanisms that are used.
• Illegal query attacks are used to trigger syntax, type or logical errors. This
often results in an error report that reveals information about possible
exploits. StringBorg only throws an exception if an input string contains
invalid characters that could not be escaped. StringBorg disables the
construction of syntactically invalid queries.
NOTE:"An embedding that allows conversion of input
strings to table and column names
(which is not the case in our embeddings). It is
advisable to disallow this conversion and only allow
literal table and column names. In general, allowing
users to input identifiers can introduce a plenitude of
options for manipulating the intended semantics of the
constructed guest sentence."
• Inference attacks are related to illegal query attacks. They can be applied if
a site is protected not to show error messages. By observing the success or
failure of queries, the setup of the database can indirectly still be examined.
The prevention of inference attacks does not differ from illegal query attacks.
• Stored procedure attacks are a class of all known attacks applied to stored
procedures. If stored procedures compose queries based on user input, then the
same method for structured construction should be applied.
• Alternate encoding attacks avoid detection and prevention of an attack by
concealing the actual query in a different syntax or character encoding, which
tricks the detection and prevention techniques into interpreting the query in a
different way then the actual processor of the guest language does. In all
known embeddings, StringBorg prevents encoding attacks since the encoding
itself is escaped and lexical strings are checked syntactically.
---END FROM paper directly
NOT GUARANTEED to prevent injection for all guest languages (e.g. Unicode
escaping in Java applies to any input char, not just string literals)
So if Java is the guest lang, then Java's Unicode escapes can be used to
terminate a string literal and inject. (not caught by lexical checking)
DFA does not unescape Unicode escapes
possible sol'ns:
1. escape sequence can be escaped
2. unescaping rules defined next to escape rules
3. syntax definition of guest lang restricted not to support unicode escape
sequences
4. syntax def formalism extended to lexical escape
use of unexpected char encodings can also hide an attack.
StringBorg
--relies heavily on modular syntax def and parser generation, by SDF and
scannerless Generalized-LR parsing
need syntax of host and guest expressible in a context-free grammar (not all
langs are!)
--error reporting: quality of error msgs is important...
--efficient parser composition: need a parser for every combo.... parser
generation too expensive as part of compilation... so it is done separately,
which lacks a "plug-in" feel. future: parse table plug-ins
RELATED WORK
this work does not alleviate the need for static or dynamic analysis techniques
as they apply to existing code that is more traditional
Explicit escaping and filtering (escape input and filter malicious inputs) ...
requires programmers to get it right each time
APIs SQL DOM: safe
SQL w/ query construction behind an API that ensures string literals are
escaped via construction: SQL abstract syntax trees generated from a specific db
schema. ensures typing of queries wrt db.
Safe Query Objects: queries defined in plain Java, compiled using OpenJava into
JDO calls (can be seen as embedding convenient syntax for queries into a host
lang; assimilation is translation into JDO.)
LINQ syntactic
hygiene (Haskell Server Pages) Cw and LINQ, provide XML literals, enable XML
output host expression converted to an expression tree that can be processed in
arbitrary ways. (not extensible to other host langs)
Static analysis techniques: JDBC Checker, 'tainted data', etc....
Run-time detection techniques: AMNESIA automaton for query strings, WASP
'trusted' string, SQLCHECK wrappers with markers
they detect but don't recover... so DoS attack still possible (they just shut
down the server or something? )
SQL-specific techniques: SQL-92 has embedding for specific langs,
prepared statements allow safe construction
stored procedures (as long as they are called in a safe way)
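For comparison, a minimal prepared-statement style example using Python's sqlite3 parameter binding. The table, the data and the function name are made up for illustration. The query text is fixed and the input is passed separately as a value.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

def find_user(name: str) -> list:
    # The ? placeholder is bound to the value; the input never becomes
    # part of the SQL text, so it cannot change the query structure.
    cursor = conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    )
    return cursor.fetchall()

benign = find_user("alice")

# The crafted input is compared as an ordinary string and matches nothing.
attempt = find_user("nobody' OR '1'='1")
```

This is specific to SQL and to what the database driver offers, which is the limitation the notes point at: it does not carry over to other guest langs such as shell cmds or XPath.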
MetaBorg provides an embedded domain-specific syntax for using libraries.
uses scannerless Generalized-LR alg to parse embedded domain-specific lang and
Stratego prog transformation lang for assimilation of embedded code to host
lang. http://www.program-transformation.org/Stratego/MetaBorg
CONCLUSION
"The main advantage over previous approaches is that it
makes injections impossible by construction, and that it is generic—it is not
necessary to produce APIs and
assimilators for each element of the cross-product of host and guest languages
{Java, C#, PHP, Perl, . . .}×
{SQL, JDOQL, HQL, EJBQL, OQL, XML, HTML, XPath, XQuery, Shell, . . .}, but only
to perform a
relatively small amount of work for each host and guest language."