EXE: Automatically generating inputs of death
From Practical Software Verification
Methods used: Symbolic analysis, constraint gathering/solving, constraint formula optimisation
Other tools used: STP for constraint solving
Working implementation: Yes
Open source/public demo: Neither
Abstract
This paper presents EXE, an effective bug-finding tool that automatically generates inputs that crash real code. Instead of running code on manually or randomly constructed input, EXE runs it on symbolic input initially allowed to be "anything." As checked code runs, EXE tracks the constraints on each symbolic (i.e., input-derived) memory location. If a statement uses a symbolic value, EXE does not run it, but instead adds it as an input-constraint; all other statements run as usual. If code conditionally checks a symbolic expression, EXE forks execution, constraining the expression to be true on the true branch and false on the other. Because EXE reasons about all possible values on a path, it has much more power than a traditional runtime tool: (1) it can force execution down any feasible program path and (2) at dangerous operations (e.g., a pointer dereference), it detects if the current path constraints allow any value that causes a bug.When a path terminates or hits a bug, EXE automatically generates a test case by solving the current path constraints to find concrete values using its own co-designed constraint solver, STP. Because EXE's constraints have no approximations, feeding this concrete input to an uninstrumented version of the checked code will cause it to follow the same path and hit the same bug (assuming deterministic code).EXE works well on real code, finding bugs along with inputs that trigger them in: the BSD and Linux packet filter implementations, the udhcpd DHCP server, the pcre regular expression library, and three Linux file systems.
Main points
- Uses STP as its decision procedure
- Requires the program to be recompiled with exe-cc and symbolic data marked using make_symbolic()
- Uses a program to generate its own test cases by running it on symbolic input that is initially unconstrained. If an instruction tries to operate on symbolic data then that operation is replaced with an input constraint.
- At each conditional a new process is forked and a different constraint created for each branch. If the addition of this branch constrain makes the path impossible it is dropped
- Optimisations include constraint caching and constraint independence determination
- Best first search based on least executed code used to determine paths to examine
Bibtex
@inproceedings{cadar06exe,
address = {New York, NY, USA},
author = {Cadar, Cristian and Ganesh, Vijay and Pawlowski, Peter M. and Dill, David L. and Engler, Dawson R. },
booktitle = {CCS '06: Proceedings of the 13th ACM conference on Computer and communications security},
doi = {http://dx.doi.org/10.1145/1180405.1180445},
isbn = {1595935185},
keywords = {ai, da, project-planning, test-data-generation},
pages = {322--335},
posted-at = {2008-08-06 17:46:16},
priority = {2},
publisher = {ACM Press},
title = {EXE: automatically generating inputs of death},
url = {http://dx.doi.org/10.1145/1180405.1180445},
year = {2006}
}

