|M.Sc Student||Trostanetski Anna|
|Subject||Modular Demand-Driven Analysis of Semantic Difference|
for Program Versions
|Department||Department of Computer Science||Supervisor||Professor Emeritus Orna Grumberg|
|Full Thesis text|
Programs are often built in stages, a new (patched) program version is built on top of
an old one. If we could understand the semantic difference between two consecutive
program versions, it would be very beneficial for the fast development process of
correct programs. We can use the correctness of the old (and checked) version to infer
the correctness of the new version. Code reviews, security vulnerability checks, and
new feature verification would become easier if the reviewer were to understand the
semantic differences between both versions. In general this problem is undecidable, yet we devise an algorithm for computing over-- and under--approximations of the
semantic (input--output) differences between program versions. We aim at providing
precise enough abstractions for real code, and allowing guidance by the user to reach
good results that match their needs. Since this information is used during the
development process, it may be sufficient (and possibly preferable) to give results for
intermediate procedures, instead of the entire program. We provide a mechanism for
guiding the analysis towards interesting procedures, and the precision of the
approximation is constantly improved by our anytime algorithm.
While our algorithm can work for very different versions of code, it will work better
on syntactically similar versions. Syntactic changes in program versions are often
small and local, and may apply to procedures that are deep in the call graph. Our
approach analyses only those parts of the programs that are affected by the changes.
Moreover, the analysis is modular, processing a single pair of procedures at a time.
Called procedures are not inlined. Rather, their previously computed summaries and
difference summaries are used.
For efficiency, procedure summaries and difference summaries are abstracted using
uninterpreted functions, and may be refined on demand. We show how we can use
common uninterpreted functions to use our knowledge of equivalence when no
precise summery is available. Our algorithm works bottom-up from the locations of
the syntactic changes, towards the main procedure. When the precision of the
abstractions used is not sufficient, we run (top--down) refinement to create new
summaries that are sufficiently precise. The refinement is guided by the context of the
call we analyse.
We define modular symbolic execution and prove its connection to standard
symbolic execution. We use modular symbolic execution to analyse each path in each
procedure at most once, without re-analysing paths in called procedures.
We have compared our method to well established tools and observed speedups of at
least one order of magnitude. Furthermore, in many cases our tool proves
equivalence or finds differences while others fail to do so.