M.Sc Thesis

M.Sc StudentTrostanetski Anna
SubjectModular Demand-Driven Analysis of Semantic Difference
for Program Versions
DepartmentDepartment of Computer Science
Supervisor PROFESSOR EMERITUS Orna Grumberg
Full Thesis textFull thesis text - English Version


Programs are often built in stages, a new (patched) program version is built on top of

an old one. If we could understand the semantic difference between two consecutive

program versions, it would be very beneficial for the fast development process of

correct programs. We can use the correctness of the old (and checked) version to infer

the correctness of the new version. Code reviews, security vulnerability checks, and

new feature verification would become easier if the reviewer were to understand the

semantic differences between both versions. In general this problem is undecidable, yet we devise an algorithm for computing over-- and under--approximations of the

semantic (input--output) differences between program versions. We aim at providing

precise enough abstractions for real code, and allowing guidance by the user to reach

good results that match their needs. Since this information is used during the

development process, it may be sufficient (and possibly preferable) to give results for

intermediate procedures, instead of the entire program. We provide a mechanism for

guiding the analysis towards interesting procedures, and the precision of the

approximation is constantly improved by our anytime algorithm.

While our algorithm can work for very different versions of code, it will work better

on syntactically similar versions. Syntactic changes in program versions are often

small and local, and may apply to procedures that are deep in the call graph. Our

approach analyses only those parts of the programs that are affected by the changes.

Moreover, the analysis is modular, processing a single pair of procedures at a time.

Called procedures are not inlined. Rather, their previously computed summaries and

difference summaries are used.

For efficiency, procedure summaries and difference summaries are abstracted using

uninterpreted functions, and may be refined on demand. We show how we can use

common uninterpreted functions to use our knowledge of equivalence when no

precise summery is available. Our algorithm works bottom-up from the locations of

the syntactic changes, towards the main procedure. When the precision of the

abstractions used is not sufficient, we run (top--down) refinement to create new

summaries that are sufficiently precise. The refinement is guided by the context of the

call we analyse.

We define modular symbolic execution and prove its connection to standard

symbolic execution. We use modular symbolic execution to analyse each path in each

procedure at most once, without re-analysing paths in called procedures.

We have compared our method to well established tools and observed speedups of at

least one order of magnitude. Furthermore, in many cases our tool proves

equivalence or finds differences while others fail to do so.