|Ph.D. Student||Gal Lalouche|
|Subject||The Consistency, Independence and Validity of Software Metrics|
|Department||Department of Computer Science|
|Supervisor||Professor Joseph Gil|
Static code metrics have long been researched with the hope of gaining insight into the external features of a code module, such as maintainability or error-proneness. Since metrics, unlike external features, are easily extracted from the static code, finding a link between the two would be a powerful tool for software development. To that end, many different metrics have been defined and studied, for both imperative and object-oriented languages. This PhD thesis presents three empirical results in the study of software metrics.
Firstly, I discuss a familiar phenomenon: code metrics such as Lines of Code are extremely context dependent, i.e., their distribution differs from project to project. Using visual inspection together with statistical testing, I show that metric values are so sensitive to context bias that their measurement in one corpus offers little prediction regarding their measurement in another.
On the positive side, I demonstrate how context bias can be neutralized for the majority of metrics considered.
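The abstract does not spell out the neutralization procedure, but one plausible sketch is within-project standardization: removing each project's own location and scale so that metric values become comparable across corpora. The project data below is hypothetical and serves only to illustrate the idea.

```python
from statistics import mean, stdev

def normalize_within_project(values):
    """Standardize metric values inside a single project (z-scores).

    This removes the per-project mean and spread -- one simple way to
    neutralize context bias; the thesis's actual procedure may differ.
    """
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Two hypothetical projects whose raw Lines-of-Code distributions
# differ by an order of magnitude:
project_a = [10, 20, 30, 40, 50]
project_b = [100, 200, 300, 400, 500]

# After within-project standardization the two distributions coincide,
# so a cross-project comparison is no longer dominated by context.
print(normalize_within_project(project_a))
print(normalize_within_project(project_b))
```

The point of the sketch: raw values from the two projects are incomparable, but their standardized values are identical, which is the sense in which context bias can be "neutralized".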
Secondly, I ask whether the validity of any code metric, i.e., its ability to predict external features, is the result of its own intrinsic quality, or simply a by-product of its correlation to the size of the code module. I define three external features and measure their correlation to metric values. I show that the more a metric is correlated to size, the higher its prediction ability, and vice versa. This is demonstrated for the basic counting metrics, the traditional McCabe's Cyclomatic Complexity metric, the object-oriented C&K metric suite, and even completely new metrics drawn from different theoretical foundations. Various attempts to negate the effect of size all fail; metrics are only as valid as their correlation to size.
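The claim can be illustrated with a toy correlation computation. The module data below is invented, and plain Pearson correlation stands in for the thesis's statistical machinery: a candidate metric that tracks size closely also tracks the external feature (here, bug counts) closely.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient (illustrative only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical modules: size (LOC), a candidate metric, and bug counts.
loc    = [120, 300, 80, 450, 200, 610]
metric = [14, 33, 9, 51, 22, 70]   # tracks size closely
bugs   = [2, 5, 1, 8, 3, 11]       # the external feature to predict

# The abstract's claim in miniature: the metric's strong correlation to
# the external feature mirrors its strong correlation to size.
print(pearson(metric, loc), pearson(metric, bugs))
```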
Lastly, I show that size is not merely the only valid metric, but also the only reliable one. Using Principal Component Analysis to extract the main components from my metric suite, I present evidence that, other than size, no feature is consistent, even after performing the normalization discussed above. This result holds both when consistency is measured across different projects and, somewhat surprisingly, also when different versions of the same project are compared.
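As a minimal illustration of the PCA step, the sketch below computes the leading principal component of just two size-driven metrics, using the closed-form eigenvector of their 2x2 covariance matrix. The data is hypothetical and the two-metric case is a toy stand-in for the full metric-suite analysis: when both metrics essentially measure size, the first component points along their shared "size" direction.

```python
from math import atan2, cos, sin

def first_principal_component(xs, ys):
    """Leading PCA direction for two variables.

    Uses the closed-form angle of the dominant eigenvector of the
    2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    return cos(theta), sin(theta)

# Two hypothetical, strongly size-driven metrics (already on comparable
# scales): the leading component comes out near the 45-degree diagonal,
# i.e., it is essentially "size".
loc   = [1.0, 2.0, 3.0, 4.0, 5.0]
stmts = [1.1, 2.0, 2.9, 4.2, 5.0]
vx, vy = first_principal_component(loc, stmts)
print(vx, vy)
```

In the thesis the suite has many metrics, but the intuition is the same: the dominant component is a size axis, and it is the only component that stays stable across projects and versions.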