Software Analytics for Improving Program Comprehension

Saudi Digital Library
Program comprehension is an essential part of software development and maintenance. Traditional approaches to program comprehension, such as reviewing the codebase and its documentation, make it difficult to understand a software system's overall structure and implementation. In recent years, static analysis techniques have emerged to facilitate program comprehension. One example is the call graph, which represents the system's structure and implementation as a directed graph. Other studies have focused on semantically enriching software systems using learning-based analytics, including machine learning and natural language processing (NLP). Although call graphs can enhance the program comprehension process, they face three main challenges: (1) complex call graphs are difficult for a developer to visualize and interpret, which increases the overhead of program comprehension; (2) they are often limited to a single level of granularity, such as function calls; and (3) they lack semantics that would help interpret the graphs. In this dissertation, we propose a novel framework, called CodEx, to facilitate and accelerate program comprehension. CodEx enables top-down and bottom-up analysis of a system's call graph and its execution paths for an enhanced program comprehension experience. Specifically, the proposed framework combines the following techniques: multi-level graph abstraction using a coarsening technique, hierarchical clustering to partition the call graph into subgraphs (i.e., multiple levels of granularity), and interactive visual exploration of the graphs at different levels of abstraction. Moreover, we build semantics for software systems using NLP and machine learning, including topic modeling, to interpret the meaning of the call graph's abstraction levels.
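To make the idea of multi-level abstraction by coarsening concrete, the following minimal sketch collapses a function-level call graph into a module-level one. It is an illustration only and not the CodEx implementation: the names coarsen and module_of, the sample call graph, and the choice to group functions by their dotted module prefix are all assumptions made for this example.

```python
from collections import defaultdict


def coarsen(call_graph, group_of):
    """Collapse a fine-grained call graph into a coarser one.

    call_graph: dict mapping each caller to an iterable of its callees.
    group_of:   function mapping a node name to its group label.
    Returns a dict mapping each group to {target group: collapsed call count}.
    """
    coarse = defaultdict(lambda: defaultdict(int))
    for caller, callees in call_graph.items():
        src = group_of(caller)
        coarse[src]  # touch the key so groups with no outgoing edges still appear
        for callee in callees:
            dst = group_of(callee)
            coarse[dst]
            if src != dst:  # calls inside one group vanish at the coarser level
                coarse[src][dst] += 1
    return {group: dict(targets) for group, targets in coarse.items()}


def module_of(name):
    """Hypothetical grouping rule: 'parser.parse' belongs to module 'parser'."""
    return name.rsplit(".", 1)[0]


# Hypothetical function-level call graph (the finest level of granularity).
call_graph = {
    "app.main": ["parser.parse", "parser.tokenize", "db.save"],
    "parser.parse": ["parser.tokenize"],
    "parser.tokenize": [],
    "db.save": ["db.connect"],
    "db.connect": [],
}

module_graph = coarsen(call_graph, module_of)
print(module_graph)
```

Applying coarsen again with a broader grouping function (for example, module to package) would yield the next abstraction level. A clustering step could replace the fixed module_of rule when no naming convention is available.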