Multi-Document Summarisation from Heterogeneous Software Development Artefacts

Mahfouth Ahmad Alghamdi
Saudi Digital Library
Software engineers create a vast number of artefacts during project development activities, consisting of related information exchanged between developers. Sifting through the large amount of information available within a project repository can be time-consuming. In this dissertation, we proposed a method for multi-document summarisation from heterogeneous software development artefacts that automatically generates summaries to help software developers target their information needs. To achieve this aim, we first had gold-standard summaries created; we then characterised them and used them to identify the main types of software artefacts that describe developers’ activities in GitHub project repositories. This initial step was important for the present study, as we had no prior knowledge about the types of artefacts linked to developers’ activities that could be used as input sources for our proposed multi-document summarisation techniques. We also used the gold-standard summaries later to evaluate the quality of our summarisation techniques. We then developed extractive multi-document summarisation approaches that automatically summarise software development artefacts within a given time frame by integrating techniques from natural language processing, software repository mining, and data-driven search-based software engineering. The generated summaries were then evaluated in a user study to investigate whether experts considered that they mentioned every important project activity that appeared in the gold-standard summaries. The results of the user study showed that generating summaries from different kinds of software artefacts is possible, and that the generated summaries are useful in describing a project’s development activities over a given time frame.
Finally, we investigated the potential of using source code comments for summarisation by assessing the information documented about Java primitive variables in comments against three types of knowledge. The results showed that source code comments did contain additional information and could be useful for summarising developers’ activities.