Detecting Flaky Tests Without Rerunning Tests

Alshammari, Abdulrahman Turqi

dc.contributor.advisor	Lam, Wing
dc.contributor.advisor	Ammann, Paul
dc.contributor.author	Alshammari, Abdulrahman Turqi
dc.date.accessioned	2024-08-06T09:28:06Z
dc.date.available	2024-08-06T09:28:06Z
dc.date.issued	2024-07-26
dc.description.abstract	A critical component of modern software development practices, particularly continuous integration (CI), is halting development activities in response to test failures, which require further investigation and debugging. As software changes, regression testing becomes vital to verify that new code does not break existing functionality. However, this process is often delayed by flaky tests: tests that yield inconsistent results on the same codebase, alternating between pass and fail. Test flakiness erodes trust in testing outcomes and undermines the reliability of the CI process. The typical approach to identifying flaky tests is to execute them multiple times; if a test yields both passing and failing results without any modifications to the codebase, it is flaky, as discussed by Luo et al. in their empirical study. This approach, while straightforward, can be resource-intensive and time-consuming, resulting in considerable overhead costs for development teams. Moreover, it might not consistently reveal flakiness in tests whose behavior varies across execution environments. Given these challenges, the research community has been actively seeking more efficient and reliable alternatives to repeated test execution for flakiness detection. These efforts aim to uncover methods that can accurately detect flaky tests without multiple reruns, thereby reducing the time and resources required for testing. This dissertation addresses three principal dimensions of test flakiness. First, it presents a machine learning classifier designed to detect which tests are flaky, based on previously detected flaky tests. Second, the dissertation proposes three deduplication-based approaches to assist developers in determining whether a given test failure is flaky or not. 
Third, it highlights the impact of test flakiness on other testing activities (particularly mutation testing) and discusses how to mitigate the effects of test flakiness on mutation testing. This dissertation explores the detection of test flakiness by conducting an empirical study on the limitations of rerunning tests as a method for identifying flaky tests, which yields a large dataset of flaky tests. This dataset is then used to develop FlakeFlagger, a machine learning classifier designed to automatically predict the likelihood of a test being flaky through static and dynamic analysis. The objective is to use FlakeFlagger to identify flaky tests without reruns by detecting patterns and symptoms common among previously identified flaky tests. In addressing the challenge of determining whether a failure is due to flakiness, this dissertation demonstrates how developers can better manage flaky tests within their test suites. The dissertation proposes three deduplication-based methods to help developers determine whether a specific failure is genuinely flaky or not. Furthermore, the dissertation discusses the effects of test flakiness on mutation testing, a critical activity for assessing the quality of test suites. It includes an extensive rerun experiment on the mutation analysis of the flaky tests identified earlier in the study, which highlights the significant impact of flaky tests on the validity of mutation testing.
dc.format.extent	112
dc.identifier.uri	https://hdl.handle.net/20.500.14154/72785
dc.language.iso	en_US
dc.publisher	George Mason University
dc.subject	Software Testing
dc.subject	Flaky Tests
dc.subject	Machine Learning
dc.subject	Data Science
dc.subject	Flaky Failures
dc.title	Detecting Flaky Tests Without Rerunning Tests
dc.type	Thesis
sdl.degree.department	Computer Science
sdl.degree.discipline	Software Testing
sdl.degree.grantor	George Mason University
sdl.degree.name	Doctor of Philosophy

Collections

SACM - United States of America
