MULTIDIMENSIONAL APPROACHES IN BUG DETECTION FOR PARALLEL PROGRAMMING AND TEXT-TO-CODE SEMANTIC PARSING

Date

2025

Publisher

University of Central Florida

Abstract

This dissertation applies deep learning and large language models (LLMs) to two domains, parallel programming fault detection and text-to-code translation, with the aim of enhancing software reliability and natural language-driven code generation.

Due to their nondeterministic nature, concurrency bugs, particularly data races, present significant challenges for fault detection in parallel programming. We investigate deep learning and LLM-based approaches for detecting data race bugs in OpenMP programs. Our proposed methods include a transformer encoder and GPT-4 applied through prompt engineering and fine-tuning. Experimental results demonstrate that the transformer encoder achieves accuracy competitive with LLMs, highlighting its effectiveness in understanding complex OpenMP directives. Expanding this research, we explore the role of LLMs in detecting faults in Pthreads programs, which requires a deep understanding of thread-based logic and synchronization mechanisms. We analyze ChatGPT's effectiveness in Pthreads fault detection through dialogue-based interactions and advanced prompt engineering techniques, including Zero-Shot, Few-Shot, Chain-of-Thought, and Retrieval-Augmented Generation. Additionally, we introduce three hybrid prompting techniques (Chain-of-Thought with Few-Shot Prompting, Retrieval-Augmented Generation with Few-Shot Prompting, and Prompt Chaining with Few-Shot Prompting) to enhance fault detection performance.

In the semantic parsing domain, our research bridges the gap between natural language and executable code, focusing on text-to-SQL translation. To address SQL's limitations in statistical analysis, we introduce SIGMA, a dataset for text-to-code semantic parsing with statistical analysis capabilities. In addition, we address the gap in cross-domain, context-dependent text-to-SQL translation for Arabic. While prior research has focused on English and Chinese datasets, Arabic cross-domain conversational querying has remained unexplored. We introduce Ar-SParC, the first Arabic cross-domain, context-dependent text-to-SQL dataset.

This dissertation contributes to fault detection in parallel programming and to semantic parsing with statistical analysis capabilities, leveraging state-of-the-art deep learning and LLM techniques. Our findings advance bug detection in high-performance computing and natural language-based code generation, improving software reliability and accessibility.
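
To make the first fault class concrete, the sketch below shows a minimal OpenMP loop containing a data race of the kind studied here. It is a hypothetical example written for this summary, not a program drawn from the dissertation's benchmark set.

    #include <stdio.h>
    #include <omp.h>

    /* Illustrative data race: every thread updates the shared variable
     * sum without a reduction, atomic, or critical construct, so
     * concurrent read-modify-write updates can be lost. */
    int main(void) {
        int a[1000], sum = 0;
        for (int i = 0; i < 1000; i++) a[i] = 1;

        #pragma omp parallel for   /* racy: sum is shared and unprotected */
        for (int i = 0; i < 1000; i++)
            sum += a[i];

        /* A race-free variant would instead use:
         *   #pragma omp parallel for reduction(+:sum) */
        printf("sum = %d (expected 1000)\n", sum);
        return 0;
    }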
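
Pthreads fault detection likewise requires reasoning about thread creation, joining, and lock usage. The sketch below, again a hypothetical example rather than dissertation material, shows a lost-update race caused by a missing mutex around a shared counter.

    #include <pthread.h>
    #include <stdio.h>

    /* Illustrative Pthreads fault: two threads increment a shared counter
     * with no mutex, so interleaved read-modify-write updates are lost.
     * The fix is to guard the increment with pthread_mutex_lock/unlock. */
    static long counter = 0;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++)
            counter++;             /* unsynchronized update: data race */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld (expected 200000)\n", counter);
        return 0;
    }

Compiled with gcc -fopenmp and gcc -pthread respectively, both sketches typically print totals below the expected values, which is the observable symptom that race detectors are trained to flag.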

Keywords

Parallel Programming, Pthreads, OpenMP, Data Races, Text-to-Code, Semantic Parsing, Large Language Models (LLMs), Program Repair, Fault Localization, Deep Learning

Copyright owned by the Saudi Digital Library (SDL) © 2025