MULTIDIMENSIONAL APPROACHES IN BUG DETECTION FOR PARALLEL PROGRAMMING AND TEXT-TO-CODE SEMANTIC PARSING

dc.contributor.advisorWang Liqiang
dc.contributor.authorAlsofyani, May
dc.date.accessioned2025-06-03T15:24:57Z
dc.date.issued2025
dc.description.abstractThis dissertation applies deep learning and large language models (LLMs) to two domains, fault detection in parallel programs and text-to-code translation, with the aim of improving software reliability and natural-language-driven code generation. Because of their unpredictable nature, concurrency bugs, particularly data races, present significant challenges for fault detection in parallel programming. We investigate deep learning and LLM-based approaches for detecting data race bugs in OpenMP programs. Our proposed methods include a transformer encoder and GPT-4 applied through prompt engineering and fine-tuning. Experimental results demonstrate that the transformer encoder achieves accuracy competitive with the LLMs, highlighting its effectiveness in understanding complex OpenMP directives. Expanding this research, we explore the role of LLMs in detecting faults in Pthreads programs, which requires a deep understanding of thread-based logic and synchronization mechanisms. We analyze ChatGPT's effectiveness in Pthreads fault detection through dialogue-based interactions and advanced prompt engineering techniques, including Zero-Shot, Few-Shot, Chain-of-Thought, and Retrieval-Augmented Generation. Additionally, we introduce three hybrid prompting techniques to enhance fault detection performance: Chain-of-Thought with Few-Shot Prompting, Retrieval-Augmented Generation with Few-Shot Prompting, and Prompt Chaining with Few-Shot Prompting. In the semantic parsing domain, our research bridges the gap between natural language and executable code, focusing on text-to-SQL translation. To address SQL's limitations in statistical analysis, we introduce SIGMA, a dataset for text-to-code semantic parsing with statistical analysis capabilities. In addition, we address the gap in cross-domain, context-dependent text-to-SQL translation for the Arabic language.
While prior research has focused on English and Chinese datasets, no efforts have been made to explore Arabic cross-domain conversational querying. We introduce Ar-SParC, the first Arabic cross-domain, context-dependent text-to-SQL dataset. This dissertation contributes to fault detection in parallel programming and to semantic parsing with statistical analysis, leveraging current deep learning and LLM techniques. Our findings advance bug detection in high-performance computing and natural-language-based code generation, improving software reliability and accessibility.
dc.format.extent129
dc.identifier.urihttps://hdl.handle.net/20.500.14154/75509
dc.language.isoen
dc.publisherUniversity of Central Florida
dc.subjectParallel Programming
dc.subjectPthreads
dc.subjectOpenMP
dc.subjectData Races
dc.subjectText-to-Code
dc.subjectSemantic Parsing
dc.subjectLarge Language Models (LLMs)
dc.subjectProgram Repair
dc.subjectFault Localization
dc.subjectDeep Learning
dc.titleMULTIDIMENSIONAL APPROACHES IN BUG DETECTION FOR PARALLEL PROGRAMMING AND TEXT-TO-CODE SEMANTIC PARSING
dc.typeThesis
sdl.degree.departmentDepartment of Computer Science
sdl.degree.disciplineComputer Science, AI, Deep Learning
sdl.degree.grantorUniversity of Central Florida
sdl.degree.nameDoctor of Philosophy

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 1.14 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2025