SACM - United States of America
Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9668
2 results
Search Results
Item Restricted
MULTIDIMENSIONAL APPROACHES IN BUG DETECTION FOR PARALLEL PROGRAMMING AND TEXT-TO-CODE SEMANTIC PARSING
(University of Central Florida, 2025) Alsofyani, May; Wang, Liqiang
This dissertation applies deep learning and large language models (LLMs) to two domains: parallel programming fault detection and text-to-code translation, aiming to enhance software reliability and natural language-driven code generation. Due to their unpredictable nature, concurrency bugs, particularly data race bugs, present significant challenges in fault detection for parallel programming. We investigate deep learning and LLM-based approaches for detecting data race bugs in OpenMP programs. Our proposed methods include a transformer encoder and GPT-4 applied through prompt engineering and fine-tuning. Experimental results demonstrate that the transformer encoder achieves competitive accuracy compared to LLMs, highlighting its effectiveness in understanding complex OpenMP directives. Expanding this research, we explore the role of LLMs in detecting faults in Pthreads programs, which requires a deep understanding of thread-based logic and synchronization mechanisms. We analyze ChatGPT's effectiveness in Pthreads fault detection through dialogue-based interactions and advanced prompt engineering techniques, including Zero-Shot, Few-Shot, Chain-of-Thought, and Retrieval-Augmented Generation. Additionally, we introduce three hybrid prompting techniques to enhance fault detection performance: Chain-of-Thought with Few-Shot Prompting, Retrieval-Augmented Generation with Few-Shot Prompting, and Prompt Chaining with Few-Shot Prompting. In the semantic parsing domain, our research bridges the gap between natural language and executable code, focusing on text-to-SQL translation. To address SQL's limitations in statistical analysis, we introduce SIGMA, a dataset for text-to-code semantic parsing with statistical analysis capabilities.
In addition, we address the gap in cross-domain context-dependent text-to-SQL translation for the Arabic language. While prior research has focused on English and Chinese datasets, no efforts have been made to explore Arabic cross-domain conversational querying. We introduce Ar-SParC, the first Arabic cross-domain, context-dependent text-to-SQL dataset. This dissertation contributes to fault detection in parallel programming and semantic parsing with statistical analysis, leveraging cutting-edge deep learning and LLM techniques. Our findings advance bug detection in high-performance computing and natural language-based code generation, significantly improving software reliability and accessibility.

Item Restricted
TOWARDS ROBUST AND ACCURATE TEXT-TO-CODE GENERATION
(University of Central Florida, 2024) Almohaimeed, Saleh; Wang, Liqiang
Databases play a vital role in today's digital landscape, enabling effective data storage, management, and retrieval for businesses and other organizations. However, interacting with databases often requires knowledge of query languages (e.g., SQL) and data analysis, which can be a barrier for many users. In natural language processing, the text-to-code task, which converts natural language text into query and analysis code, bridges this gap by allowing users to access and manipulate data using everyday language. This dissertation investigates different challenges in text-to-code (including text-to-SQL as a subtask), with a focus on four primary contributions to the field. First, as a solution to the lack of statistical analysis in current text-to-code tasks, we introduce SIGMA, a text-to-code dataset with statistical analysis, featuring 6000 questions with Python code labels. Baseline models show promising results, indicating that our new task can support both statistical analysis and SQL queries simultaneously. Second, we present Ar-Spider, the first Arabic cross-domain text-to-SQL dataset that addresses multilingual limitations.
We have conducted experiments with the LGESQL and S2SQL models, enhanced by our Context Similarity Relationship (CSR) approach, which demonstrates competitive performance and reduces the performance gap between the Arabic and English text-to-SQL datasets. Third, we address the context-dependent text-to-SQL task, which is often overlooked by current models. We explore the SParC dataset using different question representations and in-context learning prompt engineering techniques. We then propose GAT-SQL, an advanced prompt engineering approach that improves both zero-shot and in-context learning experiments. GAT-SQL sets new benchmarks on both the SParC and CoSQL datasets. Finally, we introduce Ar-SParC, a context-dependent Arabic text-to-SQL dataset that enables users to interact with the model through a series of interrelated questions. In total, 40 experiments were conducted to investigate this dataset using various prompt engineering techniques, and a novel technique called GAT Corrector was developed, which significantly improved the performance of all baseline models.