Improving Feature Location in Source Code via Large Language Model-Based Descriptive Annotations

Alneif, Sultan

Files

Primary SACM-Dissertation.pdf (594.46 KB)

Date

2025-05

Authors

Alneif, Sultan

Publisher

Arizona State University

Abstract

Feature location is a crucial task in software maintenance, helping developers identify the precise segments of code responsible for specific functionality. Traditional feature location methods, such as grep and static analysis, often yield high false-positive rates and poor ranking accuracy, which increases developer effort and reduces productivity. Information Retrieval (IR) techniques such as Latent Semantic Indexing (LSI) have improved precision and recall but still struggle with lexical mismatches and semantic ambiguities. This research introduces a method that enhances feature location by augmenting source code corpora with descriptive annotations generated by Large Language Models (LLMs), specifically Code Llama. The enriched corpora provide deeper semantic context, improving the alignment between developer queries and relevant source code components. Empirical evaluations were conducted on two open-source systems, HippoDraw and Qt, using standard IR performance metrics: precision, recall, First Relevant Position (FRP), and Last Relevant Position (LRP). Results showed significant performance gains: precision improved by 40% in HippoDraw and 26% in Qt, and recall improved by 32% in HippoDraw and 24% in Qt. The findings highlight the efficacy of incorporating LLM-generated annotations, which reduce developer effort and enhance software comprehension and maintainability. This research provides a practical and scalable solution for software maintenance and evolution tasks.
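
The four metrics named in the abstract can be illustrated on a single ranked result list. The Python sketch below is not taken from the dissertation: the function name `evaluate_ranking`, the cutoff parameter, and the sample method names are hypothetical, and it assumes that FRP and LRP are the 1-based ranks of the first and last relevant items and that precision and recall are computed over the top `cutoff` results. The dissertation's exact definitions may differ.

```python
from typing import Dict, Optional, Sequence, Set


def evaluate_ranking(
    ranked: Sequence[str], relevant: Set[str], cutoff: int
) -> Dict[str, Optional[float]]:
    """Score one ranked result list against a known set of relevant items.

    Assumed definitions (not confirmed by the abstract):
      precision = relevant items in the top `cutoff` / items retrieved
      recall    = relevant items in the top `cutoff` / all relevant items
      frp, lrp  = 1-based rank of the first / last relevant item
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")

    # 1-based ranks at which relevant items appear, in ascending order.
    positions = [
        rank for rank, item in enumerate(ranked, start=1) if item in relevant
    ]
    hits = sum(1 for rank in positions if rank <= cutoff)
    retrieved = min(cutoff, len(ranked))

    return {
        "precision": hits / retrieved if retrieved else 0.0,
        "recall": hits / len(relevant) if relevant else 0.0,
        "frp": positions[0] if positions else None,
        "lrp": positions[-1] if positions else None,
    }


# Hypothetical query result: method names ranked by similarity to a query.
ranked = ["draw_axis", "parse_args", "plot_histogram", "save_file", "update_canvas"]
relevant = {"plot_histogram", "update_canvas"}

metrics = evaluate_ranking(ranked, relevant, cutoff=3)
print(metrics)
```

Under these assumptions, a lower FRP means a developer reaches the first relevant method sooner, and a lower LRP means fewer results must be inspected to find all of them.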

Keywords

Feature Location, Source Code Comprehension, Software Maintenance, Large Language Models, Code Annotation, Latent Semantic Indexing, Information Retrieval in Software Engineering, Semantic Code Analysis, Program Comprehension, Static Analysis, Code Summarization, Software Evolution, Precision, Recall, Query-Based Feature Location, Code Corpus Enhancement, Natural Language Processing

URI

https://hdl.handle.net/20.500.14154/76104

Collections

SACM - United States of America
