Improving Feature Location in Source Code via Large Language Model-Based Descriptive Annotations
Date
2025-05
Publisher
Arizona State University
Abstract
Feature location is a crucial task in software maintenance, aiding developers in
identifying the precise segments of code responsible for specific functionalities.
Traditional feature location methods, such as grep and static analysis, often result in high
false-positive rates and inadequate ranking accuracy, increasing developer effort and
reducing productivity. Information Retrieval (IR) techniques like Latent Semantic
Indexing (LSI) have improved precision and recall but still struggle with lexical
mismatches and semantic ambiguities.
This research introduces an innovative method to enhance feature location by augmenting
source code corpora with descriptive annotations generated by Large Language Models
(LLMs), specifically Code Llama. The enriched corpora provide deeper semantic
contexts, improving the alignment between developer queries and relevant source code
components.
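The augmentation idea can be illustrated with a minimal sketch. It is not the thesis implementation: the method names, code bodies, and annotation strings below are hypothetical, the annotations are hard-coded rather than generated by Code Llama, and plain bag-of-words cosine similarity stands in for LSI so that the example runs with the standard library alone.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", spaced.lower())


def score(query, document):
    """Cosine similarity between term-frequency vectors (a stand-in for LSI)."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    dot = sum(q[t] * d[t] for t in q if t in d)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical corpus: terse identifiers share no vocabulary with the query.
code = {
    "PlotRenderer::rndr": "void rndr(Cnv c) { c.ln(x0, y0, x1, y1); }",
    "FileStore::save": "void save(Path p) { write(p, buffer); }",
}

# Hypothetical annotations, standing in for LLM-generated descriptions.
annotations = {
    "PlotRenderer::rndr": "Draws the axis line of a plot on the canvas.",
    "FileStore::save": "Writes the buffer to a file on disk.",
}

# Augmented corpus: each document is the code followed by its annotation.
augmented = {name: code[name] + " " + annotations[name] for name in code}

query = "draw plot axis"
baseline = {name: score(query, text) for name, text in code.items()}
enriched = {name: score(query, text) for name, text in augmented.items()}
```

In this toy corpus the query matches nothing in the raw code, while the annotated version of the rendering method picks up the terms "plot" and "axis". That is the kind of lexical mismatch the annotations are meant to bridge.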
Empirical evaluations were conducted on two open-source systems, HippoDraw and Qt,
using standard IR performance metrics: precision, recall, First Relevant Position (FRP),
and Last Relevant Position (LRP). Results showed significant performance gains: a 40%
precision improvement in HippoDraw and a 26% improvement in Qt. Recall improved
by 32% in HippoDraw and 24% in Qt. The findings highlight the efficacy of
incorporating LLM-generated annotations, significantly reducing developer effort and
enhancing software comprehension and maintainability. This research provides a
practical and scalable solution for software maintenance and evolution tasks.
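For readers unfamiliar with the four metrics, the following sketch shows one common way to compute them from a ranked result list. It rests on assumptions the abstract does not state: precision and recall are taken at a fixed cutoff, FRP and LRP are the 1-based ranks of the first and last relevant items, and the function name, method identifiers, and cutoff value are hypothetical.

```python
def evaluate(ranked, relevant, cutoff):
    """Return precision and recall at `cutoff`, plus FRP and LRP (1-based ranks)."""
    top = ranked[:cutoff]
    hits = sum(1 for item in top if item in relevant)
    # Ranks of every relevant item in the full list, in ascending order.
    positions = [i for i, item in enumerate(ranked, start=1) if item in relevant]
    return {
        "precision": hits / len(top) if top else 0.0,
        "recall": hits / len(relevant) if relevant else 0.0,
        "frp": positions[0] if positions else None,
        "lrp": positions[-1] if positions else None,
    }


# Hypothetical ranking of five methods, two of which implement the feature.
ranked = ["m3", "m1", "m7", "m2", "m9"]
relevant = {"m1", "m2"}
result = evaluate(ranked, relevant, cutoff=3)
```

Lower FRP and LRP values mean a developer inspects fewer ranked results before reaching the first and the last relevant method, which is how these two metrics relate to developer effort.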
Keywords
Feature Location, Source Code Comprehension, Software Maintenance, Large Language Models, Code Annotation, Latent Semantic Indexing, Information Retrieval in Software Engineering, Semantic Code Analysis, Program Comprehension, Static Analysis, Code Summarization, Software Evolution, Precision, Recall, Query-Based Feature Location, Code Corpus Enhancement, Natural Language Processing