Improving Feature Location in Source Code via Large Language Model-Based Descriptive Annotations

dc.contributor.advisor: Alhindawi, Nouh
dc.contributor.author: Alneif, Sultan
dc.date.accessioned: 2025-08-10T05:30:34Z
dc.date.issued: 2025-05
dc.description.abstract: Feature location is a crucial software maintenance task: it helps developers identify the code segments that implement a specific functionality. Traditional feature location methods, such as grep and static analysis, often produce high false-positive rates and rank results poorly, which increases developer effort and reduces productivity. Information Retrieval (IR) techniques such as Latent Semantic Indexing (LSI) have improved precision and recall but still struggle with lexical mismatches and semantic ambiguities. This research introduces a method that enhances feature location by augmenting source code corpora with descriptive annotations generated by Large Language Models (LLMs), specifically Code Llama. The enriched corpora provide deeper semantic context, improving the alignment between developer queries and relevant source code components. Empirical evaluations were conducted on two open-source systems, HippoDraw and Qt, using standard IR performance metrics: precision, recall, First Relevant Position (FRP), and Last Relevant Position (LRP). The results showed significant performance gains: precision improved by 40% in HippoDraw and 26% in Qt, and recall improved by 32% in HippoDraw and 24% in Qt. The findings highlight the efficacy of incorporating LLM-generated annotations, which significantly reduce developer effort and enhance software comprehension and maintainability. This research provides a practical and scalable solution for software maintenance and evolution tasks.
dc.format.extent: 80
dc.identifier.uri: https://hdl.handle.net/20.500.14154/76104
dc.language.iso: en
dc.publisher: Arizona State University
dc.subject: Feature Location
dc.subject: Source Code Comprehension
dc.subject: Software Maintenance
dc.subject: Large Language Models
dc.subject: Code Annotation
dc.subject: Latent Semantic Indexing
dc.subject: Information Retrieval in Software Engineering
dc.subject: Semantic Code Analysis
dc.subject: Program Comprehension
dc.subject: Static Analysis
dc.subject: Code Summarization
dc.subject: Software Evolution
dc.subject: Precision
dc.subject: Recall
dc.subject: Query-Based Feature Location
dc.subject: Code Corpus Enhancement
dc.subject: Natural Language Processing
dc.title: Improving Feature Location in Source Code via Large Language Model-Based Descriptive Annotations
dc.type: Thesis
sdl.degree.department: School of Computing and Augmented Intelligence
sdl.degree.discipline: Software Engineering
sdl.degree.grantor: Arizona State University
sdl.degree.name: Software Engineering
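
Illustrative note on the evaluation metrics named in the abstract: the record does not give the thesis's exact definitions, so the following is only a minimal sketch under common IR assumptions. It treats precision and recall as computed over the top `cutoff` results of a ranked list, and FRP and LRP as the 1-based ranks of the first and last relevant items in the full ranking. The function name `evaluate_ranking`, the `cutoff` parameter, and the sample method names are hypothetical and are not taken from the thesis.

```python
from typing import Sequence, Set


def evaluate_ranking(ranked: Sequence[str], relevant: Set[str], cutoff: int) -> dict:
    """Score one ranked result list against a gold set of relevant items."""
    top = ranked[:cutoff]
    hits = sum(1 for item in top if item in relevant)
    # 1-based ranks of every relevant item anywhere in the full ranking
    positions = [rank for rank, item in enumerate(ranked, start=1) if item in relevant]
    return {
        "precision": hits / len(top) if top else 0.0,
        "recall": hits / len(relevant) if relevant else 0.0,
        "frp": positions[0] if positions else None,   # First Relevant Position
        "lrp": positions[-1] if positions else None,  # Last Relevant Position
    }


# Hypothetical ranked output for one query; the names are illustrative only.
ranked = ["Canvas::draw", "Plot::resize", "Canvas::print", "Axis::scale", "Plot::save"]
relevant = {"Canvas::print", "Plot::save"}
scores = evaluate_ranking(ranked, relevant, cutoff=3)
print(scores)
```

Under these assumptions, lower FRP and LRP values mean a developer inspects fewer results before reaching the first and the last relevant item, which is how such metrics are usually read as a proxy for developer effort.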

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 594.46 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2026