Improving vulnerability description using natural language generation

Althebeiti, Hattan

dc.contributor.advisor	Mohaisen, David
dc.contributor.author	Althebeiti, Hattan
dc.date.accessioned	2023-11-22T09:40:23Z
dc.date.available	2023-11-22T09:40:23Z
dc.date.issued	2023-10-25
dc.description.abstract	Software plays an integral role in powering numerous everyday computing gadgets. As our reliance on software continues to grow, so does the prevalence of software vulnerabilities, with significant implications for organizations and users. As such, documenting vulnerabilities and tracking their development becomes crucial. Vulnerability databases address this need by storing a record with various attributes for each discovered vulnerability. However, their contents suffer from several drawbacks, which we address in our work. In this dissertation, we investigate the weaknesses associated with vulnerability descriptions in public repositories and alleviate them through Natural Language Processing (NLP) approaches. The first contribution examines vulnerability descriptions in those databases and approaches to improve them. We propose a new automated method that leverages external sources to enrich the scope and context of a vulnerability description. Moreover, we use fine-tuned pre-trained language models to normalize the resulting description. The second contribution investigates the need for a uniform and normalized structure in vulnerability descriptions. We address this need by breaking the description of a vulnerability into multiple constituents and developing a multi-task model that uses the extracted features to create a new uniform and normalized summary, one that maintains the necessary attributes of the vulnerability while ensuring a consistent description. Our method proved effective in generating new summaries with the same structure across a collection of various vulnerability descriptions and types. Our final contribution investigates the feasibility of assigning the Common Weakness Enumeration (CWE) attribute to a vulnerability based on its description. CWE offers a comprehensive framework that categorizes similar exposures into classes, representing the types of exploitation associated with such vulnerabilities. Our approach, which uses pre-trained language models, is shown to outperform a Large Language Model (LLM) on this task. Overall, this dissertation provides various technical approaches that exploit advances in NLP to improve publicly available vulnerability databases.
dc.format.extent	156
dc.identifier.uri	https://hdl.handle.net/20.500.14154/69801
dc.language.iso	en_US
dc.publisher	Saudi Digital Library
dc.subject	Vulnerability
dc.subject	NVD
dc.subject	CVE
dc.subject	Natural Language Processing
dc.subject	Transformer
dc.subject	LLM
dc.subject	BERT
dc.subject	T5
dc.subject	Pre-trained language model
dc.title	Improving vulnerability description using natural language generation
dc.type	Thesis
sdl.degree.department	Electrical and Computer Engineering
sdl.degree.discipline	Natural Language Processing
sdl.degree.grantor	University of Central Florida
sdl.degree.name	Doctor of Philosophy

Collections

SACM - United States of America
