Contributors: Xavier, Carpent; Alqahtani, Asyah
Date: 2025-09-15 (2025)
URI: https://hdl.handle.net/20.500.14154/76393

Abstract: The rapid development of large language models (LLMs) such as ChatGPT has attracted the attention of researchers and academics due to their advanced capabilities. However, this development is accompanied by controversy over whether such models generate authentic academic references. This dissertation therefore investigates how various prompt engineering techniques affect the accuracy and reliability of academic references generated by ChatGPT-4o mini, by examining whether these references exist in academic databases. The academic disciplines included in this study are computer science, electrical engineering, biology, history, medicine, psychology, and geography. Two approaches are employed to query the model: direct questions (without prompt engineering) and guided questions (using prompt engineering). A total of 700 questions are analysed, with 100 questions per discipline divided equally between the two approaches. The generated academic references are then checked against the CrossRef, Scopus, and OpenLibrary databases. The findings show differences in the model's performance across disciplines when prompt engineering techniques are used. The improvement in the accuracy of the generated academic references varies, with a 10.75% increase observed in biology and a 2.48% increase in medicine. Conversely, psychology suffers a slight decline of 1.08% in accuracy, and electrical engineering faces a significant drop of 8.42 percentage points. These variations show how specific questioning techniques can improve the accuracy of generated academic references in some fields yet prove less effective in others. The dissertation also examines the non-existent academic references generated by ChatGPT-4o mini, finding that 5% to 16% of these fabricated references include authentic author identities. The model sometimes correctly associates authors with their relevant fields, but frequently connects them to unrelated areas; nevertheless, fabricated references with fabricated authors remain more prevalent. These findings reveal considerable difficulties in using LLMs to generate authentic academic references with regard to accuracy and reliability. Although prompt engineering techniques demonstrate improvements in certain domains, the overall incidence of fabricated academic references remains substantial, which highlights the importance of human verification.

Pages: 79
Language: en
Keywords: Artificial Intelligence; ChatGPT; large language models; academic references
Title: Evaluating the Accuracy and Reliability of ChatGPT-4o Mini in Generating Academic References: Impact of Prompt Engineering and Comparative Analysis of Direct Questions vs. Guided Questions Approaches
Type: Thesis