Enhancing Clarity and Readability in Scientific Writing: An Automated Approach to Identifying Shapeless Sentences

dc.contributor.advisor: Lopez, Adam
dc.contributor.author: Kamal, Ayah
dc.date.accessioned: 2023-11-13T10:08:11Z
dc.date.available: 2023-11-13T10:08:11Z
dc.date.issued: 2023-11-02
dc.description.abstract: Effective communication is essential in academic writing, where clear and coherent prose ensures that research findings are disseminated effectively. However, conveying complex concepts in a readable manner remains a challenge in scientific writing. This thesis investigates automating the application of principles from the book Style: Lessons in Clarity and Grace by Williams [32] to improve the readability of scientific writing. The research focuses on identifying “shapeless” sentences that lack structure and clarity. A dataset of scientific sentences sourced from the Elsevier OA Corpus was manually annotated as “Structured”, “Shapeless” or “N/A” based on principles from Style. A Large Language Model, LLaMA-2, was fine-tuned on this dataset to classify the sentences. Optimization techniques such as QLoRA enabled efficient fine-tuning within resource constraints, while prompt engineering and few-shot learning were used to optimize inference. The fine-tuned model achieved promising accuracy in distinguishing between “Structured” and “Shapeless” sentences. The research demonstrates the potential of fine-tuned language models to automate the application of stylistic principles and enhance scientific writing. Further work is needed to expand the dataset, refine definitions, and optimize model training. Overall, this thesis establishes a foundation for using language models to identify problematic sentences and improve readability.
dc.format.extent: 57
dc.identifier.uri: https://hdl.handle.net/20.500.14154/69649
dc.language.iso: en
dc.publisher: Saudi Digital Library
dc.subject: Natural Language Processing (NLP)
dc.subject: Readability
dc.subject: Large Language Models (LLM)
dc.subject: Fine-tuning
dc.subject: Dataset Annotation
dc.subject: Sentence Structuring
dc.subject: Prompt Engineering
dc.subject: Few-shot Learning
dc.subject: Linguistic Optimization
dc.subject: Natural Language Understanding (NLU)
dc.title: Enhancing Clarity and Readability in Scientific Writing: An Automated Approach to Identifying Shapeless Sentences
dc.type: Thesis
sdl.degree.department: Informatics
sdl.degree.discipline: Artificial Intelligence
sdl.degree.grantor: The University of Edinburgh
sdl.degree.name: Master of Science

Copyright owned by the Saudi Digital Library (SDL) © 2025