Enhancing Clarity and Readability in Scientific Writing: An Automated Approach to Identifying Shapeless Sentences
Date
2023-11-02
Publisher
Saudi Digital Library
Abstract
Effective communication is essential in academic writing, where clear and coherent prose ensures that research findings are disseminated effectively. However, conveying complex concepts in a readable manner remains a challenge in scientific writing. This thesis investigates automating the application of principles from the book Style: Lessons in Clarity and Grace by Williams [32] to improve the readability of scientific writing. The research focuses on identifying “shapeless” sentences, which lack structure and clarity. A dataset of scientific sentences sourced from the Elsevier OA Corpus was manually annotated as “Structured”, “Shapeless”, or “N/A” based on principles from Style. A Large Language Model, LLaMA-2, was fine-tuned on this dataset to classify the sentences. Optimization techniques such as QLoRA enabled efficient fine-tuning within resource constraints, while prompt engineering and few-shot learning were used to optimize inference. The fine-tuned model achieved promising accuracy in distinguishing between “Structured” and “Shapeless” sentences. The research demonstrates the potential of fine-tuned language models to automate the application of stylistic principles and enhance scientific writing. Further work is needed to expand the dataset, refine definitions, and optimize model training. Overall, this thesis establishes a foundation for using language models to identify problematic sentences and improve readability.
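As a rough illustration of the few-shot inference step described in the abstract, the sketch below assembles a classification prompt from labeled examples and maps a model completion back to one of the three annotation labels. The prompt wording, the function names `build_few_shot_prompt` and `parse_label`, the example sentences, and the fallback to “N/A” are all assumptions made for illustration; the thesis's actual prompt template and parsing logic are not given in this record.

```python
from typing import List, Tuple

# The three annotation labels named in the abstract.
LABELS = ("Structured", "Shapeless", "N/A")


def build_few_shot_prompt(examples: List[Tuple[str, str]], sentence: str) -> str:
    """Build a few-shot classification prompt (hypothetical template)."""
    for _, label in examples:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
    header = "Classify each sentence as Structured, Shapeless, or N/A."
    blocks = [f"Sentence: {text}\nLabel: {label}" for text, label in examples]
    # The final block is left open so the model completes the label.
    blocks.append(f"Sentence: {sentence}\nLabel:")
    return header + "\n\n" + "\n\n".join(blocks)


def parse_label(completion: str) -> str:
    """Map raw model output to a known label.

    Falling back to "N/A" for unrecognized output is an assumption.
    """
    text = completion.strip().lower()
    for label in LABELS:
        if text.startswith(label.lower()):
            return label
    return "N/A"


if __name__ == "__main__":
    # Invented example sentences, not taken from the annotated dataset.
    demo_examples = [
        ("We measured the growth rate at three temperatures.", "Structured"),
        ("Given that, as noted, which in turn the results, being varied, show.", "Shapeless"),
    ]
    print(build_few_shot_prompt(demo_examples, "The samples were dried overnight."))
```

The resulting prompt string would be passed to the fine-tuned model's text-generation call, and the first tokens of the completion fed to `parse_label`.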
Keywords
Natural Language Processing (NLP), Readability, Large Language Models (LLM), Fine-tuning, Dataset Annotation, Sentence Structuring, Prompt Engineering, Few-shot Learning, Linguistic Optimization, Natural Language Understanding (NLU)