Enhancing Clarity and Readability in Scientific Writing: An Automated Approach to Identifying Shapeless Sentences

Date

2023-11-02

Publisher

Saudi Digital Library

Abstract

Effective communication is essential in academic writing, where clear and coherent prose ensures that research findings are disseminated effectively. However, conveying complex concepts in a readable manner remains a challenge in scientific writing. This thesis investigates automating the application of principles from the book Style: Lessons in Clarity and Grace by Williams [32] to improve the readability of scientific writing. The research focuses on identifying “shapeless” sentences, that is, sentences that lack structure and clarity. A dataset of scientific sentences sourced from the Elsevier OA Corpus was manually annotated as “Structured”, “Shapeless”, or “N/A” based on principles from Style. A Large Language Model, LLaMA-2, was fine-tuned on this dataset to classify the sentences. Optimization techniques such as QLoRA enabled efficient fine-tuning within resource constraints, while prompt engineering and few-shot learning were used to optimize inference. The fine-tuned model achieved promising accuracy in distinguishing between “Structured” and “Shapeless” sentences. The research demonstrates the potential of fine-tuned language models to automate the application of stylistic principles and enhance scientific writing. Further work is needed to expand the dataset, refine the label definitions, and optimize model training. Overall, this thesis establishes a foundation for using language models to identify problematic sentences and improve readability.
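
As an illustration of the few-shot prompting step mentioned in the abstract, the following is a minimal sketch of how a classification prompt might be assembled and a model's reply mapped back to one of the three labels. Only the label names come from the abstract. The example sentences, the prompt wording, and the function names build_prompt and parse_label are assumptions made for illustration and are not taken from the thesis.

```python
# Labels as named in the abstract.
LABELS = ("Structured", "Shapeless", "N/A")

# Hypothetical examples for illustration only; not drawn from the thesis dataset.
FEW_SHOT_EXAMPLES = [
    ("We measured the reaction rate at three temperatures.", "Structured"),
    ("There was, in the case of the samples that were, as noted, heated, "
     "an observation of a change that was made.", "Shapeless"),
    ("Fig. 3.", "N/A"),
]


def build_prompt(sentence, examples=FEW_SHOT_EXAMPLES):
    """Assemble a few-shot prompt that ends where the model should write a label."""
    lines = ["Classify each sentence as one of: " + ", ".join(LABELS) + ".", ""]
    for text, label in examples:
        lines.append(f"Sentence: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Sentence: {sentence}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(completion):
    """Return the known label that the completion starts with, or None."""
    cleaned = completion.strip().lower()
    for label in LABELS:
        if cleaned.startswith(label.lower()):
            return label
    return None
```

The prompt string would be passed to whatever inference interface serves the fine-tuned model; that call is omitted here because the abstract does not describe it. Returning None for an unrecognized reply lets the caller decide whether to retry or discard the sentence.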

Keywords

Natural Language Processing (NLP), Readability, Large Language Models (LLM), Fine-tuning, Dataset Annotation, Sentence Structuring, Prompt Engineering, Few-shot Learning, Linguistic Optimization, Natural Language Understanding (NLU)

Copyright owned by the Saudi Digital Library (SDL) © 2025