Automatic Generation of a Coherent Story from a Set of Images
Date
2023-12
Authors
Zainy M. Malakan
Publisher
Saudi Digital Library
Abstract
This dissertation explores vision and language (V&L) algorithms. While V&L methods succeed in image and video captioning tasks, the dynamic Visual Storytelling Task (VST) remains challenging. VST requires generating a coherent story from a set of images, which demands grammatical accuracy, flow, and style. The dissertation addresses these challenges. Chapter 2 presents a framework that utilizes an advanced language model. Chapters 3 and 4 introduce novel techniques that integrate rich visual representations to enhance the generated stories. Chapter 5 introduces a new storytelling dataset together with a comprehensive analysis. Chapter 6 proposes a state-of-the-art Transformer-based model for generating coherent and informative story descriptions from image sets.
Keywords
Storytelling, Sequential Vision Understanding, Computer Vision, Image and Video Captioning, Deep Learning, Transformer, Advanced Language Model
Citation
Zainy M. Malakan (2023). Automatic Generation of a Coherent Story from a Set of Images. [Doctoral Thesis, The University of Western Australia].