Automatic Generation of a Coherent Story from a Set of Images

Date

2023-12

Publisher

Saudi Digital Library

Abstract

This dissertation explores vision and language (V&L) algorithms. While V&L methods succeed in image and video captioning tasks, the dynamic Visual Storytelling Task (VST) remains challenging. VST requires generating a coherent story from a set of images, which demands grammatical accuracy, flow, and style. The dissertation addresses these challenges. Chapter 2 presents a framework that utilizes an advanced language model. Chapters 3 and 4 introduce novel techniques that integrate rich visual representations to enhance the generated stories. Chapter 5 introduces a new storytelling dataset with a comprehensive analysis. Chapter 6 proposes a state-of-the-art Transformer-based model for generating coherent and informative story descriptions from image sets.

Keywords

Storytelling, Sequential Vision Understanding, Computer Vision, Image and Video Captioning, Deep Learning, Transformer, Advanced Language Model

Citation

Zainy M. Malakan. (2023). Automatic Generation of a Coherent Story from a Set of Images. [Doctoral Thesis, The University of Western Australia].

Copyright owned by the Saudi Digital Library (SDL) © 2025