Comparative Analysis of Lossless Data Compression Algorithms for Textual Data
Date
2023-12-15
Publisher
University of Glasgow
Abstract
This dissertation explores and compares three key lossless data compression algorithms: Huffman coding, Lempel-Ziv-Welch (LZW), and Run-Length Encoding (RLE). The study extends to hybrid schemes that combine Huffman with RLE, LZW with RLE, LZW with the Burrows-Wheeler Transform (BWT), LZW with a trie data structure, and a fusion of LZW, BWT, and RLE. Focusing primarily on textual data, it provides a detailed comparative analysis of these algorithms and their hybrid forms.
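The hybrid schemes above chain a transform with a coder. As a rough illustration only (not the dissertation's own implementation), the Python sketch below shows why pairing BWT with RLE can help: BWT groups characters that share a following context, so identical characters tend to form longer runs, which RLE then collapses. The function names, the `\x00` sentinel, and the naive rotation sort are assumptions made for brevity.

```python
def bwt(s: str, sentinel: str = "\x00") -> str:
    """Naive Burrows-Wheeler Transform: sort every rotation of s + sentinel
    and return the last column. O(n^2 log n), so for demonstration only."""
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "\x00") -> str:
    """Invert the BWT using the LF-mapping. Assumes the sentinel sorts before
    every other character, so row 0 of the sorted rotations is sentinel + s."""
    # Stable sort of positions by character: order[j] is the row whose last
    # character is the same occurrence as the j-th first-column character.
    order = sorted(range(len(last)), key=lambda i: last[i])
    row, out = 0, []
    for _ in range(len(last) - 1):
        row = order[row]                 # step to the rotation shifted by one
        out.append(last[order[row]])     # first-column character of that row
    return "".join(out)


def rle_encode(s: str) -> list[tuple[str, int]]:
    """Run-length encode s as (character, run length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (character, run length) pairs back into a string."""
    return "".join(ch * n for ch, n in runs)


if __name__ == "__main__":
    text = "abcabcabcabc"
    transformed = bwt(text)
    before, after = len(rle_encode(text)), len(rle_encode(transformed))
    print(f"{before} runs before BWT, {after} after")  # 12 runs before BWT, 4 after
    # Decoding reverses the pipeline: RLE decode first, then inverse BWT.
    assert inverse_bwt(rle_decode(rle_encode(transformed))) == text
```

On repetitive text the transform turns twelve single-character runs into four long ones; on text without repeated contexts the gain can be small or negative, which is one reason such hybrids are worth comparing empirically.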
A key component of this study is a Command Line Interface (CLI), developed to apply and evaluate these compression techniques, which also integrates GPT-2 as a text generator. GPT-2 makes it possible to generate varied textual data that is then processed by the compression algorithms, providing a flexible environment for performance analysis and for applying the algorithms in practice.
The dissertation reports systematic experiments that evaluate the individual and combined algorithms. The findings reveal each algorithm's strengths, limitations, and suitability for different types of text data.
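A comparison of this kind can be sketched by measuring the encoded size each algorithm produces for the same input. The Python sketch below is hypothetical and uses simplifying accounting rules that are assumptions, not the dissertation's measurement method: Huffman size counts only the payload bits (the cost of storing the code table is ignored), LZW output is charged at a fixed width just wide enough for the largest code emitted, and the compression ratio is original bits divided by encoded bits.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Build a Huffman code table (character -> bit string) from frequencies."""
    freq = Counter(text)
    if len(freq) <= 1:
        # Degenerate cases: empty input, or one distinct symbol coded as "0".
        return {ch: "0" for ch in freq}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {ch: ""}) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)    # two least frequent subtrees
        f2, _, high = heapq.heappop(heap)
        merged = {ch: "0" + c for ch, c in low.items()}
        merged.update({ch: "1" + c for ch, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def huffman_bits(text: str) -> int:
    """Encoded payload size in bits (code table overhead not counted)."""
    table = huffman_code(text)
    return sum(len(table[ch]) for ch in text)


def lzw_encode(text: str) -> list[int]:
    """LZW with a dictionary seeded by the 256 single-byte characters.
    Assumes every character has a code point below 256 (Latin-1 text)."""
    dictionary = {chr(i): i for i in range(256)}
    w, codes = "", []
    for ch in text:
        if w + ch in dictionary:
            w += ch                          # extend the current match
        else:
            codes.append(dictionary[w])      # emit longest known prefix
            dictionary[w + ch] = len(dictionary)
            w = ch
    if w:
        codes.append(dictionary[w])
    return codes


def lzw_bits(text: str) -> int:
    """Assumed accounting: fixed-width codes sized for the largest code emitted."""
    codes = lzw_encode(text)
    return len(codes) * max(codes).bit_length() if codes else 0


def ratio(original_bits: int, encoded_bits: int) -> float:
    """Compression ratio: original size divided by encoded size."""
    return original_bits / encoded_bits if encoded_bits else float("inf")


if __name__ == "__main__":
    sample = "to be or not to be, that is the question. " * 10
    raw = 8 * len(sample)  # 8 bits per Latin-1 character
    print(f"Huffman ratio: {ratio(raw, huffman_bits(sample)):.2f}")
    print(f"LZW ratio:     {ratio(raw, lzw_bits(sample)):.2f}")
```

Because the two size measures use different simplifications, the printed ratios indicate relative behaviour on a given input rather than the exact file sizes a real implementation would produce.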
Keywords
Text Compression, Data Compression, Data Science, Lossless Algorithms