Exploring Quality-Based Data Discovery Approaches in Digital Data Marketplaces
Date
2025
Publisher
Saudi Digital Library
Abstract
Digital data marketplaces (DDMs) are emerging platforms that enable individuals and organisations to trade, acquire, and monetise data products. Despite the growing relevance of data marketplaces, potential data buyers encounter significant challenges in the discovery, evaluation, and selection of datasets that align with their specific needs. The complexity of the data discovery process, combined with inadequate mechanisms for assessing fitness, often results in suboptimal purchasing decisions. This thesis aims to address these gaps by examining how data buyers navigate DDMs and proposing a novel framework designed to facilitate data quality evaluation, thereby supporting more informed decision-making.
Employing a Design Science Research methodology, the study comprises two iterative research cycles. The initial cycle begins with a systematic literature review, synthesising existing mechanisms for data discovery within DDMs, with an emphasis on metadata representation, requirement expression, matchmaking, and fitness-for-purpose evaluation. This is followed by an interpretive qualitative study that delves into data buyers' purchasing experiences through in-depth interviews. The findings reveal thirteen critical factors influencing buying decisions. Furthermore, the analysis uncovers a novel availability-discoverability gap and reconceptualises trust as a central interpretive evaluation lens.
Building on these findings, a Quality-Based Data Discovery (QBDD) framework is designed and implemented as a proof-of-concept prototype. A mixed-methods evaluation, employing Partial Least Squares Structural Equation Modelling (PLS-SEM) and thematic analysis, validates the framework's effectiveness in explaining 60.7% of the variance in buyer satisfaction, with four out of five hypotheses receiving empirical support. However, qualitative insights suggest opportunities for enhancement across four thematic domains: information quality, AI-driven discovery, user experience, and platform scope.
The second iteration refines the artefact by incorporating large language models (LLMs) to facilitate conversational interaction for articulating and interpreting data quality requirements. A comparative evaluation demonstrates that LLM-enabled interaction significantly enhances predictive accuracy for decision-making effectiveness and buyer satisfaction. However, complexities emerge in system quality relationships, offering novel insights into the impact of LLMs on user perceptions of system quality and information relevance.
Keywords
Data Marketplace, Data Quality, Large Language Models
