A Tight Coupling Context-Based Framework for Dataset Discovery
Date
2023-05-15
Publisher
Concordia University
Abstract
Discovering datasets relevant to research goals is at the core of many analysis tasks, since such datasets are needed to test proposed hypotheses and theories. In particular, researchers in Artificial Intelligence (AI) and Machine Learning (ML), where relevant datasets are essential for precise predictions, have identified how the absence of methods for discovering quality datasets leads to delays and, in many cases, failure of ML projects. Many research reports have highlighted the absence of dataset discovery methods that fill the gap between analysis requirements and available datasets, and have given statistics showing how this hinders the analysis process, with a completion rate of less than 2%. To the best of our knowledge, removing these inadequacies remains “an open problem of great importance”. It is in this context that this thesis contributes a context-based framework that tightly couples dataset providers and data analytics teams. Through this framework, dataset providers publish metadata descriptions of their datasets, and analysts formulate and submit rich queries with goal specifications and quality requirements. The dataset search engine component tightly couples the query specification with the metadata specifications of the datasets through formal contextualized semantic matching and quality-based ranking, and discovers all datasets that are relevant to the analyst's requirements. The thesis gives a proof-of-concept prototype implementation and reports on its performance and efficiency through a case study.
Keywords
Context Awareness, Dataset Discovery, Dataset Model, Data Quality Features, Dataset Context Model, Data Discovery Framework