Improvement of biomedical dataset search through the integration of provenance

Date

2025

Publisher

Saudi Digital Library

Abstract

Efforts to apply the Findable, Accessible, Interoperable, and Reusable (FAIR) principles in the biomedical research domain have increased the number of datasets available online, which facilitates data exchange and reuse. Such reuse enhances research reproducibility and reduces the resources required to conduct research from scratch. As public biomedical repositories proliferate, a vast number of datasets covering many types of data have become available to biomedical researchers. However, researchers need methods and tools that help them search for and discover relevant datasets. They still face challenges with existing search engines, which may not be well suited to biomedical research. One of these challenges is a lack of dataset metadata, which limits researchers' ability to select relevant datasets. In this research, we first used semi-structured interviews to better understand how biomedical researchers search for datasets and what challenges they encounter. Based on the findings of this first study, we focused on one specific challenge, the lack of provenance metadata, and its impact on the decision-making process. We then evaluated, through a user study, how provenance information enhances dataset search. Following this, we developed a provenance extraction tool that automatically extracts provenance information from the biomedical publications associated with datasets, and we estimated its scalability across all PubMed articles on exome sequencing experiments. We concluded our research by evaluating, through a user experience study, how useful the provenance extraction tool is for dataset search. Overall, the findings support integrating provenance into biomedical dataset search.
The results confirm that provenance information improves dataset search in the biomedical research domain: the extracted information supports decision-making and facilitates the selection of appropriate datasets.

Keywords

Dataset Search

Copyright owned by the Saudi Digital Library (SDL) © 2025