Addressing the Cold-Start Problem in Active Learning for Improved Model Performance

Date

2024-08

Publisher

King's College London

Abstract

This study addresses the cold-start problem in active learning by providing an effective approach that improves model performance through optimized selection of informative data. In active learning, the cold-start problem, where a model begins training with little or no labeled data, is particularly challenging. To maximize annotation efficiency and enhance overall model performance, we propose training a model to determine which subset of unlabeled data points is the most informative to annotate. By carefully choosing these initial data points from a large pool of unlabeled samples, we aim to reduce the human effort required for annotation while ensuring the model receives the most effective training data. The goals of this project are to enhance the performance of machine learning models, minimize the burden of human annotation, and provide a principled approach for selecting informative instances. Through comprehensive experimentation and analysis, we demonstrate that TypiClust significantly enhances model accuracy and robustness. Compared with a random-sampling baseline, TypiClust performs better and provides a valuable framework for addressing the cold-start problem across a variety of active learning applications.
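For illustration, the sketch below outlines a TypiClust-style selection step next to the random-sampling baseline mentioned in the abstract. It assumes pre-computed feature embeddings for the unlabeled pool; the function names, the k-nearest-neighbour typicality estimate, and the scikit-learn implementation choices are illustrative assumptions, not the exact procedure used in this work.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def typiclust_select(embeddings: np.ndarray, budget: int, k: int = 20) -> np.ndarray:
    """Illustrative TypiClust-style selection: cluster the unlabeled pool,
    then pick the most 'typical' (densest) point from each cluster."""
    n = embeddings.shape[0]
    k = min(k, n - 1)

    # Partition the unlabeled pool into `budget` clusters.
    cluster_ids = KMeans(n_clusters=budget, n_init=10, random_state=0).fit_predict(embeddings)

    # Typicality: inverse of the mean distance to the k nearest neighbours
    # (the first returned neighbour is the point itself, so it is skipped).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    dists, _ = nn.kneighbors(embeddings)
    typicality = 1.0 / (dists[:, 1:].mean(axis=1) + 1e-8)

    # From each cluster, take the single most typical point.
    selected = []
    for c in range(budget):
        members = np.where(cluster_ids == c)[0]
        if members.size:
            selected.append(members[np.argmax(typicality[members])])
    return np.array(selected)

def random_select(n_pool: int, budget: int, seed: int = 0) -> np.ndarray:
    """Random-sampling baseline: draw `budget` indices uniformly without replacement."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_pool, size=budget, replace=False)
```

In this sketch, each of the `budget` clusters contributes its densest point, so the selected batch covers the data distribution rather than concentrating in one region; this coverage is the intuition behind TypiClust's reported advantage over random sampling in the cold-start setting.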

Keywords

Active Learning, Cold-Start Problem, TypiClust, Random Sampling, Labeled, Unlabeled
