Addressing the Cold-Start Problem in Active Learning for Improved Model Performance
Date
2024-08
Authors
Publisher
King's College London
Abstract
This study addresses the cold-start problem in active learning by providing an effective
approach that improves model performance through optimized information selection.
In active learning, the cold-start problem, where a model begins training with only
minimal labeled data, is particularly challenging. To maximize annotation efficiency
and improve overall model performance, we propose training a model to determine which
subset of unlabeled data points is most informative to annotate. By carefully choosing
these initial data points from a large pool of unlabeled samples, we aim to reduce the
human effort required for annotation while ensuring the model receives the most
effective training data. The goals of this project are to improve the performance of
machine learning models, minimize the burden of human annotation, and provide a
principled approach for selecting informative instances. Through comprehensive
experimentation and analysis, we demonstrate that TypiClust significantly improves
model accuracy and robustness. Comparing the proposed approach with random sampling,
we find that TypiClust performs better and provides a valuable framework for addressing
the cold-start problem in a range of active learning applications.
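
For illustration, the sketch below shows one TypiClust-style cold-start selection under stated assumptions: embeddings for the unlabeled pool are pre-computed (for example, by a self-supervised encoder), the pool is clustered with k-means into as many clusters as the labeling budget, and the most "typical" (densest) point is taken from each cluster, with uniform random sampling shown as the baseline for comparison. The function names, the parameters budget and k, and the library choices are illustrative assumptions, not the exact implementation evaluated in this work.

# Minimal sketch of a TypiClust-style cold-start selection (illustrative only).
# Assumes a pre-computed embedding matrix `embeddings` (n_samples x n_features);
# names and parameters are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


def typiclust_select(embeddings: np.ndarray, budget: int, k: int = 20) -> np.ndarray:
    """Select `budget` indices: one highly 'typical' point from each cluster."""
    # Partition the unlabeled pool into as many clusters as the labeling budget.
    labels = KMeans(n_clusters=budget, n_init=10, random_state=0).fit_predict(embeddings)

    selected = []
    for c in range(budget):
        idx = np.where(labels == c)[0]
        cluster = embeddings[idx]
        # Typicality = inverse of the mean distance to the k nearest neighbours
        # within the cluster (points in denser regions are more typical).
        n_neighbors = min(k, len(idx) - 1)
        if n_neighbors < 1:
            selected.append(idx[0])          # singleton cluster: take its only point
            continue
        nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(cluster)
        dists, _ = nn.kneighbors(cluster)    # first column is the point itself (distance 0)
        typicality = 1.0 / (dists[:, 1:].mean(axis=1) + 1e-12)
        selected.append(idx[np.argmax(typicality)])
    return np.array(selected)


def random_select(n_samples: int, budget: int, seed: int = 0) -> np.ndarray:
    """Baseline sketch: uniformly random initial pool, used for comparison."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_samples, size=budget, replace=False)
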
Keywords
Active Learning, Cold-Start Problem, TypiClust, Random Sampling, Labeled, Unlabeled