Improving Stopping Methods for Technology Assisted Review
Date
2026
Publisher
Saudi Digital Library
Abstract
Technology Assisted Review (TAR) aims to reduce the effort required to review large collections of documents for relevance. Common applications include systematic reviews in the medical domain and eDiscovery in the legal sector. TAR usually involves ranking documents for review, and although most relevant documents tend to appear early in the ranking, deciding when to stop reviewing remains one of the main challenges in TAR. This thesis introduces multiple TAR stopping approaches that aim to achieve a given target recall while examining as few documents as possible. The first stopping approach is based on point processes, which are statistical models used to represent random events. The approach uses rate functions to model the occurrence of relevant documents over a ranking and hence estimates the total number of relevant documents, which indicates a stopping point. Using two point processes (Inhomogeneous Poisson and Cox), the effect of multiple rate functions is explored, including hyperbolic decline, which is used here for the first time in TAR. The second approach is a novel stopping method based on reinforcement learning, which trains an agent to explore an environment represented by the ranking and to maximise a reward function for a given target recall. Additionally, the approach is generalised so that it can be trained on multiple target recall levels simultaneously, and its reward function is extended to adapt to different stopping objectives, such as ensuring that the target is reached or minimising cost. For both stopping approaches, the benefit of integrating text classifiers' predictions for unexamined documents is also explored. Furthermore, an evaluation of stopping effectiveness over different ranking qualities is introduced, emphasising the effect of ranking quality on stopping. Overall, stopping is analysed at multiple target recall levels on different datasets, showing that the proposed approaches are effective and outperform many existing baselines.
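The rate-function idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a hyperbolic rate of the form a / (1 + b*x), fits it by a simple least-squares grid search on cumulative counts, and uses only the expected number of remaining relevant documents rather than a full Poisson or Cox process. All function names, the parameter grid, and the decision rule are hypothetical.

```python
import math


def cumulative_hyperbolic(a, b, n):
    """Integral of the assumed rate a / (1 + b*x) from 0 to n."""
    return (a / b) * math.log1p(b * n)


def fit_rate(relevance, b_grid=None):
    """Fit (a, b) to the cumulative count of relevant documents.

    For each candidate b, the best a has a closed form because the
    cumulative curve is linear in a; the b with the lowest squared
    error is kept. The grid itself is an arbitrary illustrative choice.
    """
    if b_grid is None:
        b_grid = [10 ** (e / 4) for e in range(-16, 5)]  # 1e-4 up to 10
    cum, running = [], 0
    for r in relevance:
        running += r
        cum.append(running)
    best = None
    for b in b_grid:
        g = [math.log1p(b * (i + 1)) / b for i in range(len(cum))]
        a = sum(c * x for c, x in zip(cum, g)) / sum(x * x for x in g)
        sse = sum((c - a * x) ** 2 for c, x in zip(cum, g))
        if best is None or sse < best[0]:
            best = (sse, a, b)
    return best[1], best[2]


def should_stop(relevance, collection_size, target_recall):
    """Decide whether to stop after examining the documents in `relevance`.

    `relevance` holds 0/1 judgements for the documents examined so far,
    in ranking order. The total number of relevant documents is estimated
    as those already found plus the fitted rate integrated over the
    unexamined part of the ranking.
    """
    examined = len(relevance)
    found = sum(relevance)
    if found == 0:
        return False  # nothing to fit yet, so keep reviewing
    a, b = fit_rate(relevance)
    remaining = (cumulative_hyperbolic(a, b, collection_size)
                 - cumulative_hyperbolic(a, b, examined))
    estimated_total = found + max(remaining, 0.0)
    return found / estimated_total >= target_recall
```

In use, `should_stop` would be called after each batch of judgements with the relevance labels collected so far, the collection size, and the desired recall, for example `should_stop(labels, 10000, 0.9)`. A point-process formulation would replace the expected count with a probability statement about the number of remaining relevant documents, which this sketch deliberately omits.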
Keywords
Information Retrieval, Technology Assisted Review, Stopping Methods
