Are study filtering and selection software packages effective in supporting medical systematic reviews by reducing workload and maintaining accuracy? A systematic review
Abstract
Background:
Systematic reviews provide the highest level of evidence for guiding clinical practice, supporting clinical decision making and generating new ideas for research. However, because medical knowledge is growing exponentially and new journals are appearing at an accelerating rate, systematic reviews can be both complicated and time consuming. Study screening software packages that automate the citation-screening step are therefore a potential solution to this complexity.
Aim:
This review aims to investigate whether study filtering software packages are effective in supporting medical systematic reviews by reducing workload while maintaining accuracy.
Method:
A systematic review design was used, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2009) statement. Embase, Medline Ovid, Web of Science and IEEE Xplore were searched. To supplement the database searches, the ACM Digital Library, MillionShort.com, the reference lists of relevant studies and of relevant systematic reviews, and studies that cited the relevant studies were also searched to identify additional relevant studies. The review was limited to case studies and to randomized, non-randomized, quasi-randomized, and controlled before-and-after trials published in 2005 or later, in any language, that focused on automating the study selection process in medical systematic reviews or clinical guideline development and that reported the technical aspects of the automation and its effect on workload or accuracy. The risk of bias in the included studies was assessed with a modified version of the Newcastle–Ottawa scale for case-control studies, although this modified version has not yet been validated. The data were synthesized narratively.
Result:
Nine hundred eighty-five titles and abstracts were screened, 63 full-text studies were assessed, and 28 studies were included in the qualitative synthesis. Overall, fifteen of the included studies were of high quality, ten of intermediate quality, and three of poor quality. The artificial intelligence classifiers used to automate citation screening were support vector machines (n=15), naïve Bayes (n=2), logistic regression (n=1), k-means clustering with a maximum entropy algorithm (n=1), neural networks (n=1), voting perceptron (n=1), semantic analysis with user feedback (n=1), and random forest (n=1). However, it is not feasible to conclude which approach performs better in terms of workload reduction and accuracy because of the variety of approaches and evaluation methodologies. In general, for new systematic reviews, study selection classifiers achieved approximately a 3 to 77% reduction in workload while retrieving 75 to 100% of eligible studies; for systematic review updates, classifiers achieved about a 36 to 92% reduction in reviewer workload while identifying 91 to 100% of relevant documents.
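To illustrate the most frequently counted approach above, the following is a minimal, hypothetical sketch (not taken from any of the included studies) of a linear support vector machine that ranks unscreened citations by relevance to a review question. It uses scikit-learn with TF-IDF features over invented toy abstracts; all titles and labels here are illustrative assumptions.

```python
# Hypothetical sketch of SVM-based citation screening; the abstracts,
# labels, and unscreened titles below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy labeled citations from a completed screening round:
# 1 = relevant to the review question, 0 = irrelevant.
abstracts = [
    "randomized trial of statin therapy for cardiovascular prevention",
    "statin treatment outcomes in a controlled cardiovascular study",
    "cohort study of statin use and cardiac events",
    "machine translation of legal documents",
    "deep learning for image segmentation in satellite photos",
    "survey of wireless sensor network protocols",
]
labels = [1, 1, 1, 0, 0, 0]

# Represent each title/abstract as a TF-IDF vector and fit a linear SVM.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(abstracts)
clf = LinearSVC().fit(X, labels)

# Score unscreened citations and sort them by decision score, so that
# reviewers read from the top of the ranked list downward.
unscreened = [
    "effect of statin dose on cardiovascular mortality",
    "routing algorithms for ad hoc networks",
]
scores = clf.decision_function(vectorizer.transform(unscreened))
ranked = sorted(zip(unscreened, scores), key=lambda pair: -pair[1])
for title, score in ranked:
    print(f"{score:+.2f}  {title}")
```

In this prioritized-screening setup, the workload reduction reported in the Result section corresponds to the fraction of the ranked list that reviewers never need to read, and the retrieval percentage corresponds to the share of truly eligible studies found before they stop.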
Discussion and Conclusion:
Using artificial intelligence classifiers to rank studies according to their relevance to the review question is considered practical in real-world reviews; however, more research is required before classifiers can automatically divide documents into relevant and irrelevant sets. Nevertheless, this method of automating study selection could be used with caution as a second or third reviewer to maximize recall. Additionally, it is evident from the literature that there has been no collaboration between research teams; to improve collaboration, the raw datasets and algorithms used to create the classifiers should be published so that they are readily available to researchers.