Measuring The Quality of Wikipedia Articles Among Different Topics

Saudi Digital Library
Wikipedia, a globally known online encyclopedia, offers millions of articles across diverse topics. Its open editing policy, which allows contributions from volunteers, has made it a valuable resource. However, its reliability has been questioned, particularly in academic circles. To improve the understanding of Wikipedia's quality, and because assessing quality under Wikipedia's own approach is difficult, this study presents a new approach to evaluating article quality. The study aims to create a simple, quantifiable model based on measurable attributes, such as the length of articles, the number of references, and the number of edits. This model is used to calculate article quality and then assign quality classifications. The proposed model achieves accuracy approximately equal to that of a random forest model, which is considered a more complex model. The research also explores how article quality varies across topics, showing where high-quality content is prevalent and which areas require improvement. Data was collected from the Wikipedia API, and quality assessments were made based on these measurable features. The findings indicate that Astronomy topics have a higher level of quality, while Language topics have a lower proportion of high-quality articles. These findings suggest that the attributes used in this study are sufficient and efficient for assessing article quality on Wikipedia. Finally, the study indicates where experts should focus their efforts: improving articles on topics such as Language, Business, or Mathematics would enhance the overall quality of content in these areas.
Keywords: Wikipedia, Quality, Simple model, Random Forest, Accuracy, Classifications
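
The abstract describes a simple model that computes a quality score from article length, number of references, and number of edits, and then assigns a quality class. The record does not give the actual formula, so the following is only a minimal sketch of what such a model could look like. The weights, saturation caps, class thresholds, and class labels are all hypothetical assumptions chosen for illustration, not values from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class ArticleStats:
    """Measurable attributes of one article (field names are illustrative)."""
    length_bytes: int
    num_references: int
    num_edits: int


# Hypothetical saturation caps: values at or above the cap count as "full marks".
CAPS = {"length_bytes": 50_000, "num_references": 100, "num_edits": 1_000}

# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"length_bytes": 0.4, "num_references": 0.4, "num_edits": 0.2}


def quality_score(stats: ArticleStats) -> float:
    """Weighted sum of log-scaled, capped attributes, in the range [0, 1]."""
    score = 0.0
    for name, cap in CAPS.items():
        value = max(getattr(stats, name), 0)
        # Log scaling dampens very large counts; min() caps the result at 1.
        normalized = min(math.log1p(value) / math.log1p(cap), 1.0)
        score += WEIGHTS[name] * normalized
    return score


def quality_class(score: float) -> str:
    """Map a score to a coarse class using hypothetical thresholds."""
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


if __name__ == "__main__":
    example = ArticleStats(length_bytes=32_000, num_references=45, num_edits=380)
    s = quality_score(example)
    print(f"score={s:.3f} class={quality_class(s)}")
```

A score of this kind is easy to inspect and to compare across topics, which is the appeal of a simple model over a random forest; in practice the weights and thresholds would need to be fitted against articles that already carry quality assessments.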