Supervised Machine Learning Ensemble for Fake News Classification of Social Media
Publisher
Saudi Digital Library
Abstract
Fake news detection is an important task. Social media lets users communicate and publish content at no financial cost, which gives some users the opportunity to spread fake news. Social networks generate a considerable amount of data, and processing it manually is impractical. Therefore, this study investigates automatic fake news detection on Twitter. The research uses a publicly available balanced dataset, the “PAN 2020 Fake news dataset,” containing 500 XML files, each representing one author and holding 100 of that author's tweets. The dataset has English and Spanish tweets. The averaging ensemble method obtained 83.33% accuracy on English tweets when the retweet feature was included, whereas Gradient Boosting achieved 81.67% accuracy on Spanish tweets. These results are comparable to the state-of-the-art approach reported in the PAN 2020 fake news detection task. Spanish Twitter users were observed to have a different posting style from English users; for example, Spanish authors tend to use more emojis in their tweets. The study concludes that retweets, tweet topics, and the count of unique words are vital features for detecting the spread of fake news on Twitter.
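As a rough illustration of the ideas named in the abstract, the sketch below computes two author-level features (retweet share and unique word count) and averages the fake-news probabilities produced by several classifiers. It is a minimal sketch, not the study's implementation: the function names `author_features` and `average_ensemble`, the assumption that a retweet is a tweet starting with "RT", the whitespace tokenization, and the 0.5 decision threshold are all hypothetical choices made for this example.

```python
from statistics import mean


def author_features(tweets):
    """Compute simple author-level features from a list of tweet strings.

    Assumption (not from the source): a retweet is any tweet whose text
    starts with "RT", and words are lowercased whitespace-separated tokens.
    """
    if not tweets:
        raise ValueError("an author needs at least one tweet")
    retweets = sum(1 for t in tweets if t.strip().startswith("RT"))
    unique_words = {tok.lower() for t in tweets for tok in t.split()}
    return {
        "retweet_fraction": retweets / len(tweets),
        "unique_word_count": len(unique_words),
    }


def average_ensemble(model_probs, threshold=0.5):
    """Average P(fake) across models and threshold the result.

    model_probs: one list per model, each holding P(fake) for every author,
    in the same author order. Returns (averaged probabilities, 0/1 labels).
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_authors = len(model_probs[0])
    if any(len(p) != n_authors for p in model_probs):
        raise ValueError("all models must score the same authors")
    # zip(*...) groups the scores of all models for each author.
    averaged = [mean(scores) for scores in zip(*model_probs)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


if __name__ == "__main__":
    # Hypothetical probabilities from three models for three authors.
    probs = [
        [0.9, 0.2, 0.6],
        [0.7, 0.4, 0.3],
        [0.8, 0.3, 0.3],
    ]
    print(average_ensemble(probs))
    print(author_features(["RT @user: hello world", "hello again"]))
```

In practice the per-model probabilities would come from trained classifiers (Gradient Boosting being one the abstract mentions); averaging them tends to smooth out the errors of any single model.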