Disinformation Classification Using Transformer based Machine Learning
Date
2024
Publisher
Howard University
Abstract
The proliferation of false information via social media has become an increasingly pressing
problem. Digital means of communication and social media platforms facilitate the rapid spread
of disinformation, which calls for the development of advanced techniques for identifying incorrect
information. This dissertation endeavors to devise effective multimodal techniques for identifying
fraudulent news, considering the noteworthy influence that deceptive stories have on society.
The study proposes and evaluates multiple approaches, starting with a transformer-based model
that uses word embeddings for text classification. This model significantly outperforms
baseline methods such as hybrid CNN and RNN models in accuracy.
The dissertation also introduces a novel BERT-powered multimodal approach to fake news
detection, combining textual data with extracted text from images to improve accuracy. By leveraging
the strengths of the BERT-base-uncased model for text processing and integrating it with
image text extraction via OCR, this approach calculates a confidence score indicating the likelihood
of news being real or fake. Rigorous training and evaluation show significant improvements
in performance compared to state-of-the-art methods.
Furthermore, the study explores the complexities of multimodal fake news detection, integrating
text, images, and videos into a unified framework. By employing BERT for textual analysis and
CNN for visual data, the multimodal approach demonstrates superior performance over traditional
models in handling multiple media formats. Comprehensive evaluations using datasets such as
ISOT and MediaEval 2016 confirm the robustness and adaptability of these methods in combating
the spread of fake news.
This dissertation contributes valuable insights to fake news detection, highlighting the effectiveness
of transformer-based models, emotion-aware classifiers, and multimodal frameworks. The
findings provide robust solutions for detecting misinformation across diverse platforms and data
types, offering a path forward for future research in this critical area.
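
The BERT-plus-OCR approach summarized in the abstract (article text combined with text extracted from images, then scored as real or fake) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the two-logit output convention, and the keyword-based `toy_classifier` are all hypothetical stand-ins. In practice the `classify` callable would wrap a fine-tuned BERT-base-uncased model, and `image_text` would come from an OCR step.

```python
import math
from typing import Callable, Sequence, Tuple


def softmax(logits: Sequence[float]) -> list:
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fake_news_confidence(
    article_text: str,
    image_text: str,
    classify: Callable[[str], Tuple[float, float]],
) -> Tuple[str, float]:
    """Combine article text with OCR-extracted image text and score it.

    `classify` is an assumed interface: it maps a string to a pair
    (real_logit, fake_logit). Returns the predicted label and the
    probability assigned to that label as a confidence score.
    """
    combined = f"{article_text} {image_text}".strip()
    p_real, p_fake = softmax(classify(combined))
    label = "fake" if p_fake > p_real else "real"
    return label, max(p_real, p_fake)


def toy_classifier(text: str) -> Tuple[float, float]:
    """Hypothetical stand-in for a trained model: counts sensational cue words."""
    cues = ("shocking", "miracle", "hoax")
    hits = sum(text.lower().count(c) for c in cues)
    return (1.0, float(hits))  # (real_logit, fake_logit)


if __name__ == "__main__":
    print(fake_news_confidence("Shocking miracle cure found", "HOAX exposed", toy_classifier))
    print(fake_news_confidence("City council approves budget", "", toy_classifier))
```

Passing the classifier in as a callable keeps the scoring logic independent of any particular model or OCR library, so the same function can be exercised with a stub or with a real model.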
Description
The digital age has revolutionized the way information spreads, largely due to the internet’s rapid
growth. While this has made knowledge more accessible and empowered societies, it has also
paved the way for the widespread dissemination of disinformation. In the current social media
landscape, false information often spreads more rapidly than verified news, posing serious challenges
for governments, the media, and regulators. These challenges are particularly evident during
critical events such as elections, global health crises, and political conflicts, where misleading
information can manipulate public opinion and threaten democratic institutions [1]. Disinformation
now extends beyond text, frequently involving visual content like images and videos, which complicates
efforts to identify and counteract it. People often perceive visual information as more trustworthy,
making it easier for manipulated content to mislead viewers. Additionally, sophisticated
editing techniques blur the line between authentic and fabricated content. As new technologies
emerge, the tactics used to spread disinformation continuously evolve, increasing the difficulty of
detecting it across different types of media. Traditional methods of combating disinformation, such
as rule-based systems and basic machine learning models, have not been sufficient in addressing
the sophisticated nature of misinformation that blends multiple media formats. This research aims
to develop more refined detection strategies, focusing on methods that can analyze both textual and
visual elements. By incorporating an understanding of emotional signals and examining various
types of content, the study seeks to improve the accuracy and efficiency of disinformation detection
in today’s complex information environment.
Keywords
Machine learning, NLP, CNN, Transformers, artificial intelligence, Fake news, multi-modal data, BERT, RNN, OCR
Citation
M. Al-Alshaqi and D. B. Rawat, "Disinformation Classification Using Transformer based Machine Learning," 2023