Enhancing Opinion Mining in E-Commerce: The Role of Text Segmentation and K-Means Clustering in Transformer-Based Consumer Trust Analysis
Date
2025
Publisher
Texas Tech University
Abstract
As the E-commerce market expands, customer reviews have become essential for companies aiming to understand
consumer opinions. Building consumer trust is critical to the success of E-commerce businesses, as it significantly
influences purchasing decisions. Understanding how to build this trust is essential, especially given that 93% of
consumers report that online reviews influence their purchasing choices. Trust in E-commerce is commonly
understood as a consumer’s willingness to rely on an online seller based on expectations of reliability, security, and
competence. Several distinct factors therefore shape consumers' online purchase decisions. Customer
reviews are crucial for gauging consumer opinions and can help identify the factors influencing trust in online
shopping. However, current research primarily focuses on using transformer models to classify reviews as positive,
negative, or neutral or to predict customer ratings based on the content of those reviews. This dissertation introduces
a new approach that expands the capabilities of pre-trained transformer models, such as GPT, BART, and BERT, to
extract trust factors from customer reviews, addressing a significant gap in the current literature. Text
segmentation notably improves the models' accuracy: a comparative analysis of segmented and unsegmented
datasets, benchmarked against manually annotated reviews, shows that with segmentation
GPT-3.5 achieved an accuracy of 86.9%, representing a 15.5 percentage point
improvement over unsegmented data. These findings validate segmentation as a critical technique for enhancing
granularity and enabling models to identify nuanced trust factors effectively. To further validate the
approach, a second experiment was conducted on a different dataset to determine whether segmentation
would yield comparable or better accuracy. In this experiment, text segmentation was
applied before the initial factor extraction to enhance the identification of trust factors. However, the large number of
extracted factors created new challenges, as many were redundant or represented similar concepts under different
names, complicating large-scale analysis. To address this challenge, K-means clustering, combined with the elbow
method, successfully standardized the 2,890 extracted factors and grouped them into nine key categories. This
refined process further improved the GPT-3.5 model’s accuracy to 88.5%, demonstrating the scalability and
robustness of the proposed methodology in handling large-scale review datasets. The findings highlight the centrality
of text segmentation and underscore the crucial role of normalization techniques, particularly K-means clustering, in
managing large-scale review datasets. By offering a scalable and adaptable framework, this dissertation provides
actionable insights for improving E-commerce analytics. Furthermore, it lays the groundwork for broader
applications, extending its suitability beyond E-commerce to other areas where manual labeling is challenging or
resource-intensive.
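The normalization step described above (grouping many similarly named factors with K-means and choosing the number of clusters with the elbow method) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the bag-of-words vectors, the second-difference rule for locating the elbow, the sample factor phrases, and the helper names `vectorize`, `kmeans`, and `elbow_k` are all assumptions made for illustration.

```python
import random
from collections import Counter

def vectorize(phrases):
    """Bag-of-words count vectors over a shared vocabulary (an assumed representation)."""
    vocab = sorted({w for p in phrases for w in p.lower().split()})
    vectors = []
    for p in phrases:
        counts = Counter(p.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, inertia). Requires k <= len(points)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia

def elbow_k(points, k_values, restarts=5):
    """Pick k at the sharpest bend of the inertia curve.

    Assumes k_values is evenly spaced and has at least three entries.
    The "largest second difference" rule is one common heuristic, not
    necessarily the criterion used in the dissertation.
    """
    inertias = [min(kmeans(points, k, seed=s)[1] for s in range(restarts))
                for k in k_values]
    bends = [inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
             for i in range(1, len(inertias) - 1)]
    best = max(range(len(bends)), key=bends.__getitem__) + 1
    return k_values[best], inertias

# Hypothetical factor phrases standing in for the extracted trust factors.
factors = [
    "fast shipping", "quick shipping", "shipping speed",
    "secure payment", "payment security", "safe payment",
    "product quality", "good quality", "quality product",
]
points = vectorize(factors)
k, inertias = elbow_k(points, list(range(1, 6)))
labels, _ = kmeans(points, k)
for label, phrase in sorted(zip(labels, factors)):
    print(label, phrase)
```

In practice a library implementation and semantic embeddings would likely group synonyms such as "fast" and "quick" better than raw word counts; the sketch only shows how the elbow curve and the cluster assignment fit together.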
Keywords
Consumer trust, e-commerce, natural language processing, opinion mining, artificial intelligence, GPT-3.5, BART, BERT, pre-trained transformer models, sentiment analysis