Enhancing Opinion Mining in E-Commerce: The Role of Text Segmentation and K-Means Clustering in Transformer-Based Consumer Trust Analysis

Date

2025

Publisher

Texas Tech University

Abstract

As the E-commerce market expands, customer reviews have become essential for companies aiming to understand consumer opinions. Building consumer trust is critical to the success of E-commerce businesses because it significantly influences purchasing decisions, and 93% of consumers report that online reviews influence their purchasing choices. Trust in E-commerce is commonly understood as a consumer’s willingness to rely on an online seller based on expectations of reliability, security, and competence, so several distinct factors shape purchase decisions when shopping online. Customer reviews can help identify which of these factors influence trust. However, current research primarily uses transformer models either to classify reviews as positive, negative, or neutral, or to predict customer ratings from review content. This dissertation introduces a new approach that extends pre-trained transformer models, such as GPT, BART, and BERT, to extract trust factors from customer reviews, addressing a significant gap in the literature. The research notably improves model accuracy by applying text segmentation. A comparative analysis of segmented and unsegmented datasets, benchmarked against manually annotated reviews, shows that text segmentation increases accuracy. Specifically, GPT-3.5 achieved an accuracy of 86.9% on segmented data, a 15.5 percentage point improvement over unsegmented data. These findings validate segmentation as a critical technique for increasing granularity and enabling models to identify nuanced trust factors. To further validate the approach, a second experiment was conducted on a different dataset to determine whether segmentation would yield comparable or better accuracy.
In this experiment, text segmentation was applied before the initial factor extraction to improve the identification of trust factors. However, the large number of extracted factors created a new challenge: many were redundant or expressed the same concept under different names, which complicated large-scale analysis. To address this, K-means clustering combined with the elbow method was used to standardize the 2,890 extracted factors and group them into nine key categories. This refined process further improved the GPT-3.5 model’s accuracy to 88.5%, demonstrating the scalability and robustness of the proposed methodology on large-scale review datasets. The findings highlight the centrality of text segmentation and underscore the role of normalization techniques, particularly K-means clustering, in managing large-scale review datasets. By offering a scalable and adaptable framework, this dissertation provides actionable insights for improving E-commerce analytics. It also lays the groundwork for broader applications beyond E-commerce, in areas where manual labeling is challenging or resource-intensive.
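
As an illustration of the normalization step described above, the following is a minimal sketch of K-means clustering with an elbow-style choice of k. It is not the dissertation's implementation: the 2-D points in `factor_vectors` are hypothetical stand-ins for embeddings of extracted factor names, the function names are invented for this example, the farthest-point initialization is a simplification chosen to keep the run deterministic, and the "largest second difference of inertia" rule is only one common way to automate what is usually a visual reading of the elbow plot.

```python
import math

def farthest_point_init(points, k):
    # Deterministic seeding (an assumption for this sketch): start from the
    # first point, then repeatedly add the point farthest from its nearest centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def assign(points, centroids):
    # Label each point with the index of its nearest centroid.
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    # Inertia: sum of squared distances from each point to its centroid.
    inertia = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    return labels, inertia

def elbow_k(points, k_max):
    # Pick the k where the inertia curve bends most sharply
    # (largest discrete second difference).
    inertias = {k: kmeans(points, k)[1] for k in range(1, k_max + 1)}
    curvature = {
        k: inertias[k - 1] - 2 * inertias[k] + inertias[k + 1]
        for k in range(2, k_max)
    }
    return max(curvature, key=curvature.get)

# Hypothetical toy data: three tight groups standing in for factor-name embeddings.
factor_vectors = [
    (0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
    (10.0, 0.0), (10.5, 0.0), (10.0, 0.5), (10.5, 0.5),
    (5.0, 9.0), (5.5, 9.0), (5.0, 9.5), (5.5, 9.5),
]

best_k = elbow_k(factor_vectors, k_max=6)
labels, _ = kmeans(factor_vectors, best_k)
print(best_k, labels)
```

In practice the factor names would first be converted to numeric vectors (for example with a sentence-embedding model), and a library implementation with randomized restarts would typically replace this hand-written loop.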

Keywords

Consumer trust, e-commerce, natural language processing, opinion mining, artificial intelligence, GPT-3.5, BART, BERT, pre-trained transformer models, sentiment analysis

Copyright owned by the Saudi Digital Library (SDL) © 2025