Enhancing Opinion Mining in E-Commerce: The Role of Text Segmentation and K-Means Clustering in Transformer-Based Consumer Trust Analysis

dc.contributor.advisor: Zhuang, Yu
dc.contributor.author: Alkhalil, Bandar
dc.date.accessioned: 2025-04-27T06:05:17Z
dc.date.issued: 2025
dc.description.abstract: As the E-commerce market expands, customer reviews have become essential for companies aiming to understand consumer opinions. Building consumer trust is critical to the success of E-commerce businesses, as it significantly influences purchasing decisions. Understanding how to build this trust is essential, especially given that 93% of consumers report that online reviews influence their purchasing choices. Trust in E-commerce is commonly understood as a consumer’s willingness to rely on an online seller based on expectations of reliability, security, and competence. That is, multiple factors affect consumers’ online purchase decisions. Customer reviews are crucial for gauging consumer opinions and can help identify the factors influencing trust in online shopping. However, current research primarily focuses on using transformer models to classify reviews as positive, negative, or neutral, or to predict customer ratings based on the content of those reviews. This dissertation introduces a new approach that expands the capabilities of pre-trained transformer models, such as GPT, BART, and BERT, to extract trust factors from customer reviews, addressing a significant gap in the current literature. The research notably improves model accuracy by applying text segmentation. Comparative analysis between segmented and unsegmented datasets, benchmarked against manually annotated reviews, demonstrates that text segmentation increases accuracy. Specifically, GPT-3.5 achieved an accuracy of 86.9%, representing a 15.5 percentage point improvement over unsegmented data. These findings validate segmentation as a critical technique for enhancing granularity and enabling models to identify nuanced trust factors effectively.
To further validate the effectiveness of our approach, a second experiment was conducted using a different dataset to determine whether segmentation would yield comparable or even better accuracy. In this experiment, text segmentation was applied before the initial factor extraction to enhance the identification of trust factors. However, the large number of extracted factors created new challenges, as many were redundant or represented similar concepts under different names, complicating large-scale analysis. To address this challenge, K-means clustering, combined with the elbow method, standardized the 2,890 extracted factors and grouped them into nine key categories. This refined process further improved the GPT-3.5 model’s accuracy to 88.5%, demonstrating the scalability and robustness of the proposed methodology in handling large-scale review datasets. The findings highlight the centrality of text segmentation and underscore the crucial role of normalization techniques, particularly K-means clustering, in managing large-scale review datasets. By offering a scalable and adaptable framework, this dissertation provides actionable insights for improving E-commerce analytics. Furthermore, it lays the groundwork for broader applications, extending its suitability beyond E-commerce to other areas where manual labeling is challenging or resource-intensive.
dc.format.extent: 78
dc.identifier.uri: https://hdl.handle.net/20.500.14154/75271
dc.language.iso: en_US
dc.publisher: Texas Tech University
dc.subject: Consumer trust
dc.subject: e-commerce
dc.subject: natural language processing
dc.subject: opinion mining
dc.subject: artificial intelligence
dc.subject: GPT-3.5
dc.subject: BART
dc.subject: BERT
dc.subject: pre-trained transformer models
dc.subject: sentiment analysis
dc.title: Enhancing Opinion Mining in E-Commerce: The Role of Text Segmentation and K-Means Clustering in Transformer-Based Consumer Trust Analysis
dc.type: Thesis
sdl.degree.department: Computer Science
sdl.degree.discipline: Natural Language Processing (NLP) and Applied Machine Learning in E-Commerce Analytics
sdl.degree.grantor: Texas Tech University
sdl.degree.name: Doctor of Philosophy in Computer Science
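
Illustrative note: the abstract describes segmenting reviews before factor extraction and then grouping the extracted factors with K-means, choosing the number of clusters by the elbow method. The sketch below shows one way those two steps could look. It is not the dissertation's implementation: the regex-based sentence splitter, the toy 2-D points standing in for embeddings of factor names, the evenly spaced centroid initialization, the "largest second difference" elbow rule, and the names `segment_review`, `kmeans`, and `pick_elbow` are all assumptions made for this example.

```python
import re


def segment_review(text):
    """Split a review into sentence-level segments (naive regex; an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with a deterministic, evenly spaced initialization."""
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Mean of the members, coordinate by coordinate.
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def pick_elbow(points, max_k):
    """Return the k whose inertia curve bends most (largest second difference)."""
    inertias = [kmeans(points, k)[2] for k in range(1, max_k + 1)]
    best_k, best_bend = 2, float("-inf")
    for k in range(2, max_k):
        drop_before = inertias[k - 2] - inertias[k - 1]
        drop_after = inertias[k - 1] - inertias[k]
        if drop_before - drop_after > best_bend:
            best_k, best_bend = k, drop_before - drop_after
    return best_k


# Toy stand-ins for factor embeddings: two well-separated groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
segments = segment_review("Fast shipping. Great price! Would buy again?")
best_k = pick_elbow(toy_points, max_k=4)
labels, centroids, inertia = kmeans(toy_points, best_k)
```

In practice the points would be vector representations of the extracted factor names rather than hand-written coordinates, and a library implementation of K-means would normally replace the hand-rolled loop; the record does not say which representation or library the author used.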

Files

Original bundle

Name: SACM-Dissertation .pdf
Size: 1.32 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2025