Analysis of Brand Language Used in Social Media
Abstract
In recent years, the use of social media has significantly improved branding techniques. Social media branding involves engaging consistently with customers to enhance brand awareness. Brand language is the body of words, phrases and letters that businesses use to describe themselves and their products. Tone of Voice (ToV) is a fundamental component of brand language. On social media it can be used to influence customers' responses toward a brand and to build a memorable image of it. Analysing social media platforms such as Twitter and converting their content into valuable knowledge, however, requires Natural Language Processing (NLP) techniques that can provide an overview of all the tweets about a brand.
This dissertation explores detecting the ToV of brands' tweets using several binary text classification models: Naïve Bayes (NB) and Support Vector Machine (SVM) as machine learning models, and Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) as deep learning models. According to the type of ToV present in brands' tweets, three datasets were collected to train the models, and another three were extracted from Twitter to test them. Each dataset was prepared, cleaned and pre-processed to transform the texts into a form that text classification models can learn from. For feature extraction, different n-gram ranges were applied for the NB and SVM models, while word embeddings, initialised either with random weights or with pre-trained GloVe vectors, were used for the CNN and LSTM models.
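To make the n-gram feature extraction for the NB models more concrete, the following is a minimal standard-library Python sketch of word n-grams (here 1-3 grams) feeding a multinomial Naïve Bayes classifier with Laplace smoothing. It is an illustration under stated assumptions, not the dissertation's implementation: the tokeniser regex, the smoothing parameter `alpha`, the multinomial NB variant, the toy tweets and the hypothetical "formal"/"informal" labels are all assumptions.

```python
import math
import re
from collections import Counter


def ngrams(text, n_min=1, n_max=3):
    """Return word n-grams of length n_min..n_max (assumed simple lowercase tokeniser)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NGramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with Laplace (add-alpha) smoothing."""

    def __init__(self, n_min=1, n_max=3, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        docs_per_class = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text, self.n_min, self.n_max))
        # Vocabulary = every n-gram seen in any class during training.
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.log_prior = {c: math.log(docs_per_class[c] / len(labels)) for c in self.classes}
        return self

    def predict(self, text):
        grams = ngrams(text, self.n_min, self.n_max)
        v = len(self.vocab)

        def log_posterior(c):
            score = self.log_prior[c]
            for g in grams:
                if g in self.vocab:  # unseen n-grams are ignored
                    score += math.log((self.counts[c][g] + self.alpha)
                                      / (self.totals[c] + self.alpha * v))
            return score

        return max(self.classes, key=log_posterior)


# Toy, hypothetical training tweets (not from the dissertation's datasets).
train_texts = [
    "We are pleased to announce our new product line",
    "Thank you for contacting customer support",
    "omg this is sooo good lol",
    "lol cant wait for the weekend",
]
train_labels = ["formal", "formal", "informal", "informal"]

model = NGramNaiveBayes(n_min=1, n_max=3).fit(train_texts, train_labels)
print(model.predict("lol this is so good"))
print(model.predict("Thank you for your order"))
```

Changing `n_max` from 3 to 2 gives the 1-2gram variant; in practice a library vectoriser (for example scikit-learn's `CountVectorizer` with an `ngram_range`) would usually replace the hand-written `ngrams` function.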
The experimental results show that the GloVe-LSTM model outperformed the other models in accuracy (66.71%) on the Informality dataset and in all measures on the Memotion dataset. However, on the Informality dataset, the 1-3gram-NB model exceeded the GloVe-LSTM model in macro-averaged F1 (58.75%) and weighted-average F1 (62.85%). On the Humour dataset, the 1-2gram-NB model outperformed the other models in all measures on the testing dataset.
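Because the results compare models by accuracy, macro-averaged F1 and weighted-average F1, a small sketch of how these measures can rank the same predictions differently may help. The Python below assumes the common definitions (macro F1 = unweighted mean of per-class F1; weighted F1 = per-class F1 weighted by each class's support in the true labels, as in scikit-learn's `average="macro"` and `average="weighted"`); the abstract does not state the exact formulas used, and the labels here are a made-up imbalanced example, not dissertation results.

```python
def classification_scores(y_true, y_pred):
    """Return (accuracy, macro-averaged F1, weighted-average F1) for label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    f1 = {}
    support = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support[c] = sum(1 for t in y_true if t == c)  # number of true instances of class c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(f1.values()) / len(labels)  # every class counts equally
    weighted_f1 = sum(f1[c] * support[c] for c in labels) / len(y_true)  # larger classes count more
    return accuracy, macro_f1, weighted_f1


# Hypothetical imbalanced binary test set: six positives, two negatives.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0]
acc, macro, weighted = classification_scores(y_true, y_pred)
print(f"accuracy={acc:.4f} macro_F1={macro:.4f} weighted_F1={weighted:.4f}")
```

On this toy data accuracy is 0.875, while macro F1 (about 0.795) is pulled down by the poorly recalled minority class and weighted F1 (about 0.859) sits in between. This is why a model can lead on accuracy yet trail on the F1 averages, as reported for the Informality dataset.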