Keyword Kaleidoscope: Identifying the difference in keywords predominantly used within one community via contrasting with another community

dc.contributor.advisor: Gutmann, Andreas
dc.contributor.advisor: Murdoch, Steven
dc.contributor.advisor: Psychoula, Ismini
dc.contributor.author: Alhazmi, Alaa
dc.date.accessioned: 2023-12-31T10:36:14Z
dc.date.available: 2023-12-31T10:36:14Z
dc.date.issued: 2023
dc.description.abstract: Online platforms seek to combat unwanted activities and content by blocking search terms associated with keywords frequently used by malicious actors. However, this approach may inadvertently affect legitimate content that shares these keywords. This study uses publicly available datasets of online posts to identify differences in the most prominent keywords in these datasets. The goal is to identify such distinctions by applying similar methods to harmful and benign communities that share similar language and, consequently, to use them for more effective search-term blocking. To this end, we employed several analysis methods. Keyword frequencies were computed and compared in tables, in plots, and through hypothesis tests. Topic modeling was applied to the reviews from the datasets to examine the keywords within similar topics and their frequencies. Keyword co-occurrences, defined by how frequently keywords appeared in the same review as each other, were also tallied, and the keywords with the largest co-occurrence differences were further explored through plots and representative reviews. While this study centered on two reviewer communities, it yielded several overarching insights; in particular, a similar process could be implemented to guide effective banning in search functionalities. The two datasets examined were found to discuss similar concepts. While the ordering of the top keywords shifts between the two, the majority of the most frequent keywords are found near the top of both lists. Despite these similarities, the overall frequencies of overlapping keywords differed. Notable dissimilarities between the two communities were found either as keywords missing from one top list or the other, or as frequency differences detected through Pearson’s chi-squared contingency test.
The topic model results showed that some topics were present in both communities but were linked to different keywords in each. Finally, the keyword-keyword co-occurrence analysis indicates that even keywords used commonly by both communities can have different associations.
dc.format.extent: 51
dc.identifier.uri: https://hdl.handle.net/20.500.14154/70475
dc.language.iso: en
dc.publisher: Saudi Digital Library
dc.subject: Keywords
dc.subject: Search terms
dc.subject: NLP
dc.subject: Communities
dc.subject: Content moderation
dc.title: Keyword Kaleidoscope: Identifying the difference in keywords predominantly used within one community via contrasting with another community
dc.type: Thesis
sdl.degree.department: Computer Science
sdl.degree.discipline: Information Security
sdl.degree.grantor: London's Global University
sdl.degree.name: Master of Science
