The Dance Of Order And Chaos: Tracking Keywords Evolution in a Community Over Time

dc.contributor.advisor: Andreas, Gutmann
dc.contributor.advisor: Ismini, Psychoula
dc.contributor.advisor: Murdoch, Steven
dc.contributor.author: Alhazmi, Arwa
dc.date.accessioned: 2023-12-31T14:56:19Z
dc.date.available: 2023-12-31T14:56:19Z
dc.date.issued: 2023
dc.description.abstract: Online platforms face a persistent challenge in managing prohibited content. When they curtail undesired content by blocking search results for specific search terms (keywords) used by malicious actors, they inadvertently affect the benign content associated with those terms. At the same time, malicious actors adapt by introducing intentional language variations of those terms. This risks blocking further innocent content and creates openings for undesired content to remain undetected. Therefore, while bad actors’ use of specialized language offers opportunities for content management, it also raises the need for systems that can detect the specific terms in use across different timeframes in order to thwart their efforts efficiently. In this research, we use a publicly available time-series dataset of online posts (news articles) to track keyword evolution over time. We posit that methods capable of capturing these shifts can enhance analysis and, consequently, the precision of search-term blocking. Our methods draw on several NLP techniques. First, to track change in keywords, news articles were categorized using BERTopic, keywords were extracted for each article using KeyBERT, and the keywords were then sampled and represented using tf-idf for different periods. Subsequently, the periods were clustered using hierarchical agglomerative clustering to identify patterns and trends. Second, our method for tracking contextual change in keywords consisted of identifying keywords, identifying the topics to which each keyword’s representative articles belong, and setting criteria for defining prominent and shifting topics. Our analysis yielded promising results, indicating that the clustering approach we adopted for tracking change is well suited to time-series keywords. Its strength lies in discerning evolving patterns and temporal shifts in keywords and in providing insights into ideal time frames for such monitoring.
Notably, we were able to identify recurring or seasonal trends, short-term trends, extended trends, and distinctive keywords isolated within a single month. Moreover, our method and criteria for tracking and analyzing the evolution of keyword usage across different contexts proved effective, as evidenced by the identification of a contextual shift in 16% of the top 1,000 keywords in our dataset.
dc.format.extent: 47
dc.identifier.uri: https://hdl.handle.net/20.500.14154/70481
dc.language.iso: en
dc.publisher: Saudi Digital Library
dc.subject: Keywords
dc.subject: Content Moderation
dc.subject: Search Term
dc.subject: NLP
dc.subject: Algospeak
dc.subject: Time
dc.title: The Dance Of Order And Chaos: Tracking Keywords Evolution in a Community Over Time
dc.type: Thesis
sdl.degree.department: Computer Science
sdl.degree.discipline: Information Security
sdl.degree.grantor: University College London
sdl.degree.name: Master of Information Security
