Saudi Cultural Missions Theses & Dissertations
Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10
Item Restricted
The Dance Of Order And Chaos: Tracking Keywords Evolution in a Community Over Time (Saudi Digital Library, 2023)
Alhazmi, Arwa; Andreas, Gutmann; Ismini, Psychoula; Murdoch, Steven
Online platforms face a persistent challenge in managing prohibited content. When they curtail undesired content by blocking search results for specific search terms (keywords) used by malicious actors, they inadvertently affect benign content associated with the same terms. Malicious actors, in turn, adapt by introducing intentional language variations of those terms. This risks blocking further innocent content and creates openings for undesired content to remain undetected. Therefore, while bad actors' use of specialized language offers opportunities for content management, it also raises the need for systems that can detect the specific terms in use across different timeframes and so counter these efforts efficiently. In this research, we use a publicly available time-series dataset of online posts (news articles) to track keyword evolution over time. We posit that methods able to capture these shifts can improve analysis and, consequently, the precision of search-term blocking. Our methods draw on several NLP techniques. First, to track change in keywords, news articles were categorized using BERTopic, keywords were extracted for each article using KeyBERT, and the keywords were then sampled and represented with tf-idf for different periods. The periods were subsequently clustered using hierarchical agglomerative clustering to identify patterns and trends. Second, our method for tracking contextual change in keywords consisted of identifying keywords, identifying the topics to which each keyword's representative articles belong, and setting criteria for defining prominent and shifting topics.
Our analysis has yielded promising results, demonstrating that the clustering approach we adopted for tracking change is well suited to time-series keywords. Its strength lies in discerning evolving patterns and temporal shifts in keywords and in providing insights into suitable time frames for such monitoring. Notably, we were able to identify recurring or seasonal trends, short-term trends, extended trends, and distinctive keywords isolated within a single month. Moreover, our method and criteria for tracking and analyzing the evolution of keyword usage across different contexts have proven effective, as evidenced by the identification of a contextual shift in 16% of the top 1,000 keywords in our dataset.