Saudi Cultural Missions Theses & Dissertations
Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10
2 results
Search Results
Item Restricted
Scalable Human Mobility Prediction: Integrating Clustering and Parallel Processing
(Saudi Digital Library, 2024) Alhomidan, Suliman; Chen, Zexun
Human mobility modelling is essential for various applications, including urban planning, transportation logistics, and public health. Traditional algorithms for predicting human movement patterns face significant computational challenges, particularly with large-scale datasets. This dissertation addresses these challenges by introducing an optimised approach that leverages parallel computing and machine learning techniques. We refactored the existing human mobility prediction algorithm to utilise Dask, a parallel computing library that enables distributed processing. This modification enhanced the algorithm's scalability and computational efficiency, making it suitable for big data environments. Additionally, we incorporated clustering as a preprocessing step to group similar users, significantly reducing the number of pairwise comparisons required for trajectory analysis. We evaluated eight clustering algorithms: K-means, Gaussian Mixture Models (GMM), DBSCAN, MeanShift, Agglomerative Clustering, OPTICS, Birch, and HDBSCAN. Each algorithm was tested with various hyperparameters and clustering approaches. Performance metrics, including execution time, Adjusted Rand Index (ARI), and Normalised Mutual Information (NMI), were used to assess the computational efficiency and clustering accuracy of each algorithm. Our findings indicate that the “mean” and “std” aggregation methods consistently provide the best performance in terms of ARI and NMI. The “std” method demonstrated the lowest execution times, highlighting its computational efficiency. The results underscore the importance of selecting appropriate clustering algorithms and parameter values to optimise performance.
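The effect of clustering on the number of pairwise comparisons can be illustrated with a small count. This is only a sketch, not the dissertation's code: the function names, the user count, and the equal-sized cluster labels below are hypothetical, and it assumes that comparisons are made only between users in the same cluster.

```python
from collections import Counter


def pairs_without_clustering(n_users: int) -> int:
    """Number of unordered user pairs when every user is compared with every other."""
    return n_users * (n_users - 1) // 2


def pairs_with_clustering(labels) -> int:
    """Number of unordered user pairs when only same-cluster users are compared."""
    cluster_sizes = Counter(labels).values()
    return sum(size * (size - 1) // 2 for size in cluster_sizes)


# Hypothetical example: 1000 users spread evenly over 10 clusters.
labels = [user_id % 10 for user_id in range(1000)]

all_pairs = pairs_without_clustering(len(labels))
clustered_pairs = pairs_with_clustering(labels)

print(all_pairs, clustered_pairs)
```

With evenly sized clusters the pair count shrinks by roughly a factor equal to the number of clusters; uneven clusters give a smaller saving, since the largest cluster dominates the count.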
The improved approach was validated through practical examples, demonstrating substantial reductions in computational complexity compared to the original algorithm. For instance, clustering reduced the complexity from O(n^2 · m^2) to O(t · n · k) + O((n^2 / k) · m^2), where n is the number of users, m is the number of records per user, k is the number of clusters, and t is the number of iterations for clustering convergence. The practical implications of this research are significant, offering improved computational efficiency for applications in urban planning, public health, and commercial sectors. However, challenges such as real-time processing, adaptive clustering methodologies, and ethical considerations remain. Future research should address these challenges to further enhance the algorithm's applicability and performance. This dissertation presents a robust and scalable solution for human mobility modelling, integrating parallel computing and clustering techniques to significantly improve computational efficiency and accuracy. The flexibility of the implemented code allows users to tailor the clustering approach to their specific needs, ensuring optimal performance for various applications.

Item Restricted
Automating Agency-Client Matching: Leveraging Recommender Systems for Efficient and Accurate Recommendations
(University College London, 2023-12-01) Kurdi, Reem; Tanveer, Umair
In today’s fast-paced and dynamic business landscape, optimizing operational processes is paramount. Agency-client matching is an important procedure that plays a critical role in aligning agencies with client needs for successful collaborations. Traditionally, this matching process required labour-intensive, manual evaluations of numerous agencies and their portfolios. However, the advent of advanced technologies and data-driven approaches has introduced recommender systems as valuable tools to streamline and automate this process.
This study presents a unique and innovative cluster-based, hybrid filtering recommender system that utilizes machine learning algorithms and data analysis for agency-client matching. The recommender system follows a comprehensive three-step process, starting with brief preparation, then topic modelling and finally, agency ranking and scoring. Firstly, the briefs undergo a comprehensive pre-processing process to ensure inclusion of relevant text data by removing irrelevant information, such as stop words and entity names. Secondly, the filtered briefs go through topic modelling using the BERTopic framework to extract the keywords and underlying themes. Briefs are first transformed into numerical vectors using BERT embeddings, which helps to capture their semantic meaning and context. After that, dimensionality reduction is applied using UMAP to cluster related briefs. As a subsequent step, DBSTREAM is applied to assign new briefs to existing clusters, or create new clusters. The final step in this block is the implementation of c-TF-IDF which helps generate topic representations by identifying the most frequent words within each topic. Lastly, based on the unique cluster identifier assigned to the new brief, agencies are ranked and scored in line with the brief’s content and client requirements. All in all, the main focus of this study is to develop a ranking and scoring algorithm, tailored with certain criteria, to effectively shortlist relevant agency options and automate the agency-client matching process.
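The c-TF-IDF step named in the abstract can be sketched in a few lines. This is an illustration rather than the study's implementation: it assumes the weighting described in the BERTopic documentation (a term's frequency within a topic, multiplied by log(1 + A / f), where A is the average number of words per topic and f is the term's frequency across all topics), and the function names, whitespace tokenisation, and example briefs are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(topic_docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score each term per topic, treating all briefs in a topic as one document."""
    # Term counts per topic (naive lowercase whitespace tokenisation, an assumption).
    term_counts = {
        topic: Counter(tok for doc in docs for tok in doc.lower().split())
        for topic, docs in topic_docs.items()
    }
    # Frequency of each term across all topics.
    total_per_term = Counter()
    for counts in term_counts.values():
        total_per_term.update(counts)
    # Average number of words per topic.
    avg_words = sum(total_per_term.values()) / len(term_counts)

    scores = {}
    for topic, counts in term_counts.items():
        n_words = sum(counts.values())
        scores[topic] = {
            term: (count / n_words) * math.log(1 + avg_words / total_per_term[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores: dict[str, dict[str, float]], topic: str, k: int = 3) -> list[str]:
    """Return the k highest-scoring terms for one topic."""
    return sorted(scores[topic], key=scores[topic].get, reverse=True)[:k]


# Hypothetical briefs, already grouped by cluster identifier.
briefs = {
    "branding": ["brand logo design refresh", "logo design for new brand"],
    "social": ["social media campaign launch", "social campaign for product launch"],
}
scores = c_tf_idf(briefs)
print(top_terms(scores, "branding"))
```

Under this weighting a word shared by both topics, such as "for" in the example, scores lower than an equally frequent word that appears in only one topic, so the topic representation favours terms that are both frequent and specific to the cluster.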
