Semi-supervised Overlapping Community Finding with Pairwise Constraints

No Thumbnail Available

Date

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Community structure is an essential property which helps us to understand complex networks. The detection of communities within a network provides us with three important capabilities: to understand its structures and functionalities; to model its dynamic processes, and to potentially predict future behavior in the network. Generally, community detection algorithms represent a form of unsupervised machine learning, in the sense that they rely solely on the network topology during the process of identifying communities. One common issue is that community detection algorithms can fail to uncover groupings that accurately reflect the underlying structures in a network, particularly when these structures highly overlap with one another. One way to mitigate this problem is to employ semi-supervised learning strategies. These typically involves harnessing existing background information (e.g. from individual domain experts or via crowd- sourcing), which can be used to provide a limited degree of supervision for the com- munity detection algorithm. Often this information is encoded in the form of must-link and cannot-link constraints, which indicate that a pair of nodes should either always be or should never be assigned to the same community. The aim of this thesis is to explore the potential of semi-supervised strategies, in particular those incorporating pairwise constraints, in order to improve the effectiveness of algorithms for finding overlapping communities. To the best of our knowledge, these constraints have not been previously considered in the context of overlapping communities. To achieve this aim, three main phases of research have been undertaken. Firstly, we explore the nuances around the selection of constraints, which are specific to con- texts where the communities in the data overlap. Based on these insights we propose a new semi-supervised algorithm for detecting overlapping communities. Secondly, we use an active learning-inspired method for selecting informative pairwise constraints to improve algorithmic performance while reducing the level of supervision that is required. Finally, we investigate the impact of noisy, incorrectly-labeled constraints upon the performance of our algorithms. We further propose a new approach to handle such cases, which involves treating the noisy constraints as outliers, and applying an outlier detection strategy to identify and remove them. The performance of all our proposed approaches are evaluated via extensive experiments performed on a variety of synthetic benchmark networks and real-world networks.

Description

Keywords

Citation

Collections

Endorsement

Review

Supplemented By

Referenced By

Copyright owned by the Saudi Digital Library (SDL) © 2025