Deep Clustering and Deep Network Compression
Abstract
The use of deep learning has grown rapidly in recent years, making it a much-discussed topic
across a diverse range of fields. Deep learning methods have proven to be robust in representation learning
and have achieved extraordinary results. Their success is primarily due to their ability to
discover and automatically learn feature representations by mapping input data into abstract and composite
representations in a latent space. Deep learning's ability to deal with high-level representations from data
has inspired us to make use of learned representations, aiming to enhance unsupervised clustering and
evaluate the characteristic strength of internal representations to compress and accelerate deep neural
networks.
Traditional clustering algorithms attain limited performance as the dimensionality increases.
Therefore, the ability to extract high-level representations provides beneficial components that can support
such clustering algorithms. In this work, we first present DeepCluster, a clustering approach embedded in
a deep convolutional auto-encoder. We introduce two clustering methods, namely DCAE-Kmeans and
DCAE-GMM. DeepCluster assigns data points to their respective clusters in the
latent space through a joint cost function that simultaneously optimizes the clustering objective and the DCAE
objective, producing stable representations that are well suited to the clustering process. Both qualitative
and quantitative evaluations of the proposed methods are reported, showing the efficiency of deep clustering
on several public datasets in comparison to the previous state-of-the-art methods.
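
The joint cost described above can be illustrated with a minimal sketch for the k-means variant. This is not the authors' implementation: the additive form (reconstruction error plus a weighted clustering term), the weight `lam`, the mean-squared-error reconstruction loss, and all function names are assumptions made for illustration. In practice the latent codes `z` and reconstructions `x_rec` would come from the DCAE encoder and decoder.

```python
import numpy as np


def kmeans_cluster_loss(z, centroids):
    """Mean squared distance from each latent vector to its nearest centroid.

    z: (n, d) latent codes; centroids: (k, d) cluster centers.
    Returns the scalar loss and the hard cluster assignments.
    """
    # Pairwise squared Euclidean distances, shape (n, k).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)
    nearest = d2[np.arange(z.shape[0]), assignments]
    return nearest.mean(), assignments


def joint_cost(x, x_rec, z, centroids, lam=0.1):
    """Hypothetical joint objective: reconstruction + lam * clustering.

    lam is an assumed trade-off weight, not a value taken from the text.
    """
    reconstruction = ((x - x_rec) ** 2).mean()
    clustering, assignments = kmeans_cluster_loss(z, centroids)
    return reconstruction + lam * clustering, assignments


# Toy usage with hand-made arrays standing in for network outputs.
x = np.zeros((2, 4))
x_rec = np.ones((2, 4))
z = np.array([[0.0, 0.0], [3.0, 4.0]])
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
total, assignments = joint_cost(x, x_rec, z, centroids, lam=0.1)
```

Minimizing both terms together is what ties the learned representation to the clustering: the reconstruction term keeps the latent space informative, while the clustering term pulls each code toward its assigned centroid.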
Following this, we propose a new version of the DeepCluster model to include varying degrees of
discriminative power. This introduces a mechanism which enables the imposition of regularization
techniques and the involvement of a supervision component. The key idea of our approach is to distinguish
the discriminatory power of numerous structures when searching for a compact structure to form robust
clusters. The effectiveness of injecting various levels of discriminatory power into the learning process is
investigated alongside the exploration and analytical study of the discriminatory power obtained through
the use of two discriminative attributes: data-driven discriminative attributes with the support of
regularization techniques, and supervision discriminative attributes with the support of the supervision
component. An evaluation is provided on four different datasets.
The use of neural networks in various applications is accompanied by a dramatic increase in
computational costs and memory requirements. Making use of the characteristic strength of learned
representations, we propose an iterative pruning method that simultaneously identifies the critical neurons
and prunes the model during training without involving any pre-training or fine-tuning procedures. We
introduce a majority voting technique to compare the activation values among neurons and assign a voting
score to evaluate their importance quantitatively. This mechanism effectively reduces model complexity by
eliminating the less influential neurons and aims to determine a subset of the whole model that can represent
the reference model with far fewer parameters within the training process. Empirically, we demonstrate
that our pruning method is robust across various scenarios, including fully-connected networks (FCNs),
sparsely-connected networks (SCNs), and convolutional neural networks (CNNs), using two public
datasets. Moreover, we also propose a novel framework to measure the importance of individual hidden units
by computing a measure of relevance to identify the most critical filters and prune them to compress and
accelerate CNNs. Unlike existing metho