Statistical Approaches for Binary and Categorical Data Modeling

Fahdah Abdullah Ibrahim Alalyan

Publisher

Saudi Digital Library

Abstract

A massive amount of data is now generated as the development of technology and services accelerates. As a result, the demand for data clustering as a means of gaining knowledge has increased in many sectors, such as medical sciences, risk assessment and product sales. Binary data in particular is widely used in applications including market basket analysis and text document analysis. Because the classic and widely used k-means method is inappropriate for clustering binary data, we propose an improvement of the K-medoids algorithm that uses binary similarity measures instead of the Euclidean distance generally deployed in clustering algorithms. In addition to K-medoids clustering, agglomerative hierarchical clustering methods based on Gaussian probability models have recently been shown to be efficient in different applications. However, the emergence of pattern recognition applications in which the features are binary or integer-valued demands extending research efforts to such data types. We propose a hierarchical clustering framework for categorical data based on Multinomial and Bernoulli mixture models, and we compare two widely used density-based distances, namely Bhattacharyya and Kullback-Leibler. The merits of the proposed clustering frameworks are shown through extensive experiments on text clustering, binary image categorization and image categorization. The development of generative and discriminative approaches for classifying different kinds of data has also attracted scholars' attention. Considering the strengths and weaknesses of both, several hybrid learning approaches that combine their desirable properties have been developed. Our contribution is to combine Support Vector Machines (SVMs) and the Bernoulli mixture model in order to classify binary data. We propose using the Bernoulli mixture model to generate probabilistic kernels for SVMs based on information divergence. These kernels make intelligent use of unlabeled binary data to achieve good data discrimination. We evaluate the proposed hybrid learning approach by classifying binary and texture images.
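
To make the first idea in the abstract concrete, the following is a minimal sketch of K-medoids clustering driven by a binary dissimilarity rather than Euclidean distance. It is not the thesis's algorithm: the abstract does not say which binary similarity measures or which improvements are used, so the Jaccard dissimilarity, the random initialization, and the names `jaccard_distance` and `k_medoids` are all assumptions made for illustration.

```python
import numpy as np

def jaccard_distance(a, b):
    """Jaccard dissimilarity between two binary vectors: 1 - |a AND b| / |a OR b|.
    (Assumed measure; the abstract only says "binary similarity measures".)"""
    union = np.logical_or(a, b).sum()
    if union == 0:  # both vectors are all-zero: treat them as identical
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union

def k_medoids(X, k, max_iter=100, seed=0):
    """Plain alternating K-medoids on a precomputed Jaccard distance matrix."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Pairwise dissimilarities between all binary rows.
    D = np.array([[jaccard_distance(X[i], X[j]) for j in range(n)]
                  for i in range(n)])
    # Hypothetical initialization: k distinct rows chosen at random.
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: pick the member with the smallest total
            # dissimilarity to the other members of its cluster.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[costs.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels

# Toy binary data: rows 0-2 share the first three features,
# rows 3-5 share the last three.
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
])
medoids, labels = k_medoids(X, k=2)
print("medoid rows:", sorted(medoids.tolist()))
print("cluster labels:", labels.tolist())
```

Because the medoids are always actual data rows, the cluster representatives stay binary, which is one common reason for preferring K-medoids over k-means on this kind of data. Other binary measures could be swapped in by replacing `jaccard_distance`.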

URI

https://drepo.sdl.edu.sa/handle/20.500.14154/67099

Collections

SACM - Canada
