BigDataFr recommends: Clustering Mixed Datasets Using Homogeneity Analysis with Applications to Big Data
Subjects: Machine Learning (stat.ML)
[…] Data analysts routinely encounter datasets with a mix of continuous and categorical attributes. This work presents a method for clustering such datasets using Homogeneity Analysis. An optimal Euclidean representation of the mixed dataset is obtained using Homogeneity Analysis, and this representation is then clustered. The relevant aspects of Homogeneity Analysis theory used to determine a numerical representation of the categorical attributes are presented. The method is illustrated on real-world datasets, including a very large dataset. […]
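To give a feel for what a "numerical representation of the categorical attributes" can look like, here is a minimal sketch of one-dimensional homogeneity analysis computed by alternating least squares (reciprocal averaging). It is an illustration under our own assumptions, not the paper's implementation: the function names (`homogeneity_scores`, `encode`), the toy data and the fixed iteration count are all hypothetical, and only the first dimension is extracted.

```python
import math


def _standardize(x):
    """Center x and scale it to unit variance (raises if x is constant)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered) / n)
    if norm == 0.0:
        raise ValueError("object scores collapsed to a constant")
    return [v / norm for v in centered]


def _category_means(rows, x):
    """For each attribute, quantify each category as the mean score of its objects."""
    quant = []
    for j in range(len(rows[0])):
        sums, counts = {}, {}
        for score, row in zip(x, rows):
            sums[row[j]] = sums.get(row[j], 0.0) + score
            counts[row[j]] = counts.get(row[j], 0) + 1
        quant.append({c: sums[c] / counts[c] for c in sums})
    return quant


def homogeneity_scores(rows, n_iter=100):
    """Alternate between category quantifications and object scores.

    Assumption: a fixed number of iterations stands in for a convergence test.
    """
    # Deterministic, non-constant starting scores.
    x = _standardize([float(i) for i in range(len(rows))])
    m = len(rows[0])
    for _ in range(n_iter):
        quant = _category_means(rows, x)
        # Each object score becomes the average of its categories' quantifications.
        x = _standardize([sum(quant[j][row[j]] for j in range(m)) / m for row in rows])
    return x, _category_means(rows, x)


def encode(rows, quant):
    """Replace every category label with its numeric quantification."""
    return [[quant[j][row[j]] for j in range(len(row))] for row in rows]


# Hypothetical toy data: three categorical attributes per object.
rows = [
    ("red", "round", "small"),
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("blue", "square", "large"),
    ("blue", "square", "large"),
    ("blue", "square", "small"),
]
scores, quant = homogeneity_scores(rows)
numeric_rows = encode(rows, quant)
print(quant[0])  # quantifications for the first attribute's categories
```

The rows of `numeric_rows` could then be placed alongside standardized continuous attributes and handed to any standard clustering algorithm, k-means for example. How the paper combines and scales the two kinds of attributes is not stated in the excerpt above, so that step is left out here.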
Read paper
By Rajiv Sambasivan
Source: arxiv.org