The workflow of the autoencoder: first, the input passes through the encoder to produce the code.
The decoder, which has a similar structure, then produces the output using only the code.
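The encoder/decoder flow can be sketched in a few lines. This is a minimal illustration only, assuming a single dense layer on each side, untrained random weights, and hypothetical sizes (6 inputs, a 2-dimensional code); none of these choices come from the slides.

```python
import math
import random

def dense(weights, bias, x, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
n_input, n_code = 6, 2  # hypothetical sizes
enc_w, enc_b = init_layer(n_code, n_input, rng)
dec_w, dec_b = init_layer(n_input, n_code, rng)  # mirrored shape

def encode(x):
    # Compress the input into the low-dimensional code.
    return dense(enc_w, enc_b, x, math.tanh)

def decode(code):
    # The decoder sees only the code, never the original input.
    return dense(dec_w, dec_b, code, lambda z: z)

def reconstruction_error(x):
    """Mean squared error between the input and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

x = [0.2, 0.9, 0.1, 0.4, 0.7, 0.3]
code = encode(x)
output = decode(code)
print(len(code), len(output), reconstruction_error(x))
```

Training would adjust the weights to minimise `reconstruction_error` over the dataset; that step is omitted here.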
On heterogeneous data with both numerical and non-numerical features, traditional centroid-based clustering algorithms produce clusterings with varying degrees of inaccuracy.
This is because the Hamming distance used to measure dissimilarity between non-numerical values treats every pair of distinct values as equally far apart, so it does not provide optimal distances between different values.
Further problems arise from attempts to combine the Euclidean distance and the Hamming distance.
For the original non-numerical features, UFT can provide numerical values that preserve the structure of those features while also behaving as continuous values.
For example, the variable ever_smok20py: number of smoking packs per year, with three options (never smoked, less than 20 packs per year, more than 20 packs per year).
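The weakness of the Hamming distance on a variable like ever_smok20py can be shown with a small sketch. The level labels, the rank-based distance, and the three example records (including the age values) are hypothetical and chosen only for illustration. The rank-based distance is a stand-in to show what "preserving structure" could mean; it is not the UFT method itself.

```python
import math

# Hypothetical labels for the three ever_smok20py options, in their natural order.
LEVELS = ["never", "<20", ">20"]

def hamming(a, b):
    """Number of positions at which two categorical records differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def ordinal_distance(a, b):
    """Illustrative alternative: normalised gap between level ranks (0 to 1)."""
    return abs(LEVELS.index(a) - LEVELS.index(b)) / (len(LEVELS) - 1)

def naive_mixed_distance(r1, r2):
    """Euclidean distance on the numerical part plus Hamming on the categorical part."""
    numeric = math.sqrt((r1["age"] - r2["age"]) ** 2)
    categorical = hamming((r1["smoking"],), (r2["smoking"],))
    return numeric + categorical

# Hamming cannot tell a light smoker from a heavy smoker relative to "never".
print(hamming(("never",), ("<20",)), hamming(("never",), (">20",)))
print(ordinal_distance("never", "<20"), ordinal_distance("never", ">20"))

# Hypothetical records: the unscaled age difference swamps the categorical term.
a = {"age": 40, "smoking": "never"}
b = {"age": 42, "smoking": ">20"}
c = {"age": 70, "smoking": "never"}
print(naive_mixed_distance(a, b), naive_mixed_distance(a, c))
```

In the naive combination, the categorical feature can contribute at most 1 per variable while an unscaled numerical feature can contribute far more, which is one way the two distances fail to mix well.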
Deep clustering is a recent trend in the machine learning community that aims to employ a deep neural network in an unsupervised learning form.
One of the main families of deep clustering is Deep Embedded Clustering (DEC) [1]. The core idea of DEC is to learn a latent space that preserves properties of the data.
[1] Xie, Girshick, and Farhadi (2016).
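The clustering step of DEC, as described by Xie, Girshick, and Farhadi (2016), assigns each embedded point softly to cluster centroids with a Student's t kernel, sharpens those assignments into a target distribution, and minimises the KL divergence between the two. The sketch below computes these three quantities in plain Python. The embedded points and centroids are made-up values, and the gradient updates of the encoder and centroids are left out.

```python
import math

def soft_assign(z, centroids, alpha=1.0):
    """q_ij: Student's t similarity between point i and centroid j, row-normalised."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q

def target_distribution(q):
    """p_ij: q_ij squared, divided by the soft cluster frequency, row-normalised."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0)

# Made-up 2-D latent codes and two centroids, for illustration only.
z = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]

q = soft_assign(z, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
print(q[0], p[0], loss)
```

Squaring `q` before renormalising pushes each row toward its most likely cluster, so training against `p` makes confident assignments more confident.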
Cluster 0 (n=329): Generally healthy middle-aged individuals, but most with rhinitis.
Cluster 1 (n=326): Generally healthy older overweight males with smoking history and a tendency to airway obstruction and sputum production.
Cluster 2 (n=172): Symptomatic obese males or females with asthmatic wheezing, often using airway medicines.
Cluster 3 (n=202): Generally healthy normal-weight to overweight females, but most with rhinitis.
Cluster 4 (n=64): Airway-obstructed older and obese females with smoking history, often using airway medicines. Cough/sputum production, wheezing, dyspnoea and exacerbations are common.
Cluster 5 (n=164): Airway-obstructed, heavy-smoking, older and overweight males and females with sputum production and cough as symptoms, but rhinitis is less common.
Thank you for your attention