Deep learning-based phenotyping of airway diseases in adults

Rani Basna
STELLAR, Krefting Research Center, University of Gothenburg.

01/10/2020

Autoencoder

Introduction

  • Autoencoders are a type of neural network in which the output is trained to match the input (unsupervised learning). They compress the input into a lower-dimensional representation (the latent space) and then reconstruct the output from this representation.

  • An autoencoder consists of three components: an encoder, a code (the latent space), and a decoder.

  • To build an autoencoder we need an encoding method, decoding method, and a loss function to compare the output with the target.

Architecture of the Autoencoder

  • Both the encoder and decoder are fully-connected neural networks. The code (the latent-space layer) is a single layer of the network with a dimensionality of our choice.

The workflow of the autoencoder: first the input passes through the encoder to produce the code. The decoder, which has a similar (mirrored) structure, then produces the output using only the code.
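As a toy illustration of this encoder–code–decoder workflow, here is a minimal linear autoencoder in NumPy; the data, dimensions, learning rate, and iteration count are arbitrary choices for the sketch, not the model used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 8 dimensions that actually lie on a 3-D subspace.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 8))
X = latent @ mixing

# One-layer linear encoder and decoder with a 3-unit code (latent) layer.
W_enc = rng.normal(scale=0.1, size=(8, 3))
W_dec = rng.normal(scale=0.1, size=(3, 8))

lr = 0.02
for _ in range(3000):
    code = X @ W_enc         # encoder: compress the input to the code
    X_hat = code @ W_dec     # decoder: reconstruct from the code only
    err = X_hat - X          # reconstruction error
    # Gradient descent on the mean-squared-error reconstruction loss.
    grad_dec = (code.T @ err) / len(X)
    grad_enc = (X.T @ (err @ W_dec.T)) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = np.mean((X - (X @ W_enc) @ W_dec) ** 2)
print(f"reconstruction MSE: {mse:.4f}")
```

Because the toy data are exactly rank 3 and the code layer has 3 units, the reconstruction error can be driven close to zero; a narrower code layer would force lossy compression.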

Mutual information-based unsupervised feature transformation

Motivation

  • Traditional centroid-based clustering algorithms applied to heterogeneous data with both numerical and non-numerical features yield varying degrees of clustering inaccuracy.

  • This is because the Hamming distance used to measure dissimilarity between non-numerical values does not provide optimal distances between different values.

  • Further problems arise from attempts to combine the Euclidean and Hamming distances.

  • We use the mutual information (MI)-based unsupervised feature transformation (UFT), which can transform non-numerical features into numerical features without information loss.

  • For the original non-numerical features, UFT provides numerical values that preserve the structure of the original non-numerical features while having the property of continuous values.

  • For example, the variable ever_smok20py records smoking packs per year with three options (never smoked, fewer than 20 packs per year, more than 20 packs per year).
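The mutual information that UFT builds on can be computed directly from category counts. Below is a minimal sketch of MI between two categorical variables, using a tiny made-up smoking/cough sample (not the study data); it illustrates the quantity UFT relies on, not the full UFT algorithm.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """MI(X;Y) = sum over (a,b) of p(a,b) * log(p(a,b) / (p(a) * p(b))), in nats."""
    n = len(x)
    pxy = Counter(zip(x, y))       # joint counts
    px, py = Counter(x), Counter(y)  # marginal counts
    return sum(
        (c / n) * np.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

# Hypothetical sample: ever_smok20py-style categories against a cough symptom.
smoking = ["never", "lt20", "gt20", "never", "gt20", "lt20", "never", "gt20"]
cough   = ["no",    "no",   "yes",  "no",    "yes",  "no",   "no",    "yes"]
print(round(mutual_information(smoking, cough), 3))  # → 0.662 (nats)
```

In this toy sample cough is fully determined by the smoking category, so the MI equals the entropy of the cough variable; weaker associations give smaller values.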

Deep Embedding Clustering

  • Deep clustering is a recent trend in the machine learning community that aims to employ deep neural networks in an unsupervised setting.

  • One of the main families of deep clustering is Deep Embedding Clustering (DEC) [1]. The fundamental idea of DEC is to learn a latent space that preserves properties of the data.

  • [1] Xie, Girshick, and Farhadi (2016).

  • DEC has two phases:
    1. parameter initialization with a deep autoencoder.
    2. parameter optimization (i.e., clustering), where we iterate between computing an auxiliary target distribution and minimizing the Kullback–Leibler (KL) divergence to it.
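The clustering phase can be sketched from the formulas in Xie et al. (2016): a Student's t-kernel soft assignment q, the sharpened auxiliary target distribution p, and the KL divergence between them. A small NumPy sketch with toy embedded points and centroids (not our trained model):

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t-kernel soft assignment q_ij of point i to cluster j (DEC)."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Auxiliary target p_ij = (q_ij^2 / f_j) renormalised; f_j = soft cluster size."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q), the objective minimised in DEC's clustering phase."""
    return float((p * np.log(p / q)).sum())

rng = np.random.default_rng(0)
z = rng.normal(size=(10, 2))          # toy points in the latent space
centroids = rng.normal(size=(3, 2))   # toy cluster centres
q = soft_assign(z, centroids)
p = target_distribution(q)
print(kl_divergence(p, q))
```

Squaring q in the target distribution emphasises high-confidence assignments, so minimising KL(P‖Q) gradually sharpens the clustering.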

A flowchart for the DEC method

Application to our data set

t-distributed stochastic neighbor embedding (t-SNE)

3D uniform manifold approximation and projection (UMAP)
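Embeddings like those shown on these slides can be produced with scikit-learn; a sketch in which a random matrix X stands in for our encoded features (UMAP is analogous via the umap-learn package):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # stand-in for the latent features

# 2-D t-SNE embedding; perplexity must be smaller than the sample count.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)
```

t-SNE preserves local neighbourhoods rather than global distances, so cluster positions and inter-cluster gaps in such plots should be read cautiously.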

Validation

Validation with Random forest

  • The airway disease clusters:
    • Accuracy: 87%
    • High recall (sensitivity) on all classes (between 86% and 100%)
    • High precision (positive predictive value) on all classes (between 83% and 100%)
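A sketch of this validation step with scikit-learn: train a random forest to predict the DEC cluster labels and read per-class recall and precision from the classification report. The data below are random stand-ins, so the scores will be near chance, unlike the real results above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 12))    # stand-in feature matrix
y = rng.integers(0, 6, size=600)  # stand-in cluster labels (6 clusters)

# Hold out a test split, stratified so every cluster appears in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

acc = accuracy_score(y_te, clf.predict(X_te))
print(classification_report(y_te, clf.predict(X_te)))
```

High held-out accuracy on the real data indicates that the clusters are separable in the original feature space, not just in the learned embedding.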

Distribution of Age and BMI over Clusters

Clinical interpretation

  • Cluster 0 (n=329): Generally healthy middle-aged individuals, but most have rhinitis.

  • Cluster 1 (n=326): Generally healthy older overweight males with smoking history and tendency to airway obstruction and sputum production.

  • Cluster 2 (n=172): Symptomatic obese males or females with asthmatic wheezing, often with airway medicines in use.

  • Cluster 3 (n=202): Generally healthy normal to overweight females, but most have rhinitis

  • Cluster 4 (n=64): Airway-obstructed older and obese females with smoking history, often using airway medicines. Cough/sputum production, wheezing, dyspnoea and exacerbations common.

  • Cluster 5 (n=164): Airway-obstructed, heavy-smoking, older and overweight males and females with sputum production and cough as symptoms, but rhinitis less common.

Thank you for your attention