Semi-supervised Learning for TKR prediction with UDA

Title: Semi-supervised Learning for Predicting Total Knee Replacement with Unsupervised Data Augmentation

Authors: Jimin Tan, Bofei Zhang, Kyunghyun Cho, Gregory Chang, and Cem M. Deniz

Published at: SPIE Medical Imaging 2020

Introduction

Osteoarthritis (OA) is a chronic degenerative joint disorder and the most common reason for total knee replacement (TKR). In this paper, we implemented a semi-supervised learning approach based on Unsupervised Data Augmentation (UDA), together with perturbations that are valid for radiographs, to improve the performance of a supervised TKR outcome prediction model.

We used radiographs from the Osteoarthritis Initiative (OAI) dataset and performed knee joint localization with a model based on the ResNet-18 architecture to extract the left and right knees.

Our semi-supervised model consists of a supervised module and a UDA module. The final loss was a weighted combination of the cross-entropy (CE) and Kullback-Leibler divergence (KLD) losses.
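The weighted combination can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the usual UDA arrangement, in which CE is computed on a labeled image and KLD compares the prediction on an unlabeled image with the prediction on its augmented copy. The function names and the weight `lam` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Supervised CE loss for one labeled example (label is a class index)."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def combined_loss(labeled_logits, label, clean_logits, augmented_logits, lam=1.0):
    """CE on a labeled image plus lam * KLD on an unlabeled image pair.

    lam is a hypothetical weighting factor; the value used in the paper
    is not stated in this summary.
    """
    ce = cross_entropy(labeled_logits, label)
    # Assumption: the prediction on the clean image acts as the target
    # distribution and the prediction on the augmented image is pulled to it.
    kld = kl_divergence(softmax(clean_logits), softmax(augmented_logits))
    return ce + lam * kld
```

When the predictions on the clean and augmented images agree, the KLD term vanishes and the loss reduces to the supervised CE term. In a real training loop the same quantities would be computed per batch with a deep learning framework.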

Results

Our semi-supervised model achieved an average ROC AUC of 0.79, compared to an average of 0.74 for the supervised baseline, under 4-fold cross-validation. The overall increase in accuracy was 6.8%. A paired DeLong test between the two models showed that the difference was significant (p-value = 5.869 × 10^-5).
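For readers who want to reproduce this kind of comparison, here is a small sketch of computing ROC AUC per fold and averaging across folds. It uses the pairwise-ranking definition of AUC and is not the authors' evaluation code. The helper names are hypothetical, and the DeLong test is not implemented here. The last helper reflects one reading of the 6.8% figure: it matches the relative gain computed from the rounded averages, (0.79 - 0.74) / 0.74.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auc(folds):
    """Average AUC over folds; each fold is a (labels, scores) pair."""
    return sum(roc_auc(y, s) for y, s in folds) / len(folds)

def relative_increase(new, old):
    """Relative change of new over old, as a fraction."""
    return (new - old) / old
```

With the averages reported above, `relative_increase(0.79, 0.74)` rounds to 6.8%. The pairwise loop is quadratic in the number of examples, which is fine for illustration but slow for large validation sets.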

Ablation Study

Size of the unlabeled dataset

We looked at how the size of the unlabeled dataset affects the performance of the model. The table showed a positive correlation between the amount of unlabeled data and validation accuracy. However, the returns are diminishing.

Augmentation diversity in UDA

We analyzed the effect of increasing the number of augmentations applied in the UDA module. We found that the model performed better and more consistently as the number of augmentations increased.
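One way to vary augmentation diversity is to apply a configurable number of random perturbations to each unlabeled image, as in the sketch below. The three perturbations (brightness shift, contrast scaling, additive noise), their ranges, and the function names are illustrative assumptions, not the set used in the paper. Images are represented as nested lists of intensities in [0, 1].

```python
import random

def _clip(x):
    """Keep an intensity inside the valid [0, 1] range."""
    return min(1.0, max(0.0, x))

def adjust_brightness(img, rng):
    # Hypothetical range: shift all pixels by up to +/- 0.1.
    delta = rng.uniform(-0.1, 0.1)
    return [[_clip(px + delta) for px in row] for row in img]

def adjust_contrast(img, rng):
    # Hypothetical range: scale deviations from the mean by 0.8 to 1.2.
    factor = rng.uniform(0.8, 1.2)
    mean = sum(map(sum, img)) / sum(len(row) for row in img)
    return [[_clip(mean + factor * (px - mean)) for px in row] for row in img]

def add_noise(img, rng):
    # Hypothetical noise level: Gaussian with standard deviation 0.02.
    return [[_clip(px + rng.gauss(0.0, 0.02)) for px in row] for row in img]

AUGMENTATIONS = [adjust_brightness, adjust_contrast, add_noise]

def augment(img, num_ops, seed=None):
    """Apply num_ops distinct, randomly chosen perturbations in sequence."""
    rng = random.Random(seed)
    for op in rng.sample(AUGMENTATIONS, k=num_ops):
        img = op(img, rng)
    return img
```

Calling `augment(img, num_ops=k)` for increasing `k` gives progressively more diverse views of the same radiograph, each of which could be fed to the consistency loss.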

Multi-task Learning with consistency training

In this experiment, we used the consistency training module as an add-on task to the existing supervised learning module. Both modules used the same labeled data. The figure below showed that multi-task learning reduced overfitting on par with UDA, and both outperformed the baseline supervised model.