What is: BS-Net?

Source: BS-Net: learning COVID-19 pneumonia severity on a large Chest X-Ray dataset
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

BS-Net is an architecture for COVID-19 severity prediction based on clinical data from different modalities. The architecture comprises 1) a shared multi-task feature extraction backbone, 2) a lung segmentation branch, 3) an original registration mechanism that acts as a "multi-resolution feature alignment" block operating on the encoding backbone, and 4) a multi-regional classification part for the final six-valued score estimation.

All these blocks act together in the final training thanks to a loss specifically crafted for this task. This loss also guarantees performance robustness, as it includes a differentiable version of the target discrete metric. The learning phase operates in a weakly-supervised fashion, because difficulties and pitfalls in the visual interpretation of the disease signs on CXRs (spanning from subtle findings to heavy lung impairment), together with the lack of detailed localization information, produce unavoidable inter-rater variability among radiologists in assigning scores.

Specifically, the architectural details are:

  • The input image is processed with a convolutional backbone; the authors opt for a ResNet-18.
  • Segmentation is performed by a nested version of U-Net (U-Net++).
  • Alignment is estimated from the segmentation probability map produced by the U-Net++ decoder. This is achieved through a spatial transformer network, which estimates the spatial transform matrix needed to center, rotate, and correctly zoom the lungs. After alignment at various scales, features are forwarded to a ROIPool.
  • The alignment block is pre-trained on the synthetic alignment dataset in a weakly-supervised setting, using a Dice loss.
  • The scoring head uses FPNs to combine multi-scale feature maps. The multi-resolution feature aligner produces input feature maps that are well focused on the specific area of interest. Finally, the output of the FPN layer flows into a series of convolutional blocks to retrieve the output map. The classification is performed by a final Global Average Pooling layer and a SoftMax activation.
  • The loss function used for training is a sparse categorical cross-entropy (SCCE) with a (differentiable) mean absolute error contribution.
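
To make the last point more concrete, here is a minimal NumPy sketch of a loss that combines SCCE with a differentiable MAE term. It is an illustration under stated assumptions, not the authors' implementation: the MAE is made differentiable by replacing the hard argmax with the expected score under the softmax distribution, and the two terms are mixed with a hypothetical weight `alpha`. The function names, the value of `alpha`, and the number of classes are all illustrative choices that the text above does not specify.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def scce_plus_soft_mae(logits, targets, alpha=0.5):
    """Illustrative combined loss (assumed form, not the paper's exact one).

    logits:  array of shape (N, C), raw class scores.
    targets: integer array of shape (N,), true class indices in [0, C).
    alpha:   hypothetical mixing weight between the two terms.
    """
    probs = softmax(logits)
    n, c = probs.shape

    # Sparse categorical cross-entropy: -log p(true class).
    scce = -np.log(probs[np.arange(n), targets] + 1e-12)

    # Differentiable MAE: the expected class index under the softmax
    # stands in for the non-differentiable argmax prediction.
    expected_score = probs @ np.arange(c)
    soft_mae = np.abs(expected_score - targets)

    return float(np.mean(alpha * scce + (1.0 - alpha) * soft_mae))


# Example call: two samples, four ordinal classes (sizes chosen arbitrarily).
example_logits = np.array([[2.0, 0.5, 0.1, -1.0],
                           [0.0, 0.2, 1.5, 0.3]])
example_targets = np.array([0, 2])
example_loss = scce_plus_soft_mae(example_logits, example_targets)
```

The MAE term penalizes predictions in proportion to their distance from the true score, which plain cross-entropy ignores: predicting a class three steps away costs more than predicting an adjacent one.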