Deep BI-RADS Network for Improved Cancer Detection from Mammograms

Gil Ben-Artzi
Feras Daragma
Shahar Mahpod

School of Computer Science, Ariel University, Israel

2024 IEEE International Conference on Pattern Recognition (ICPR)




Highlights

  • Combines textual BI-RADS descriptors with visual mammogram data for improved cancer detection.
  • Employs iterative attention layers for effective multi-modal fusion.
  • Achieves higher classification performance compared to image-only models.
  • Demonstrates an AUC of 0.872 on the CBIS-DDSM dataset.

Abstract

While state-of-the-art models for breast cancer detection leverage multi-view mammograms for enhanced diagnostic accuracy, they often focus solely on visual mammography data. However, radiologists document valuable lesion descriptors that carry additional information and can enhance mammography-based breast cancer screening. A key question is whether deep learning models can benefit from these expert-derived features. To address this question, we introduce a novel multi-modal approach that combines textual BI-RADS lesion descriptors with visual mammogram content. Our method employs iterative attention layers to effectively fuse these different modalities, significantly improving classification performance over image-only models. Experiments on the CBIS-DDSM dataset show substantial improvements across all metrics, with an AUC of 0.872, demonstrating the contribution of handcrafted features to end-to-end learning.


Results on the CBIS-DDSM dataset

Model                  AUC    Accuracy  Specificity  Precision  Recall  F1-Score
[15]                   0.680  0.661     0.670        0.638      0.651   0.644
[23]                   0.811  0.723     0.750        0.686      0.698   0.692
Ours (no descriptors)  0.711  0.664     0.650        0.676      0.619   0.634
Ours                   0.872  0.760     0.773        0.760      0.743   0.751




Method

Training: Our model uses iterative attention layers to fuse BI-RADS textual descriptors with mammogram images. This multi-modal fusion improves the model's ability to distinguish benign from malignant lesions.
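As a rough illustration (not the released code), the PyTorch sketch below shows one way such a fusion could look: categorical BI-RADS fields are embedded as tokens, and image tokens cross-attend to them before a self-attention step. The field count, vocabulary sizes, dimensions, and class names are assumptions made for the example.

```python
import torch
import torch.nn as nn


class DescriptorEncoder(nn.Module):
    """Embed categorical BI-RADS descriptor fields as a token sequence.
    The number of fields and vocabulary sizes below are illustrative."""

    def __init__(self, vocab_sizes=(8, 8, 16, 8), dim=256):
        super().__init__()
        # One embedding table per descriptor field
        # (e.g., mass shape, mass margin, calcification type, distribution).
        self.embeddings = nn.ModuleList(nn.Embedding(v, dim) for v in vocab_sizes)

    def forward(self, descriptor_ids):
        # descriptor_ids: (batch, num_fields) integer codes per lesion.
        tokens = [emb(descriptor_ids[:, i]) for i, emb in enumerate(self.embeddings)]
        return torch.stack(tokens, dim=1)  # (batch, num_fields, dim)


class FusionLayer(nn.Module):
    """One fusion step: image tokens cross-attend to descriptor tokens,
    then self-attend. Stacking several such layers gives an iterative
    attention fusion of the two modalities."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, img_tokens, desc_tokens):
        # Cross attention: visual queries, textual keys/values.
        x, _ = self.cross_attn(img_tokens, desc_tokens, desc_tokens)
        img_tokens = self.norm1(img_tokens + x)
        # Self attention over the fused visual tokens.
        x, _ = self.self_attn(img_tokens, img_tokens, img_tokens)
        return self.norm2(img_tokens + x)
```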

Inference: At test time, the model leverages the learned multi-modal representation to produce more accurate predictions.
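As a toy example of the data flow at test time, reusing the classes sketched above with random tensors (shapes only):

```python
# Continues the sketch above (torch, nn, DescriptorEncoder, FusionLayer).
encoder = DescriptorEncoder(dim=256)
fusion = FusionLayer(dim=256)

img_tokens = torch.randn(1, 196, 256)          # e.g., a 14x14 feature map, flattened
desc_ids = torch.tensor([[2, 1, 0, 3]])        # illustrative descriptor codes
fused = fusion(img_tokens, encoder(desc_ids))  # (1, 196, 256)

# A classification head would pool the fused tokens to score
# benign vs. malignant; shown here only to make the shapes concrete.
prob = torch.sigmoid(nn.Linear(256, 1)(fused.mean(dim=1)))
```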
Deep BI-RADS Network Architecture

Figure 1: The Deep BI-RADS Network architecture. The model processes both CC and MLO mammogram views along with their corresponding BI-RADS descriptors through parallel branches. Each branch contains encoder blocks that reduce spatial resolution while increasing feature channels, followed by multi-attention layers that fuse visual and textual information through Cross, Self, and View attention mechanisms.
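To make the Cross, Self, and View attention of Figure 1 concrete, the sketch below extends the FusionLayer above with a cross-view step in which the CC and MLO branches attend to each other. Treating View attention as standard cross-attention between the two views is our assumption for illustration, not necessarily the authors' exact design.

```python
import torch.nn as nn  # FusionLayer is defined in the sketch above


class ViewAttention(nn.Module):
    """Cross-view attention: tokens from one mammogram view query the
    other view, letting the CC and MLO branches exchange information."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, own_tokens, other_tokens):
        x, _ = self.attn(own_tokens, other_tokens, other_tokens)
        return self.norm(own_tokens + x)


class MultiAttentionBlock(nn.Module):
    """One multi-attention layer in the spirit of Figure 1: Cross and Self
    attention fuse each view with its descriptors, then View attention
    exchanges information between the CC and MLO branches."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.fuse_cc = FusionLayer(dim, heads)
        self.fuse_mlo = FusionLayer(dim, heads)
        self.view_cc = ViewAttention(dim, heads)
        self.view_mlo = ViewAttention(dim, heads)

    def forward(self, cc_tokens, mlo_tokens, cc_desc, mlo_desc):
        cc = self.fuse_cc(cc_tokens, cc_desc)
        mlo = self.fuse_mlo(mlo_tokens, mlo_desc)
        # Each branch attends to the other view's fused tokens.
        return self.view_cc(cc, mlo), self.view_mlo(mlo, cc)
```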



References

[15] Mo, Y., Han, C., Liu, Y., Liu, M., Shi, Z., Lin, J., Zhao, B., Huang, C., Qiu, B., Cui, Y., et al.: HoVer-Trans: Anatomy-aware HoVer-Transformer for ROI-free breast cancer diagnosis in ultrasound images. IEEE Transactions on Medical Imaging (2023)

[23] van Tulder, G., Tong, Y., Marchiori, E.: Multi-view analysis of unregistered medical images using cross-view transformers. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 104–113. Springer (2021)