2024 IEEE The International Conference on Pattern Recognition (ICPR)
State-of-the-art models for breast cancer detection leverage multi-view mammograms for enhanced diagnostic accuracy, but they often rely solely on the visual mammography data. Radiologists, however, document lesion descriptors that carry additional information and could improve mammography-based breast cancer screening. A key question is whether deep learning models can benefit from these expert-derived features. To address this question, we introduce a novel multi-modal approach that combines textual BI-RADS lesion descriptors with visual mammogram content. Our method employs iterative attention layers to fuse the two modalities, significantly improving classification performance over image-only models. Experiments on the CBIS-DDSM dataset show substantial improvements across all metrics, reaching an AUC of 0.872 and demonstrating that handcrafted features contribute to end-to-end learning.
Results on CBIS-DDSM dataset
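For readers less familiar with the headline metric, the following is a minimal, dependency-free sketch of how an AUC can be computed for a benign vs. malignant classifier. It uses the pairwise-ranking interpretation of AUC (the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one). The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not taken from the paper or from CBIS-DDSM.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # malignant case ranked above benign case
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = malignant, 0 = benign; scores are made-up model outputs.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based implementation would be preferable on a full test set.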
Training: Our model utilizes iterative attention layers to fuse BI-RADS textual descriptors with mammogram images. This multi-modal approach enhances the model's ability to classify benign vs. malignant lesions effectively.
Inference: The model uses the learned multi-modal representation to produce more accurate predictions than image-only models.
Figure 1: The Deep BI-RADS Network architecture. The model processes both CC and MLO mammogram views along with their corresponding BI-RADS descriptors through parallel branches. Each branch contains encoder blocks that reduce spatial resolution while increasing feature channels, followed by multi-attention layers that fuse visual and textual information through Cross, Self, and View attention mechanisms.
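The three attention mechanisms named in the caption can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes that image tokens act as queries over descriptor tokens in cross attention, that each view queries the other in view attention, and that residual connections are used. Learned projections, multiple heads, and normalization are omitted. All function names and toy token values are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(a, b):
    """Residual connection: element-wise sum of two token lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_view(image_tokens, text_tokens):
    """Cross attention (image queries text), then self attention."""
    x = add(image_tokens, attention(image_tokens, text_tokens, text_tokens))
    return add(x, attention(x, x, x))


def view_attention(cc_tokens, mlo_tokens):
    """Each view attends to the other view's tokens."""
    cc = add(cc_tokens, attention(cc_tokens, mlo_tokens, mlo_tokens))
    mlo = add(mlo_tokens, attention(mlo_tokens, cc_tokens, cc_tokens))
    return cc, mlo


# Toy 4-dimensional tokens (made-up values, two tokens per input).
cc_img = [[0.1, 0.3, 0.0, 0.5], [0.4, 0.1, 0.2, 0.0]]
mlo_img = [[0.2, 0.0, 0.1, 0.3], [0.0, 0.5, 0.4, 0.1]]
# Hypothetical embedded BI-RADS descriptor tokens.
birads = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

cc_fused = fuse_view(cc_img, birads)
mlo_fused = fuse_view(mlo_img, birads)
cc_out, mlo_out = view_attention(cc_fused, mlo_fused)
```

In this sketch the same `attention` function serves all three roles; only the choice of queries, keys, and values changes. Repeating `fuse_view` and `view_attention` over several layers would loosely mirror the iterative fusion described above, though the exact ordering and depth in the paper's model are not specified here.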