A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?

Research output: Contribution to Book/Report types › Contribution to conference proceedings › peer-review

Abstract

Vision-language pre-training has recently gained popularity as it allows learning rich feature representations using large-scale data sources. This paradigm has quickly made its way into the medical image analysis community. In particular, there is an impressive amount of recent literature developing vision-language models for radiology. However, the available medical datasets with image-text supervision are scarce, and medical concepts are fine-grained, involving expert knowledge that existing vision-language models struggle to encode. In this paper, we propose to take a prudent step back from the literature and revisit supervised, unimodal pre-training, using fine-grained labels instead. We conduct an extensive comparison demonstrating that unimodal pre-training is highly competitive and better suited to integrating heterogeneous data sources. Our results also question the potential of recent vision-language models for open-vocabulary generalization, which have been evaluated using optimistic experimental settings. Finally, we study novel alternatives to better integrate fine-grained labels and noisy text supervision. Code and weights are available at https://github.com/jusiro/DLILP.

Original language: English
Title of host publication: Information Processing in Medical Imaging - 29th International Conference, IPMI 2025, Proceedings
Editors: Ipek Oguz, Shaoting Zhang, Dimitris N. Metaxas
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 294-309
Number of pages: 16
ISBN (Print): 9783031966248
DOIs
Publication status: Published - 2026
Event: 29th International Conference on Information Processing in Medical Imaging, IPMI 2025 - Kos, Greece
Duration: 25 May 2025 → 30 May 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15830 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 29th International Conference on Information Processing in Medical Imaging, IPMI 2025
Country/Territory: Greece
City: Kos
Period: 25/05/25 → 30/05/25

Keywords

  • Radiology
  • Transfer learning
  • Vision-language pre-training
