Cape Town - 2026 ISMRM-ISMRT Annual Meeting and Exhibition • 09-14 May 2026

Oral

Towards Multimodal Intelligence in MRI: Vision-Language Integration

Analysis Methods
Thursday, 14 May 2026
Meeting Room 1.60
13:40 - 15:30
Moderators: Oliver Schad & Hyeong Hun Lee
Session Number: 608-02
CME/CE Credit Available
This session concerns multimodal AI models integrating vision and language for MRI tasks.
Skill Level: Basic, Intermediate, Advanced

13:40 608-02-001. Primer on Large Vision-Language Models
Anuj Sharma
Case Western Reserve University, Cleveland, United States of America
13:51 608-02-002. Automated Quantitative MRI Reporting with Segmentation-Enhanced Multimodal Large Language Models
Suellen Ferraz, Minh Nhat Trinh, Joao Santinha, Teresa Correia
CCMAR, Faro, Portugal
Impact: Multimodal LLMs, combined with segmentation-derived metrics and clinical data, enable the generation of structured, quantitative reports, potentially enhancing diagnostic support, triage efficiency, and patient communication. This is particularly valuable in under-resourced settings, where MRI staff shortages delay diagnosis and treatment.
14:02 608-02-003. MMRQA: Signal-Enhanced Multimodal Large Language Models for MRI Quality Assessment
Fankai Jia, Daisong Gan, Zhe Zhang, Zhaochi Wen, Yanjie Zhu, Dong Liang, Haifeng Wang
University of Chinese Academy of Sciences, Beijing, China
Impact: MMRQA integrates signal metrics with multimodal LLMs to deliver interpretable MRI quality assessments, enabling rapid artifact detection and clinical decision-making in data-scarce environments, potentially reducing diagnostic errors and optimizing protocols across diverse MR acquisitions.
14:13 608-02-004. ScarNet-DPO: A Fully Automated Multi Modal Foundation Model for Highly Accurate Left Ventricular Scar Quantification
Neda Tavakoli, Amir Ali Rahsepar, Santiago López-Tapia, Daniel Lee, Aggelos Katsaggelos, Daniel Kim
Northwestern University Feinberg School of Medicine, Chicago, United States of America
Impact: The proposed automated foundation model overcomes the major barriers of manual prompting and annotation scarcity. It enables LV scar volume to become a practical, standard prognostic metric, accelerating personalized risk stratification in cardiovascular medicine.
14:24 608-02-005. Using Large Language Models to Inform Tractography
Elinor Thompson, Tiantian He, Anna Schroder, Ahmed Abdulaal, Alec Sargood, Sonja Soskic, Henry F Tregidgo, Daniel Alexander
University College London, London, United Kingdom
Impact: We show how large language models can provide a novel route for injecting prior neuroanatomical knowledge into connectomics studies, with demonstrated benefits for improving the sensitivity of tractography filtering in a mechanistic model of Alzheimer’s disease.
14:35 608-02-006. Evaluating Vision-Language AI for Prostate MRI: Automated Detection and Structured Reporting of Clinically Significant Cancer
Nader Gharbia, Yasmine Saad, Aymen Kammoun, Kays Cheker, Yassine Nouira
Faculty of Medicine of Sfax, Sfax, Tunisia
Impact: Vision-language AI can enhance prostate MRI interpretation by integrating automated lesion detection, quantitative analysis, and structured reporting. This approach can reduce inter-reader variability while enabling standardized, reproducible, and efficient prostate cancer diagnosis, communication, and data-driven research integration.
14:46 608-02-007. Improving Diagnostic Accuracy in Preoperative Glioma Classification: Performance of Knowledge-Enhanced Large Language Models
Qianqian Zheng, Shuang Li, Xin Fang, Jing Zhang, Xiaoyong Zhang, Qiang Yue
West China Hospital of Sichuan University, Chengdu, China
Impact: Knowledge-enhanced LLMs show diagnostic performance comparable to experienced radiologists in glioma classification and improve junior radiologists’ accuracy. These findings suggest LLMs may serve as valuable decision-support tools, though limitations in certain grading scenarios underscore the necessity of radiologist oversight.
14:57 608-02-008. On the Utility of Vision-language Foundation Models for MRI Reconstruction
Ruimin Feng, Xingxin He, Fang Liu
Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School, Charlestown, United States of America
Impact: This work introduces vision-language foundation models into fast MRI reconstruction, demonstrating that enforcing semantic consistency improves perceptual quality and structural fidelity. The approach integrates linguistic understanding into image reconstruction, enriching the representational space and promoting multimodal reconstruction in medical imaging.
15:08 608-02-009. Prospects of Multimodal AI in MRI
Reinhard Heckel
Technical University of Munich, Munich, Germany

© 2026 International Society for Magnetic Resonance in Medicine