Cape Town - 2026 ISMRM-ISMRT Annual Meeting and Exhibition - Towards Multimodal Intelligence in MRI: Vision-Language Integration

Towards Multimodal Intelligence in MRI: Vision-Language Integration

Oral

Analysis Methods

Thursday, 14 May 2026

Meeting Room 1.60

13:40 - 15:30

Moderators: Oliver Schad & Hyeong Hun Lee

Session Number: 608-02

CME/CE Credit Available

This session concerns multimodal AI models integrating vision and language for MRI tasks.

Skill Level: Basic, Intermediate, Advanced

13:40		608-02-001. Primer on Large Vision-Language Models
		Anuj Sharma
		Case Western Reserve University, Cleveland, United States of America
13:51		608-02-002. Automated Quantitative MRI Reporting with Segmentation-Enhanced Multimodal Large Language Models
		Suellen Ferraz, Minh Nhat Trinh, Joao Santinha, Teresa Correia
		CCMAR, Faro, Portugal
		Impact: Multimodal LLMs, combined with segmentation-derived metrics and clinical data, enable the generation of structured, quantitative reports, potentially enhancing diagnostic support, triage efficiency and patient communication, particularly valuable in under-resourced settings, where MRI staff shortages cause delays in diagnosis and treatment.
14:02		608-02-003. MMRQA: Signal-Enhanced Multimodal Large Language Models for MRI Quality Assessment
		Fankai Jia, Daisong Gan, Zhe Zhang, Zhaochi Wen, Yanjie Zhu, Dong Liang, Haifeng Wang
		University of Chinese Academy of Sciences, Beijing, China
		Impact: MMRQA integrates signal metrics with multimodal LLMs to deliver interpretable MRI quality assessments, enabling rapid artifact detection and clinical decision-making in data-scarce environments, potentially reducing diagnostic errors and optimizing protocols across diverse MR acquisitions.
14:13		608-02-004. ScarNet-DPO: A Fully Automated Multi Modal Foundation Model for Highly Accurate Left Ventricular Scar Quantification
		Neda Tavakoli, Amir Ali Rahsepar, Santiago López-Tapia, Daniel Lee, Aggelos Katsaggelos, Daniel Kim
		Northwestern University Feinberg School of Medicine, Chicago, United States of America
		Impact: The proposed automated foundation model overcomes the major barriers of manual prompting and annotation scarcity. It enables LV scar volume to become a practical, standard prognostic metric, accelerating personalized risk stratification in cardiovascular medicine.
14:24		608-02-005. Using Large Language Models to Inform Tractography
		Elinor Thompson, Tiantian He, Anna Schroder, Ahmed Abdulaal, Alec Sargood, Sonja Soskic, Henry F Tregidgo, Daniel Alexander
		University College London, London, United Kingdom
		Impact: We show how large language models can provide a novel route for injecting prior neuroanatomical knowledge into connectomics studies, with demonstrated benefits for improving the sensitivity of tractography filtering in a mechanistic model of Alzheimer’s disease.
14:35		608-02-006. Evaluating Vision-Language AI for Prostate MRI: Automated Detection and Structured Reporting of Clinically Significant Cancer
		Nader Gharbia, Yasmine Saad, Aymen Kammoun, Kays Cheker, Yassine Nouira
		Faculty of Medicine of Sfax, Tunisia
		Impact: Vision-language AI can enhance prostate MRI interpretation by integrating automated lesion detection, quantitative analysis, and structured reporting. This approach can reduce inter-reader variability while enabling standardized, reproducible, and efficient prostate cancer diagnosis, communication, and data-driven research integration.
14:46		608-02-007. Improving Diagnostic Accuracy in Preoperative Glioma Classification: Performance of Knowledge-Enhanced Large Language Models
		Qianqian Zheng, Shuang Li, Xin Fang, Jing Zhang, Xiaoyong Zhang, Qiang Yue
		West China Hospital of Sichuan University, Chengdu, China
		Impact: Knowledge-enhanced LLMs show diagnostic performance comparable to experienced radiologists in glioma classification and improve junior radiologists’ accuracy. These findings suggest LLMs may serve as valuable decision-support tools, though limitations in certain grading scenarios underscore the necessity of radiologist oversight.
14:57		608-02-008. On the Utility of Vision-language Foundation Models for MRI Reconstruction
		Ruimin Feng, Xingxin He, Fang Liu
		Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School, Charlestown, United States of America
		Impact: This work introduces vision-language foundation models into fast MRI reconstruction, demonstrating that enforcing semantic consistency improves perceptual quality and structural fidelity. The approach integrates linguistic understanding into image reconstruction, enriching the representational space and promoting multimodal reconstruction in medical imaging.
15:08		608-02-009. Prospects of Multimodal AI in MRI
		Reinhard Heckel
		Technical University Munich, Munich, Germany