Cape Town - 2026 ISMRM-ISMRT Annual Meeting and Exhibition
9 May 2026 – 14 May 2026 · Cape Town, South Africa
271-01-030 / 401-02-002 ISMRM Abstract

A Unified Vision-Language Foundation Model for Multi-Task MRI Application

Accepted
Xingxin He1,2, Aurora Rofena1,3, Yifan Hu1, Ruimin Feng1,2, Zhehao Liao1, Valerio Guarrasi3, Paolo Soda3, Zhaoye Zhou2, Albert Jang1,2, Fang Liu1,2,4
1Athinoula A. Martinos Center for Biomedical Imaging, Harvard Medical School, Boston, United States of America
2Massachusetts General Hospital, Boston, United States of America
3Unit of Artificial Intelligence and Computer Systems, Department of Engineering, Università Campus Bio-Medico di Roma, Rome, Italy
4Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, Boston, United States of America
Presenting Author: Fang Liu

Synopsis

Motivation:
Goals:
Approach:
Results:
