Multimodal Large Language Models for Automated Diagnosis and Clinical Decision Support

Authors

Kailash Thiyagarajan, Independent Researcher, USA

Abstract

Healthcare decision-making relies on diverse data sources, including electronic health records (EHRs), medical imaging, and textual clinical notes. Traditional AI models excel at specific tasks such as radiology image analysis or clinical text processing but cannot integrate multimodal data holistically. This research introduces a Multimodal Large Language Model (M-LLM) that leverages transformer-based architectures to fuse text, images, and structured patient data for enhanced diagnosis and decision support. The proposed model integrates Vision Transformers (ViTs) for medical imaging, pretrained biomedical large language models (LLMs) for textual analysis, and a multimodal fusion mechanism that enables holistic medical reasoning. The study uses the MIMIC-IV (EHR), CheXpert (chest X-ray), and MedQA (medical question answering) datasets to evaluate performance. Results demonstrate that the M-LLM outperforms traditional single-modality models in accuracy while offering improved explainability and robustness in clinical settings.
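
To make the described architecture concrete, the following is a minimal sketch of how a multimodal fusion head could combine ViT image embeddings, biomedical-LLM text embeddings, and structured EHR features, written in PyTorch. The abstract does not specify the authors' fusion design; the class name, embedding dimensions, cross-attention layout, and 14-class output are illustrative assumptions, not the paper's actual implementation.

    import torch
    import torch.nn as nn

    class MultimodalFusion(nn.Module):
        """Illustrative fusion head (not the paper's implementation):
        projects image, text, and structured-EHR embeddings into a shared
        space, fuses them with cross-attention, and outputs diagnosis logits."""

        def __init__(self, img_dim=768, txt_dim=1024, ehr_dim=64,
                     hidden_dim=512, num_classes=14):
            super().__init__()
            # Modality-specific projections into a shared hidden space
            self.img_proj = nn.Linear(img_dim, hidden_dim)
            self.txt_proj = nn.Linear(txt_dim, hidden_dim)
            self.ehr_proj = nn.Linear(ehr_dim, hidden_dim)
            # Text tokens attend over image patches and the EHR feature token
            self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads=8,
                                                    batch_first=True)
            self.classifier = nn.Sequential(
                nn.LayerNorm(hidden_dim),
                nn.Linear(hidden_dim, num_classes),
            )

        def forward(self, img_tokens, txt_tokens, ehr_feats):
            # img_tokens: (B, P, img_dim) patch embeddings from a ViT
            # txt_tokens: (B, T, txt_dim) token embeddings from a biomedical LLM
            # ehr_feats:  (B, ehr_dim)    structured patient features
            img = self.img_proj(img_tokens)
            txt = self.txt_proj(txt_tokens)
            ehr = self.ehr_proj(ehr_feats).unsqueeze(1)        # (B, 1, H)
            context = torch.cat([img, ehr], dim=1)             # (B, P+1, H)
            fused, _ = self.cross_attn(txt, context, context)  # (B, T, H)
            pooled = fused.mean(dim=1)                         # (B, H)
            return self.classifier(pooled)                     # (B, num_classes)

    # Example usage with random tensors standing in for real encoder outputs
    model = MultimodalFusion()
    logits = model(torch.randn(2, 196, 768),   # ViT patch embeddings
                   torch.randn(2, 128, 1024),  # LLM token embeddings
                   torch.randn(2, 64))         # structured EHR vector
    print(logits.shape)  # torch.Size([2, 14])

In practice the frozen or fine-tuned ViT and biomedical LLM would supply the token embeddings, and the fusion head would be trained on labels derived from datasets such as CheXpert and MIMIC-IV.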

Keywords

Multimodal Learning, Large Language Models, Clinical Decision Support, Medical Imaging, Vision-Language Models, Healthcare AI, Transformer Models, Biomedical NLP, Explainability, Federated Learning

Volume 15, Number 5