MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models

Phan Nguyen Minh Thao, Cong Tinh Dao, Chenwei Wu, Jian Zhe Wang, Shun Liu, Jun En Ding, David Restrepo, Feng Liu, Fang Ming Hung, Wen Chih Peng

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Electronic health records (EHRs) are multimodal by nature, consisting of structured tabular features like lab tests and unstructured clinical notes. In real-life clinical practice, doctors use complementary multimodal EHR data sources to get a clearer picture of patients' health and support clinical decision-making. However, most EHR predictive models do not reflect these procedures, as they either focus on a single modality or overlook the inter-modality interactions/redundancy. In this work, we propose MEDFuse, a Multimodal EHR Data Fusion framework that incorporates masked lab-test modeling and large language models (LLMs) to effectively integrate structured and unstructured medical data. MEDFuse leverages multimodal embeddings extracted from two sources: LLMs fine-tuned on free clinical text and masked tabular transformers trained on structured lab test results. We design a disentangled transformer module, optimized by a mutual information loss to 1) decouple modality-specific and modality-shared information and 2) extract useful joint representation from the noise and redundancy present in clinical notes. Through comprehensive validation on the public MIMIC-III dataset and the in-house FEMH dataset, MEDFuse demonstrates great potential in advancing clinical predictions, achieving over 90% F1 score in the 10-disease multi-label classification task.

Original languageEnglish
Title of host publicationCIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
Pages3974-3978
Number of pages5
ISBN (Electronic)9798400704369
DOIs
StatePublished - 21 Oct 2024
Event33rd ACM International Conference on Information and Knowledge Management, CIKM 2024 - Boise, United States
Duration: 21 Oct 202425 Oct 2024

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings
ISSN (Print)2155-0751

Conference

Conference33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
Country/TerritoryUnited States
CityBoise
Period21/10/2425/10/24

Keywords

  • computer-aided diagnosis
  • electronic health records
  • large language model fine-tuning

Fingerprint

Dive into the research topics of 'MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models'. Together they form a unique fingerprint.

Cite this