MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models

  • Phan Nguyen Minh Thao
  • , Cong Tinh Dao
  • , Chenwei Wu
  • , Jian Zhe Wang
  • , Shun Liu
  • , Jun En Ding
  • , David Restrepo
  • , Feng Liu
  • , Fang Ming Hung
  • , Wen Chih Peng

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

6 Scopus citations

Abstract

Electronic health records (EHRs) are multimodal by nature, consisting of structured tabular features like lab tests and unstructured clinical notes. In real-life clinical practice, doctors use complementary multimodal EHR data sources to get a clearer picture of patients' health and support clinical decision-making. However, most EHR predictive models do not reflect these procedures, as they either focus on a single modality or overlook the inter-modality interactions/redundancy. In this work, we propose MEDFuse, a Multimodal EHR Data Fusion framework that incorporates masked lab-test modeling and large language models (LLMs) to effectively integrate structured and unstructured medical data. MEDFuse leverages multimodal embeddings extracted from two sources: LLMs fine-tuned on free clinical text and masked tabular transformers trained on structured lab test results. We design a disentangled transformer module, optimized by a mutual information loss to 1) decouple modality-specific and modality-shared information and 2) extract useful joint representation from the noise and redundancy present in clinical notes. Through comprehensive validation on the public MIMIC-III dataset and the in-house FEMH dataset, MEDFuse demonstrates great potential in advancing clinical predictions, achieving over 90% F1 score in the 10-disease multi-label classification task.

Original languageEnglish
Title of host publicationCIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
Pages3974-3978
Number of pages5
ISBN (Electronic)9798400704369
DOIs
StatePublished - 21 Oct 2024
Event33rd ACM International Conference on Information and Knowledge Management, CIKM 2024 - Boise, United States
Duration: 21 Oct 202425 Oct 2024

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings
ISSN (Print)2155-0751

Conference

Conference33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
Country/TerritoryUnited States
CityBoise
Period21/10/2425/10/24

Keywords

  • computer-aided diagnosis
  • electronic health records
  • large language model fine-tuning

Fingerprint

Dive into the research topics of 'MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models'. Together they form a unique fingerprint.

Cite this