Causal inference for time series datasets with partially overlapping variables

Louis Adedapo Gomez, Jan Claassen, Samantha Kleinberg

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: Healthcare data provides a unique opportunity to learn causal relationships but the largest datasets, such as from hospitals or intensive care units, are often observational and do not standardize variables collected for all patients. Rather, the variables depend on a patient's health status, treatment plan, and differences between providers. This poses major challenges for causal inference, which either must restrict analysis to patients with complete data (reducing power) or learn patient-specific models (making it difficult to generalize). While missing variables can lead to confounding, variables absent for one individual are often measured in another.

Methods: We propose a novel method, called Causal Model Combination for Time Series (CMC-TS), to learn causal relationships from time series with partially overlapping variable sets. CMC-TS overcomes errors by specifically leveraging partial overlap between datasets (e.g., patients) to iteratively reconstruct missing variables and correct errors by reweighting inferences using shared information across datasets. We evaluated CMC-TS and compared it to the state of the art on both simulated data and real-world data from stroke patients admitted to a neurological intensive care unit.

Results: On simulated data, CMC-TS had the fewest false discoveries and highest F1-score compared to baselines. On real data from stroke patients in a neurological intensive care unit, we found fewer implausible and more highly ranked plausible causes of a clinically important adverse event.

Conclusion: Our approach may lead to better use of observational healthcare data for causal inference, by enabling causal inference from patient data with partially overlapping variable sets.

Original language: English
Article number: 104828
Journal: Journal of Biomedical Informatics
Volume: 166
State: Published - Jun 2025

Keywords

  • Causal inference
  • Health informatics
  • Overlapping datasets
  • Time series data
