TY - JOUR
T1 - Causal inference for time series datasets with partially overlapping variables
AU - Gomez, Louis Adedapo
AU - Claassen, Jan
AU - Kleinberg, Samantha
N1 - Publisher Copyright: © 2025 Elsevier Inc.
PY - 2025/6
Y1 - 2025/6
N2 - Objective: Healthcare data provides a unique opportunity to learn causal relationships but the largest datasets, such as from hospitals or intensive care units, are often observational and do not standardize variables collected for all patients. Rather, the variables depend on a patient's health status, treatment plan, and differences between providers. This poses major challenges for causal inference, which either must restrict analysis to patients with complete data (reducing power) or learn patient-specific models (making it difficult to generalize). While missing variables can lead to confounding, variables absent for one individual are often measured in another. Methods: We propose a novel method, called Causal Model Combination for Time Series (CMC-TS), to learn causal relationships from time series with partially overlapping variable sets. CMC-TS overcomes errors by specifically leveraging partial overlap between datasets (e.g., patients) to iteratively reconstruct missing variables and correct errors by reweighting inferences using shared information across datasets. We evaluated CMC-TS and compared it to the state of the art on both simulated data and real-world data from stroke patients admitted to a neurological intensive care unit. Results: On simulated data, CMC-TS had the fewest false discoveries and highest F1-score compared to baselines. On real data from stroke patients in a neurological intensive care unit, we found fewer implausible and more highly ranked plausible causes of a clinically important adverse event. Conclusion: Our approach may lead to better use of observational healthcare data for causal inference, by enabling causal inference from patient data with partially overlapping variable sets.
KW - Causal inference
KW - Health informatics
KW - Overlapping datasets
KW - Time series data
UR - http://www.scopus.com/inward/record.url?scp=105003277872&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105003277872&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2025.104828
DO - 10.1016/j.jbi.2025.104828
M3 - Article
C2 - 40274036
AN - SCOPUS:105003277872
SN - 1532-0464
VL - 166
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
M1 - 104828
ER -