TY - GEN
T1 - Imputation of missing values in time series with lagged correlations
AU - Rahman, Shah Atiqur
AU - Huang, Yuxiao
AU - Claassen, Jan
AU - Kleinberg, Samantha
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2015/1/26
Y1 - 2015/1/26
AB - Missing values are a common problem in real world data and are particularly prevalent in biomedical time series, where a patient's medical record may be split across multiple institutions or a device may briefly fail. These data are not missing completely at random, so ignoring the missing values can lead to bias and error during data mining. However, current methods for imputing missing values have yet to account for the fact that variables are correlated and that those relationships exist across time. To address this, we propose an imputation method (FLk-NN) that incorporates time lagged correlations both within and across variables by combining two imputation methods, based on an extension to k-NN and the Fourier transform. This enables imputation of missing values even when all data at a time point is missing and when there are different types of missingness both within and across variables. In comparison to other approaches on two biological datasets (simulated glucose in Type 1 diabetes and multi-modality neurological ICU monitoring) the proposed method has the highest imputation accuracy. This was true for up to half the data being missing and when consecutive missing values are a significant fraction of the overall time series length.
KW - Fourier imputation
KW - correlated data with time-lag
KW - extended k-NN imputation
KW - missing data
UR - http://www.scopus.com/inward/record.url?scp=84936851652&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84936851652&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2014.110
DO - 10.1109/ICDMW.2014.110
M3 - Conference contribution
AN - SCOPUS:84936851652
T3 - IEEE International Conference on Data Mining Workshops, ICDMW
SP - 753
EP - 762
BT - Proceedings - 14th IEEE International Conference on Data Mining Workshops, ICDMW 2014
A2 - Zhou, Zhi-Hua
A2 - Wang, Wei
A2 - Kumar, Ravi
A2 - Toivonen, Hannu
A2 - Pei, Jian
A2 - Zhexue Huang, Joshua
A2 - Wu, Xindong
T2 - 14th IEEE International Conference on Data Mining Workshops, ICDMW 2014
Y2 - 14 December 2014
ER -