TY - JOUR
T1 - Temporal Contrastive Learning for Sensor-Based Human Activity Recognition
T2 - A Self-Supervised Approach
AU - Chen, Xiaobing
AU - Zhou, Xiangwei
AU - Sun, Mingxuan
AU - Wang, Hao
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2025
Y1 - 2025
N2 - Deep learning techniques can make use of large amounts of time-series data from wearable devices and greatly benefit the development of sensor-based human activity recognition (HAR). However, representation learning in a supervised manner requires massive labeled sensory data, which are time-consuming to obtain and hindered by privacy concerns. To address these issues, we utilize plentiful unlabeled sensory data and propose a novel self-supervised learning framework, namely temporal contrastive learning in HAR (TCLHAR), which learns meaningful feature representations for time-series data without labels. Our TCLHAR framework uses the temporal co-occurrence relationship among time windows as a supervisory signal to construct positive pairs in the encoder pretraining stage. The encoder is designed for cross-modality fusion, leveraging the local interactions within each sensor modality and the global fusion of features from different sensors. The proposed framework is extensively evaluated on public HAR datasets in supervised, self-supervised, and semi-supervised settings. Our method outperforms several self-supervised learning benchmark models and achieves results comparable to training with fully labeled data. When labeled data are scarce, our method improves the F1 score by up to 65% over traditional supervised training, demonstrating the effectiveness of our feature representations.
AB - Deep learning techniques can make use of large amounts of time-series data from wearable devices and greatly benefit the development of sensor-based human activity recognition (HAR). However, representation learning in a supervised manner requires massive labeled sensory data, which are time-consuming to obtain and hindered by privacy concerns. To address these issues, we utilize plentiful unlabeled sensory data and propose a novel self-supervised learning framework, namely temporal contrastive learning in HAR (TCLHAR), which learns meaningful feature representations for time-series data without labels. Our TCLHAR framework uses the temporal co-occurrence relationship among time windows as a supervisory signal to construct positive pairs in the encoder pretraining stage. The encoder is designed for cross-modality fusion, leveraging the local interactions within each sensor modality and the global fusion of features from different sensors. The proposed framework is extensively evaluated on public HAR datasets in supervised, self-supervised, and semi-supervised settings. Our method outperforms several self-supervised learning benchmark models and achieves results comparable to training with fully labeled data. When labeled data are scarce, our method improves the F1 score by up to 65% over traditional supervised training, demonstrating the effectiveness of our feature representations.
KW - Contrastive learning
KW - human activity recognition (HAR)
KW - representation learning
KW - self-supervised learning
UR - https://www.scopus.com/pages/publications/85209748721
UR - https://www.scopus.com/inward/citedby.url?scp=85209748721&partnerID=8YFLogxK
U2 - 10.1109/JSEN.2024.3491933
DO - 10.1109/JSEN.2024.3491933
M3 - Article
AN - SCOPUS:85209748721
SN - 1530-437X
VL - 25
SP - 1839
EP - 1850
JO - IEEE Sensors Journal
JF - IEEE Sensors Journal
IS - 1
ER -