TY - JOUR
T1 - Temporally guided articulated hand pose tracking in surgical videos
AU - Louis, Nathan
AU - Zhou, Luowei
AU - Yule, Steven J.
AU - Dias, Roger D.
AU - Manojlovich, Milisa
AU - Pagani, Francis D.
AU - Likosky, Donald S.
AU - Corso, Jason J.
N1 - Publisher Copyright:
© 2022, The Author(s).
PY - 2023/1
Y1 - 2023/1
AB - Purpose: Articulated hand pose tracking is an under-explored problem with potential uses in a wide range of applications, especially in the medical domain. With a robust and accurate tracking system for surgical videos, the motion dynamics and movement patterns of the hands can be captured and analyzed for many rich downstream tasks. Methods: In this work, we propose a novel hand pose estimation model, CondPose, which improves detection and tracking accuracy by incorporating a pose prior into its predictions. By following a temporally guided approach that effectively leverages past predictions, we show improvements over state-of-the-art methods that make frame-wise independent predictions. Results: We collect Surgical Hands, the first dataset to provide multi-instance articulated hand pose annotations for videos. Our dataset offers over 8.1k annotated hand poses from publicly available surgical videos, along with bounding boxes, pose annotations, and tracking IDs to enable multi-instance tracking. When evaluated on Surgical Hands, our method outperforms the state-of-the-art approach on mean Average Precision, which measures pose estimation accuracy, and Multiple Object Tracking Accuracy, which assesses pose tracking performance. Conclusion: Compared with a frame-wise independent strategy, our method achieves greater performance in detecting and tracking hand poses and a more substantial impact on localization accuracy. These results have positive implications for generating more accurate representations of hands in the scene for targeted downstream tasks.
KW - Articulated pose
KW - Computer vision
KW - Hand pose
KW - Surgical videos
KW - Video tracking
UR - http://www.scopus.com/inward/record.url?scp=85139239446&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139239446&partnerID=8YFLogxK
U2 - 10.1007/s11548-022-02761-6
DO - 10.1007/s11548-022-02761-6
M3 - Article
C2 - 36190616
AN - SCOPUS:85139239446
SN - 1861-6410
VL - 18
SP - 117
EP - 125
JO - International Journal of Computer Assisted Radiology and Surgery
JF - International Journal of Computer Assisted Radiology and Surgery
IS - 1
ER -