TY - GEN
T1 - Learning to Estimate External Forces of Human Motion in Video
AU - Louis, Nathan
AU - Corso, Jason J.
AU - Templin, Tylan N.
AU - Eliason, Travis D.
AU - Nicolella, Daniel P.
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/10/10
Y1 - 2022/10/10
AB - Analyzing sports performance or preventing injuries requires capturing ground reaction forces (GRFs) exerted by the human body during certain movements. Standard practice uses physical markers paired with force plates in a controlled environment, but this is marred by high costs, lengthy implementation time, and variance across repeat experiments; hence, we propose inferring GRFs from video. While recent work has used LSTMs to estimate GRFs from 2D viewpoints, these can be limited in their modeling and representation capacity. First, we propose using a transformer architecture to tackle the GRF-from-video task, being the first to do so. We then introduce a new loss to minimize high-impact peaks in the regressed curves. We also show that pre-training and multi-task learning on 2D-to-3D human pose estimation improve generalization to unseen motions, and that pre-training on this different task provides good initial weights when fine-tuning on smaller (rarer) GRF datasets. We evaluate on LAAS Parkour and a newly collected ForcePose dataset; we show up to a 19% decrease in error compared to prior approaches.
KW - force prediction
KW - human pose estimation
KW - transformers
KW - video understanding
UR - http://www.scopus.com/inward/record.url?scp=85151147454&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85151147454&partnerID=8YFLogxK
U2 - 10.1145/3503161.3548377
DO - 10.1145/3503161.3548377
M3 - Conference contribution
AN - SCOPUS:85151147454
T3 - MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
SP - 3540
EP - 3548
BT - MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
T2 - 30th ACM International Conference on Multimedia, MM 2022
Y2 - 10 October 2022 through 14 October 2022
ER -