TY - JOUR
T1 - A Weakly Supervised Multi-task Ranking Framework for Actor–Action Semantic Segmentation
AU - Yan, Yan
AU - Xu, Chenliang
AU - Cai, Dawen
AU - Corso, Jason J.
N1 - Publisher Copyright: © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2020/5/1
Y1 - 2020/5/1
N2 - Modeling human behaviors and activity patterns has attracted significant research interest in recent years. In order to accurately model human behaviors, we need to perform fine-grained human activity understanding in videos. Fine-grained activity understanding in videos has attracted considerable recent attention with a shift from action classification to detailed actor and action understanding that provides compelling results for perceptual needs of cutting-edge autonomous systems. However, current methods for detailed understanding of actor and action have significant limitations: they require large amounts of finely labeled data, and they fail to capture any internal relationship among actors and actions. To address these issues, in this paper, we propose a novel Schatten p-norm robust multi-task ranking model for weakly supervised actor–action segmentation where only video-level tags are given for training samples. Our model is able to share useful information among different actors and actions while learning a ranking matrix to select representative supervoxels for actors and actions respectively. Final segmentation results are generated by a conditional random field that considers various ranking scores for video parts. Extensive experimental results on both the actor–action dataset and the YouTube-Objects dataset demonstrate that the proposed approach outperforms the state-of-the-art weakly supervised methods and performs as well as the top-performing fully supervised method.
KW - Actor–action semantic segmentation
KW - Multi-task ranking
KW - Weakly supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85074498785&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074498785&partnerID=8YFLogxK
U2 - 10.1007/s11263-019-01244-7
DO - 10.1007/s11263-019-01244-7
M3 - Article
AN - SCOPUS:85074498785
SN - 0920-5691
VL - 128
SP - 1414
EP - 1432
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 5
ER -