TY - GEN
T1 - Weakly supervised actor-action segmentation via robust multi-task ranking
AU - Yan, Yan
AU - Xu, Chenliang
AU - Cai, Dawen
AU - Corso, Jason J.
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/11/6
Y1 - 2017/11/6
N2 - Fine-grained activity understanding in videos has attracted considerable recent attention with a shift from action classification to detailed actor and action understanding that provides compelling results for perceptual needs of cutting-edge autonomous systems. However, current methods for detailed understanding of actor and action have significant limitations: they require large amounts of finely labeled data, and they fail to capture any internal relationship among actors and actions. To address these issues, in this paper, we propose a novel, robust multi-task ranking model for weakly-supervised actor-action segmentation where only video-level tags are given for training samples. Our model is able to share useful information among different actors and actions while learning a ranking matrix to select representative supervoxels for actors and actions respectively. Final segmentation results are generated by a conditional random field that considers various ranking scores for video parts. Extensive experimental results on the Actor-Action Dataset (A2D) demonstrate that the proposed approach outperforms the state-of-the-art weakly supervised methods and performs as well as the top-performing fully supervised method.
AB - Fine-grained activity understanding in videos has attracted considerable recent attention with a shift from action classification to detailed actor and action understanding that provides compelling results for perceptual needs of cutting-edge autonomous systems. However, current methods for detailed understanding of actor and action have significant limitations: they require large amounts of finely labeled data, and they fail to capture any internal relationship among actors and actions. To address these issues, in this paper, we propose a novel, robust multi-task ranking model for weakly-supervised actor-action segmentation where only video-level tags are given for training samples. Our model is able to share useful information among different actors and actions while learning a ranking matrix to select representative supervoxels for actors and actions respectively. Final segmentation results are generated by a conditional random field that considers various ranking scores for video parts. Extensive experimental results on the Actor-Action Dataset (A2D) demonstrate that the proposed approach outperforms the state-of-the-art weakly supervised methods and performs as well as the top-performing fully supervised method.
UR - http://www.scopus.com/inward/record.url?scp=85042027771&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85042027771&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2017.115
DO - 10.1109/CVPR.2017.115
M3 - Conference contribution
AN - SCOPUS:85042027771
T3 - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
SP - 1022
EP - 1031
BT - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
T2 - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Y2 - 21 July 2017 through 26 July 2017
ER -