TY - GEN
T1 - Actor-Action Semantic Segmentation with Grouping Process Models
AU - Xu, Chenliang
AU - Corso, Jason J.
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/12/9
Y1 - 2016/12/9
N2 - Actor-action semantic segmentation marks an important step toward advanced video understanding: what action is happening, who is performing the action, and where is the action happening in space-time. Current methods for this problem, based on layered CRFs, are local and unable to capture the long-ranging interactions of video parts. We propose a new model that combines the labeling CRF with a supervoxel hierarchy, where supervoxels at various scales provide cues for possible groupings of nodes in the CRF to encourage adaptive and long-ranging interactions. The new model defines a dynamic and continuous process of information exchange: the CRF influences which supervoxels in the hierarchy are active, and these active supervoxels, in turn, affect the connectivities in the CRF; we hence call it a grouping process model. By further incorporating video-level recognition, the proposed method achieves a large margin of 60% relative improvement over the state of the art on the recent A2D large-scale video labeling dataset, which demonstrates the effectiveness of our modeling.
AB - Actor-action semantic segmentation marks an important step toward advanced video understanding: what action is happening, who is performing the action, and where is the action happening in space-time. Current methods for this problem, based on layered CRFs, are local and unable to capture the long-ranging interactions of video parts. We propose a new model that combines the labeling CRF with a supervoxel hierarchy, where supervoxels at various scales provide cues for possible groupings of nodes in the CRF to encourage adaptive and long-ranging interactions. The new model defines a dynamic and continuous process of information exchange: the CRF influences which supervoxels in the hierarchy are active, and these active supervoxels, in turn, affect the connectivities in the CRF; we hence call it a grouping process model. By further incorporating video-level recognition, the proposed method achieves a large margin of 60% relative improvement over the state of the art on the recent A2D large-scale video labeling dataset, which demonstrates the effectiveness of our modeling.
UR - http://www.scopus.com/inward/record.url?scp=84986313829&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84986313829&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2016.336
DO - 10.1109/CVPR.2016.336
M3 - Conference contribution
AN - SCOPUS:84986313829
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 3083
EP - 3092
BT - Proceedings - 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016
T2 - 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016
Y2 - 26 June 2016 through 1 July 2016
ER -