TY - JOUR
T1 - A study of actor and action semantic retention in video supervoxel segmentation
AU - Xu, Chenliang
AU - Doell, Richard F.
AU - Hanson, Stephen José
AU - Hanson, Catherine
AU - Corso, Jason J.
N1 - Publisher Copyright: © 2013 World Scientific Publishing Company.
PY - 2013/12/1
Y1 - 2013/12/1
N2 - Existing methods in the semantic computer vision community seem unable to deal with the explosion and richness of modern, open-source and social video content. Although sophisticated methods such as object detection or bag-of-words models have been well studied, they typically operate on low-level features and ultimately suffer from either scalability issues or a lack of semantic meaning. On the other hand, video supervoxel segmentation has recently been established and applied to large-scale data processing, which potentially serves as an intermediate representation to high-level video semantic extraction. The supervoxels are rich decompositions of the video content: they capture object shape and motion well. However, it is not yet known if the supervoxel segmentation retains the semantics of the underlying video content. In this paper, we conduct a systematic study of how well the actor and action semantics are retained in video supervoxel segmentation. Our study has human observers watching supervoxel segmentation videos and trying to discriminate both actor (human or animal) and action (one of eight everyday actions). We gather and analyze a large set of 640 human perceptions over 96 videos in 3 different supervoxel scales. Furthermore, we design a feature defined on supervoxel segmentation, called supervoxel shape context, which is inspired by the higher-order processes in human perception. We conduct actor and action classification experiments with this new feature and compare it to various traditional video features. Our ultimate findings suggest that a significant amount of semantics has been well retained in the video supervoxel segmentation and can be used for further video analysis.
KW - Semantic retention
KW - action recognition
KW - computer vision
KW - video supervoxel segmentation
UR - http://www.scopus.com/inward/record.url?scp=85073111014&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073111014&partnerID=8YFLogxK
U2 - 10.1142/S1793351X13400114
DO - 10.1142/S1793351X13400114
M3 - Article
AN - SCOPUS:85073111014
SN - 1793-351X
VL - 7
SP - 353
EP - 375
JO - International Journal of Semantic Computing
JF - International Journal of Semantic Computing
IS - 4
ER -