TY - GEN
T1 - Are actor and action semantics retained in video supervoxel segmentation?
AU - Xu, Chenliang
AU - Doell, Richard F.
AU - Hanson, Stephen Jose
AU - Hanson, Catherine
AU - Corso, Jason J.
PY - 2013
Y1 - 2013
N2 - Existing methods in the semantic computer vision community seem unable to cope with the explosion and richness of modern, open-source and social video content. Although sophisticated methods such as object detection or bag-of-words models have been well studied, they typically operate on low-level features and ultimately suffer from either scalability issues or a lack of semantic meaning. On the other hand, video supervoxel segmentation has recently been established and applied to large-scale data processing, and potentially serves as an intermediate representation for high-level video semantic extraction. Supervoxels are rich decompositions of the video content: they capture object shape and motion well. However, it is not yet known whether supervoxel segmentation retains the semantics of the underlying video content. In this paper, we conduct a systematic study of how well the action and actor semantics are retained in video supervoxel segmentation. In our study, human observers watch supervoxel-segmented videos and try to discriminate both the actor (human or animal) and the action (one of eight everyday actions). We gather and analyze a large set of 640 human perceptions over 96 videos at 3 different supervoxel scales. Our findings suggest that a significant amount of semantics is well retained in the video supervoxel segmentation.
AB - Existing methods in the semantic computer vision community seem unable to cope with the explosion and richness of modern, open-source and social video content. Although sophisticated methods such as object detection or bag-of-words models have been well studied, they typically operate on low-level features and ultimately suffer from either scalability issues or a lack of semantic meaning. On the other hand, video supervoxel segmentation has recently been established and applied to large-scale data processing, and potentially serves as an intermediate representation for high-level video semantic extraction. Supervoxels are rich decompositions of the video content: they capture object shape and motion well. However, it is not yet known whether supervoxel segmentation retains the semantics of the underlying video content. In this paper, we conduct a systematic study of how well the action and actor semantics are retained in video supervoxel segmentation. In our study, human observers watch supervoxel-segmented videos and try to discriminate both the actor (human or animal) and the action (one of eight everyday actions). We gather and analyze a large set of 640 human perceptions over 96 videos at 3 different supervoxel scales. Our findings suggest that a significant amount of semantics is well retained in the video supervoxel segmentation.
UR - http://www.scopus.com/inward/record.url?scp=84893931720&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893931720&partnerID=8YFLogxK
U2 - 10.1109/ICSC.2013.56
DO - 10.1109/ICSC.2013.56
M3 - Conference contribution
AN - SCOPUS:84893931720
SN - 9780769551197
T3 - Proceedings - 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013
SP - 286
EP - 293
BT - Proceedings - 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013
T2 - 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013
Y2 - 16 September 2013 through 18 September 2013
ER -