TY - GEN
T1 - Can humans fly? Action understanding with multiple classes of actors
AU - Xu, Chenliang
AU - Hsieh, Shao Hang
AU - Xiong, Caiming
AU - Corso, Jason J.
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/10/14
Y1 - 2015/10/14
N2 - Can humans fly? Emphatically no. Can cars eat? Again, absolutely not. Yet, these absurd inferences result from the current disregard for particular types of actors in action understanding. There is no work we know of on simultaneously inferring actors and actions in the video, not to mention a dataset to experiment with. Our paper hence marks the first effort in the computer vision community to jointly consider various types of actors undergoing various actions. To start with the problem, we collect a dataset of 3782 videos from YouTube and label both pixel-level actors and actions in each video. We formulate the general actor-action understanding problem and instantiate it at various granularities: both video-level single- and multiple-label actor-action recognition and pixel-level actor-action semantic segmentation. Our experiments demonstrate that inference jointly over actors and actions outperforms inference independently over them, and hence concludes our argument of the value of explicit consideration of various actors in comprehensive action understanding.
AB - Can humans fly? Emphatically no. Can cars eat? Again, absolutely not. Yet, these absurd inferences result from the current disregard for particular types of actors in action understanding. There is no work we know of on simultaneously inferring actors and actions in the video, not to mention a dataset to experiment with. Our paper hence marks the first effort in the computer vision community to jointly consider various types of actors undergoing various actions. To start with the problem, we collect a dataset of 3782 videos from YouTube and label both pixel-level actors and actions in each video. We formulate the general actor-action understanding problem and instantiate it at various granularities: both video-level single- and multiple-label actor-action recognition and pixel-level actor-action semantic segmentation. Our experiments demonstrate that inference jointly over actors and actions outperforms inference independently over them, and hence concludes our argument of the value of explicit consideration of various actors in comprehensive action understanding.
UR - http://www.scopus.com/inward/record.url?scp=84959236591&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959236591&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2015.7298839
DO - 10.1109/CVPR.2015.7298839
M3 - Conference contribution
AN - SCOPUS:84959236591
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 2264
EP - 2273
BT - IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
T2 - IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Y2 - 7 June 2015 through 12 June 2015
ER -