Can humans fly? Action understanding with multiple classes of actors

Chenliang Xu, Shao Hang Hsieh, Caiming Xiong, Jason J. Corso

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

86 Scopus citations

Abstract

Can humans fly? Emphatically no. Can cars eat? Again, absolutely not. Yet, these absurd inferences result from the current disregard for particular types of actors in action understanding. There is no work we know of on simultaneously inferring actors and actions in the video, not to mention a dataset to experiment with. Our paper hence marks the first effort in the computer vision community to jointly consider various types of actors undergoing various actions. To start with the problem, we collect a dataset of 3782 videos from YouTube and label both pixel-level actors and actions in each video. We formulate the general actor-action understanding problem and instantiate it at various granularities: both video-level single- and multiple-label actor-action recognition and pixel-level actor-action semantic segmentation. Our experiments demonstrate that inference jointly over actors and actions outperforms inference independently over them, and hence concludes our argument of the value of explicit consideration of various actors in comprehensive action understanding.

Original language: English
Title of host publication: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Pages: 2264-2273
Number of pages: 10
ISBN (Electronic): 9781467369640
State: Published - 14 Oct 2015
Event: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015 - Boston, United States
Duration: 7 Jun 2015 to 12 Jun 2015

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 07-12-June-2015
ISSN (Print): 1063-6919

Conference

Conference: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Country/Territory: United States
City: Boston
Period: 7/06/15 to 12/06/15
