TY - GEN
T1 - Action bank
T2 - 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012
AU - Sadanand, Sreemanananth
AU - Corso, Jason J.
PY - 2012
Y1 - 2012
N2 - Activity recognition in video is dominated by low- and mid-level features; while demonstrably capable, these features by nature carry little semantic meaning. Inspired by the recent object bank approach to image representation, we present Action Bank, a new high-level representation of video. Action bank is composed of many individual action detectors sampled broadly in semantic space as well as viewpoint space. Our representation is constructed to be semantically rich and, even when paired with simple linear SVM classifiers, is capable of highly discriminative performance. We have tested action bank on four major activity recognition benchmarks. In all cases, our performance is better than the state of the art, namely 98.2% on KTH (better by 3.3%), 95.0% on UCF Sports (better by 3.7%), 57.9% on UCF50 (baseline is 47.9%), and 26.9% on HMDB51 (baseline is 23.2%). Furthermore, when we analyze the classifiers, we find strong transfer of semantics from the constituent action detectors to the bank classifier.
UR - http://www.scopus.com/inward/record.url?scp=84866718894&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866718894&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2012.6247806
DO - 10.1109/CVPR.2012.6247806
M3 - Conference contribution
AN - SCOPUS:84866718894
SN - 9781467312264
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 1234
EP - 1241
BT - 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012
Y2 - 16 June 2012 through 21 June 2012
ER -