TY - GEN
T1 - Temporally consistent multi-class video-object segmentation with the video graph-shifts algorithm
AU - Chen, Albert Y.C.
AU - Corso, Jason J.
PY - 2011
Y1 - 2011
N2 - We present the Video Graph-Shifts (VGS) approach for efficiently incorporating temporal consistency into MRF energy minimization for multi-class video object segmentation. In contrast to previous methods, our dynamic temporal links avoid the computational overhead of a fully connected spatiotemporal MRF while still handling the uncertainty in exact inter-frame pixel correspondence. The dynamic temporal links are initialized flexibly to balance speed and accuracy, and are automatically revised whenever a label change (shift) occurs during energy minimization. We show on the benchmark CamVid database and our own wintry driving dataset that VGS effectively mitigates temporally inconsistent segmentation, with improvements of 5% to 10% for semantic classes with high intra-class variance. Furthermore, VGS processes each frame at pixel resolution in about one second, providing a practical way to model complex probabilistic relationships in videos and solve them in near real-time.
AB - We present the Video Graph-Shifts (VGS) approach for efficiently incorporating temporal consistency into MRF energy minimization for multi-class video object segmentation. In contrast to previous methods, our dynamic temporal links avoid the computational overhead of a fully connected spatiotemporal MRF while still handling the uncertainty in exact inter-frame pixel correspondence. The dynamic temporal links are initialized flexibly to balance speed and accuracy, and are automatically revised whenever a label change (shift) occurs during energy minimization. We show on the benchmark CamVid database and our own wintry driving dataset that VGS effectively mitigates temporally inconsistent segmentation, with improvements of 5% to 10% for semantic classes with high intra-class variance. Furthermore, VGS processes each frame at pixel resolution in about one second, providing a practical way to model complex probabilistic relationships in videos and solve them in near real-time.
UR - http://www.scopus.com/inward/record.url?scp=79952531448&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952531448&partnerID=8YFLogxK
U2 - 10.1109/WACV.2011.5711561
DO - 10.1109/WACV.2011.5711561
M3 - Conference contribution
AN - SCOPUS:79952531448
SN - 9781424494965
T3 - 2011 IEEE Workshop on Applications of Computer Vision, WACV 2011
SP - 614
EP - 621
BT - 2011 IEEE Workshop on Applications of Computer Vision, WACV 2011
T2 - 2011 IEEE Workshop on Applications of Computer Vision, WACV 2011
Y2 - 5 January 2011 through 7 January 2011
ER -