LIBSVX: A Supervoxel Library and Benchmark for Early Video Processing

Chenliang Xu, Jason J. Corso

Research output: Contribution to journal › Article › peer-review

39 Scopus citations

Abstract

Supervoxel segmentation has strong potential to be incorporated into early video analysis, much as superpixel segmentation has been in image analysis. However, there are many plausible supervoxel methods and little understanding as to when and where each is most appropriate. Indeed, we are not aware of a single comparative study on supervoxel segmentation. To that end, we study seven supervoxel algorithms, including both off-line and streaming methods, against what we consider to be the properties of a good supervoxel: namely, spatiotemporal uniformity, object/region boundary detection, region compression, and parsimony. For the evaluation we propose a comprehensive suite of seven quality metrics to measure these desirable supervoxel characteristics. In addition, we evaluate the methods in a supervoxel classification task as a proxy for subsequent high-level uses of the supervoxels in video analysis. We use six existing benchmark video datasets with a variety of content types and dense human annotations. Our findings provide conclusive evidence that the hierarchical graph-based (GBH), segmentation by weighted aggregation (SWA), and temporal superpixels (TSP) methods are the top performers among the seven methods. They all perform well in terms of segmentation accuracy, but vary in regard to the other desiderata: GBH captures object boundaries best; SWA has the best potential for region compression; and TSP achieves the best undersegmentation error.
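The abstract names undersegmentation error as one of the quality measures but does not define it. As an illustration only, the sketch below assumes one common formulation: for each ground-truth segment, sum the volumes of all supervoxels that touch it, subtract the segment's own volume, and normalize by that volume; the per-segment values are then averaged. The function name, the flat label-sequence input, and the averaging step are assumptions made for this example and are not taken from the paper or from the LIBSVX code.

```python
from collections import Counter, defaultdict


def undersegmentation_error_3d(supervoxel_labels, ground_truth_labels):
    """Mean 3D undersegmentation error (assumed formulation, see lead-in).

    Both arguments are flat sequences holding one label per voxel,
    listed in the same voxel order for the whole video volume.
    """
    if len(supervoxel_labels) != len(ground_truth_labels):
        raise ValueError("label sequences must have the same length")
    if len(supervoxel_labels) == 0:
        raise ValueError("label sequences must not be empty")

    # Volume (voxel count) of every supervoxel and ground-truth segment.
    sv_volume = Counter(supervoxel_labels)
    gt_volume = Counter(ground_truth_labels)

    # For each ground-truth segment, the set of supervoxels overlapping it.
    touching = defaultdict(set)
    for s, g in zip(supervoxel_labels, ground_truth_labels):
        touching[g].add(s)

    errors = []
    for g, vol in gt_volume.items():
        # Total volume of the supervoxels needed to cover segment g.
        covered = sum(sv_volume[s] for s in touching[g])
        # Fraction of "bleed" beyond the segment, relative to its volume.
        errors.append((covered - vol) / vol)
    return sum(errors) / len(errors)
```

Under this formulation a supervoxel segmentation that never crosses a ground-truth boundary scores 0, however finely it oversegments, while a single supervoxel spanning two equal-sized segments scores 1.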

Original language: English
Pages (from-to): 272-290
Number of pages: 19
Journal: International Journal of Computer Vision
Volume: 119
Issue number: 3
DOIs
State: Published - 1 Sep 2016

Keywords

  • Segmentation and grouping
  • Spatiotemporal processing
  • Supervoxels
  • Video segmentation
