BubbleNets: Learning to select the guidance frame in video object segmentation by deep sorting frames

Brent A. Griffin, Jason J. Corso

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

38 Scopus citations

Abstract

Semi-supervised video object segmentation has made significant progress on real and challenging videos in recent years. The current paradigm for segmentation methods and benchmark datasets is to segment objects in a video given a single annotation in the first frame. However, we find that segmentation performance across the entire video varies dramatically when an alternative frame is selected for annotation. This paper addresses the problem of learning to suggest the single best frame across the video for user annotation; this is, in fact, never the first frame of video. We achieve this by introducing BubbleNets, a novel deep sorting network that learns to select frames using a performance-based loss function that enables the conversion of expansive amounts of training examples from already existing datasets. Using BubbleNets, we are able to achieve an 11% relative improvement in segmentation performance on the DAVIS benchmark without any changes to the underlying method of segmentation.
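One way to picture "sorting frames" to pick an annotation frame is the equivalent of a single bubble-sort pass driven by a pairwise comparator. The sketch below is illustrative only: `select_guidance_frame` and `predict_gain` are hypothetical names, and the comparator here is a toy stand-in for a learned network. The abstract does not specify the network architecture or the loss, so nothing in this block should be read as the authors' implementation.

```python
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_guidance_frame(
    frames: Sequence[Frame],
    predict_gain: Callable[[Frame, Frame], float],
) -> int:
    """Return the index of the frame that one comparison pass ranks best.

    predict_gain(a, b) is a hypothetical stand-in for a learned comparator:
    a positive value means annotating `a` is predicted to give better
    segmentation over the whole video than annotating `b`.
    """
    if not frames:
        raise ValueError("need at least one frame")
    best = 0
    for i in range(1, len(frames)):
        # Keep the current candidate unless the next frame is predicted better.
        if predict_gain(frames[i], frames[best]) > 0:
            best = i
    return best


# Toy usage: the "frames" are made-up per-frame scores and the comparator
# is simply their difference. A real comparator would be a trained model.
toy_scores = [0.61, 0.70, 0.66, 0.74, 0.58]
best_index = select_guidance_frame(toy_scores, lambda a, b: a - b)
print(best_index)
```

Carrying the current best candidate through one pass needs only n - 1 comparisons for n frames, which is why a pairwise predictor is enough to choose a single frame without fully sorting the video.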

Original language: English
Title of host publication: Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Pages: 8906-8915
Number of pages: 10
ISBN (Electronic): 9781728132938
State: Published - Jun 2019
Event: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 - Long Beach, United States
Duration: 16 Jun 2019 to 20 Jun 2019

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2019-June
ISSN (Print): 1063-6919

Conference

Conference: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Country/Territory: United States
City: Long Beach
Period: 16/06/19 to 20/06/19

Keywords

  • Datasets and Evaluation
  • Deep Learning
  • Grouping and Shape
  • Motion and Tracking
  • Segmentation
