TASED-net: Temporally-aggregating spatial encoder-decoder network for video saliency detection

Kyle Min, Jason Corso

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

143 Scopus citations

Abstract

TASED-Net is a 3D fully-convolutional network architecture for video saliency detection. It consists of two building blocks: First, the encoder network extracts low-resolution spatiotemporal features from an input clip of several consecutive frames, and then the following prediction network decodes the encoded features spatially while aggregating all the temporal information. As a result, a single prediction map is produced from an input clip of multiple frames. Frame-wise saliency maps can be predicted by applying TASED-Net in a sliding-window fashion to a video. The proposed approach assumes that the saliency map of any frame can be predicted by considering a limited number of past frames. The results of our extensive experiments on video saliency detection validate this assumption and demonstrate that our fully-convolutional model with temporal aggregation method is effective. TASED-Net significantly outperforms previous state-of-the-art approaches on all three major large-scale datasets of video saliency detection: DHF1K, Hollywood2, and UCFSports. After analyzing the results qualitatively, we observe that our model is especially better at attending to salient moving objects.

Original languageEnglish
Title of host publicationProceedings - 2019 International Conference on Computer Vision, ICCV 2019
Pages2394-2403
Number of pages10
ISBN (Electronic)9781728148038
DOIs
StatePublished - Oct 2019
Event17th IEEE/CVF International Conference on Computer Vision, ICCV 2019 - Seoul, Korea, Republic of
Duration: 27 Oct 20192 Nov 2019

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
Volume2019-October
ISSN (Print)1550-5499

Conference

Conference17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
Country/TerritoryKorea, Republic of
CitySeoul
Period27/10/192/11/19

Fingerprint

Dive into the research topics of 'TASED-net: Temporally-aggregating spatial encoder-decoder network for video saliency detection'. Together they form a unique fingerprint.

Cite this