TY - GEN
T1 - TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection
T2 - 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
AU - Min, Kyle
AU - Corso, Jason J.
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - TASED-Net is a 3D fully-convolutional network architecture for video saliency detection. It consists of two building blocks: first, the encoder network extracts low-resolution spatiotemporal features from an input clip of several consecutive frames, and then the following prediction network decodes the encoded features spatially while aggregating all the temporal information. As a result, a single prediction map is produced from an input clip of multiple frames. Frame-wise saliency maps can be predicted by applying TASED-Net in a sliding-window fashion to a video. The proposed approach assumes that the saliency map of any frame can be predicted by considering a limited number of past frames. The results of our extensive experiments on video saliency detection validate this assumption and demonstrate that our fully-convolutional model with temporal aggregation is effective. TASED-Net significantly outperforms previous state-of-the-art approaches on all three major large-scale datasets of video saliency detection: DHF1K, Hollywood2, and UCFSports. After analyzing the results qualitatively, we observe that our model is particularly strong at attending to salient moving objects.
UR - http://www.scopus.com/inward/record.url?scp=85079183981&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079183981&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2019.00248
DO - 10.1109/ICCV.2019.00248
M3 - Conference contribution
AN - SCOPUS:85079183981
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 2394
EP - 2403
BT - Proceedings - 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019
Y2 - 27 October 2019 through 2 November 2019
ER -
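
A minimal sketch of the clip-to-map scheme described in the abstract above: a 3D fully-convolutional encoder-decoder that collapses the temporal dimension of an input clip into a single saliency map and is applied in a sliding window over the video. This is an illustrative PyTorch reconstruction under stated assumptions, not the authors' implementation (the actual TASED-Net builds on an S3D encoder with an auxiliary-pooling decoder); all class and function names below are hypothetical stand-ins.

import torch
import torch.nn as nn

class ClipToMapNet(nn.Module):
    """Illustrative clip-to-map network: encodes a clip with 3D convolutions,
    aggregates the temporal dimension, and decodes spatially to one map."""
    def __init__(self):
        super().__init__()
        # Encoder: 3D convolutions extract low-resolution spatiotemporal features.
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=(2, 2, 2), padding=1), nn.ReLU(),
        )
        # Temporal aggregation: collapse the time axis to length 1 (average pooling
        # stands in for TASED-Net's more elaborate aggregation scheme).
        self.temporal_pool = nn.AdaptiveAvgPool3d((1, None, None))
        # Prediction network: decode spatially back to full resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, clip):                        # clip: (B, 3, T, H, W)
        feat = self.encoder(clip)                   # (B, 64, ~T/2, H/4, W/4)
        feat = self.temporal_pool(feat).squeeze(2)  # (B, 64, H/4, W/4)
        return torch.sigmoid(self.decoder(feat))    # (B, 1, H, W): one map per clip

@torch.no_grad()
def predict_video(model, video, t=32):
    """Frame-wise prediction via a sliding window: the map for each frame is
    computed from the t frames ending at that frame, as the abstract assumes."""
    maps = []
    for i in range(t, video.shape[1] + 1):          # video: (3, N, H, W)
        clip = video[:, i - t:i].unsqueeze(0)       # last t frames up to frame i-1
        maps.append(model(clip)[0, 0])
    return torch.stack(maps)                        # (N - t + 1, H, W)

# Example: a 64-frame video at 112x112 yields 33 frame-wise saliency maps.
# maps = predict_video(ClipToMapNet().eval(), torch.rand(3, 64, 112, 112))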