LAVS: A LIGHTWEIGHT AUDIO-VISUAL SALIENCY PREDICTION MODEL

Dandan Zhu, Defang Zhao, Xiongkuo Min, Tian Han, Qiangqiang Zhou, Shaobo Yu, Yongqing Chen, Guangtao Zhai, Xiaokang Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Scopus citations

Abstract

Audio information is essential for guiding human attention and visual perception, as many comprehensive psychological studies have verified. However, the audio modality has been largely neglected in modeling visual attention, and most current visual attention models depend heavily on visual information. Additionally, existing high-performing visual attention models rely on deeper convolutional neural networks (CNNs), benefiting from their extraordinary feature learning ability but incurring high computational cost. To this end, we propose a novel lightweight audio-visual saliency (LAVS) model to efficiently address the problem of fixation prediction in videos. To the best of our knowledge, our proposed model constitutes the first attempt to exploit a lightweight network that combines visual and audio cues to perform saliency estimation in videos. Specifically, the model consists of four modules: a spatial-temporal visual saliency estimation module, an audio feature extraction module, a sound source localization module, and an audio-visual saliency fusion module. Extensive experiments across datasets validate the effectiveness and real-time performance of the proposed LAVS model, which outperforms other state-of-the-art methods.

Original language: English
Title of host publication: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
ISBN (Electronic): 9781665438643
DOIs
State: Published - 2021
Event: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021 - Shenzhen, China
Duration: 5 Jul 2021 → 9 Jul 2021

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Country/Territory: China
City: Shenzhen
Period: 5/07/21 → 9/07/21

Keywords

  • Audio-visual saliency
  • deep canonical correlation analysis
  • lightweight model
  • saliency fusion
  • visual attention
