Keyword-Based Diverse Image Retrieval With Variational Multiple Instance Graph

Yawen Zeng, Yiru Wang, Dongliang Liao, Gongfu Li, Weijie Huang, Jin Xu, Da Cao, Hong Man

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

The task of cross-modal image retrieval has recently attracted considerable research attention. In real-world scenarios, keyword-based queries issued by users are usually short and have broad semantics. In such user-oriented services, semantic diversity is therefore as important as retrieval accuracy, because it improves the user experience. However, most typical cross-modal image retrieval methods are based on a single-point query embedding and inevitably result in low semantic diversity, while existing diverse retrieval approaches frequently lead to low accuracy due to a lack of cross-modal understanding. To address this challenge, we introduce an end-to-end solution termed variational multiple instance graph (VMIG), in which a continuous semantic space is learned to capture diverse query semantics, and the retrieval task is formulated as a multiple instance learning problem to connect diverse features across modalities. Specifically, a query-guided variational autoencoder is employed to model the continuous semantic space instead of learning a single-point embedding. Afterward, multiple instances of the image and query are obtained by sampling in the continuous semantic space and applying multihead attention, respectively. Thereafter, an instance graph is constructed to remove noisy instances and align cross-modal semantics. Finally, heterogeneous modalities are robustly fused under multiple losses. Extensive experiments on two real-world datasets verify the effectiveness of our proposed solution in both retrieval accuracy and semantic diversity.
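
The sampling step mentioned in the abstract can be illustrated with the standard reparameterization trick used in variational autoencoders. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the diagonal Gaussian posterior, the latent dimension, and the values of `mu` and `log_var` are all hypothetical and chosen only for illustration.

```python
import math
import random


def sample_instances(mu, log_var, k, rng):
    """Draw k latent samples z = mu + sigma * eps, with eps ~ N(0, I).

    Assumes a diagonal Gaussian posterior (a common VAE choice; the
    abstract does not state the exact form used in VMIG).
    """
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        for _ in range(k)
    ]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


rng = random.Random(0)           # fixed seed so the sketch is repeatable
mu = [0.2, -0.1, 0.5]            # hypothetical encoder mean for one query
log_var = [-1.0, -0.5, 0.0]      # hypothetical encoder log-variance
instances = sample_instances(mu, log_var, k=4, rng=rng)
kl = kl_to_standard_normal(mu, log_var)
```

Each element of `instances` is one point in the continuous semantic space, so a single short query yields several distinct embeddings rather than one. The KL term is the usual regularizer in a VAE objective; how it is weighted against the other losses in VMIG is not specified in the abstract.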

Original language: English
Pages (from-to): 10528-10537
Number of pages: 10
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 34
Issue number: 12
State: Published - 1 Dec 2023

Keywords

  • Cross-modal retrieval
  • keyword-based image retrieval
  • multiple instance graph
  • variational autoencoder (VAE)
