Translating related words to videos and back through latent topics

Pradipto Das, Rohini K. Srihari, Jason J. Corso

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Scopus citations

Abstract

Documents containing video and text are becoming increasingly widespread, yet content analysis of such documents still depends primarily on the text. Automated discovery of semantically related words from text improves free-text query understanding, while translating videos into text summaries facilitates better video search, particularly in the absence of accompanying text. In this paper, we propose a multimedia topic modeling framework that provides a basis for automatically discovering semantically related words from the textual metadata of multimedia documents and translating them to semantically related videos or video frames. The framework jointly models video and text and is flexible enough to handle different types of features in each domain: discrete and real-valued video features representing actions, objects, colors and scenes, as well as discrete text features. Our proposed models show a much better fit to the multimedia data in terms of held-out data log likelihood. For a given query video, our models translate low-level vision features into bag-of-keywords summaries, which can be further translated into human-readable paragraphs using simple natural language generation techniques. We quantitatively compare the results of video-to-bag-of-words translation against a state-of-the-art baseline object recognition model from computer vision, and show that text translations from multimodal topic models vastly outperform the baseline on a multimedia dataset downloaded from the Internet.
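The video-to-keywords translation step can be illustrated with a generic sketch. It assumes an LDA-style model in which the query video's topic proportions θ have already been inferred from its visual features and each topic k has a word distribution φ_k; candidate keywords are then ranked by p(w | video) = Σ_k θ_k φ_{k,w}. This is not the authors' specific model, and the function name, toy vocabulary and probabilities below are hypothetical.

```python
import math


def translate_video_to_keywords(theta, phi, vocab, top_n=5):
    """Rank words for a video by p(w | video) = sum_k theta[k] * phi[k][w].

    Hypothetical helper for illustration only.
    theta: topic proportions of the query video (assumed already inferred
           from its visual features by some multimodal topic model).
    phi:   per-topic word distributions, phi[k][w] = p(word w | topic k).
    vocab: words aligned with the columns of phi.
    """
    if not math.isclose(sum(theta), 1.0, rel_tol=1e-6):
        raise ValueError("theta must sum to 1")
    scored = []
    for w, word in enumerate(vocab):
        # Marginalize the word probability over the video's topics.
        p = sum(theta[k] * phi[k][w] for k in range(len(theta)))
        scored.append((word, p))
    # Highest probability first; ties broken alphabetically for stability.
    scored.sort(key=lambda wp: (-wp[1], wp[0]))
    return scored[:top_n]


# Toy example: topic 0 is about traffic, topic 1 about a dog in a park.
vocab = ["car", "road", "dog", "park", "ball"]
phi = [
    [0.50, 0.40, 0.00, 0.05, 0.05],
    [0.00, 0.10, 0.50, 0.20, 0.20],
]
theta = [0.7, 0.3]
keywords = translate_video_to_keywords(theta, phi, vocab, top_n=3)
print([word for word, _ in keywords])  # ['car', 'road', 'dog']
```

The resulting bag of keywords is what a simple natural language generation step could then turn into a readable sentence.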

Original language: English
Title of host publication: WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining
Pages: 485-494
Number of pages: 10
State: Published - 2013
Event: 6th ACM International Conference on Web Search and Data Mining, WSDM 2013 - Rome, Italy
Duration: 4 Feb 2013 → 8 Feb 2013

Publication series

Name: WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining

Conference

Conference: 6th ACM International Conference on Web Search and Data Mining, WSDM 2013
Country/Territory: Italy
City: Rome
Period: 4/02/13 → 8/02/13

Keywords

  • multimedia topic models
  • video to text summarization
