GENIE TRECVID2011 multimedia event detection: Late-fusion approaches to combine multiple audio-visual features

  • Amitha A.G. Perera
  • Sangmin Oh
  • Matthew Leotta
  • Ilseo Kim
  • Byungki Byun
  • Chin Hui Lee
  • Scott McCloskey
  • Jingchen Liu
  • Ben Miller
  • Zhi Feng Huang
  • Arash Vahdat
  • Weilong Yang
  • Greg Mori
  • Kevin Tang
  • Daphne Koller
  • L. Fei-Fei
  • Kang Li
  • Gang Chen
  • Jason Corso
  • Yun Fu
  • Rohini Srihari

Research output: Contribution to conference › Paper › peer-review

8 Scopus citations

Abstract

For the TRECVID 2011 MED task, the GENIE system incorporated two late-fusion approaches, in which multiple discriminative base classifiers are built per feature and then combined through discriminative fusion techniques. All of our fusion and base classifiers are formulated as one-vs-all detectors per event class, with detection thresholds estimated during cross-validation. A total of five types of features were extracted from the data, covering both audio and visual modalities: HOG3D, Object Bank, Gist, MFCC, and acoustic segment models (ASMs). Features such as HOG3D and MFCC are low-level, while Object Bank and ASMs are more semantic. In our work, event-specific feature adaptations and manual annotations were deliberately avoided in order to establish a strong baseline. Overall, the results were competitive in the MED11 evaluation and show that standard machine learning techniques can yield fairly good results even on a challenging dataset.
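The abstract does not name the base or fusion classifiers, so the following is only a minimal sketch of the general late-fusion pattern it describes, not the GENIE implementation. It assumes logistic regression as a stand-in for both stages, synthetic two-feature data (the names `"visual"`, `"audio"`, `make_clips`, `LateFusionDetector` and `pick_threshold` are all hypothetical), a hypothetical weighted miss/false-alarm cost for choosing the threshold, and a single held-out split in place of full cross-validation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=300, l2=1e-3):
    """L2-regularised logistic regression fit by batch gradient descent.
    Stand-in discriminative classifier (an assumption; the abstract names none).
    One-vs-all: y[i] = 1 if clip i shows the target event, else 0."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def score(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def pick_threshold(scores, labels, c_miss=1.0, c_fa=1.0):
    """Hypothetical criterion: minimise c_miss * P_miss + c_fa * P_fa on held-out scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_cost = 0.5, float("inf")
    for t in sorted(set(scores)):
        miss = sum(1 for s, l in zip(scores, labels) if l == 1 and s < t)
        fa = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t)
        cost = c_miss * miss / max(pos, 1) + c_fa * fa / max(neg, 1)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


class LateFusionDetector:
    """One base classifier per feature type, fused by a second classifier on their scores."""

    def fit(self, base_split, fusion_split):
        # Each split is (features, labels); features = {feature_name: [vector per clip]}.
        feats, y = base_split
        self.names = sorted(feats)
        self.base = {f: train_logreg(feats[f], y) for f in self.names}
        # Score the other split with the base classifiers, so the fusion
        # classifier learns from scores on clips the base models never saw.
        feats2, y2 = fusion_split
        Z = self._stack(feats2)
        self.fusion = train_logreg(Z, y2)
        self.threshold = pick_threshold([score(self.fusion, z) for z in Z], y2)
        return self

    def _stack(self, feats):
        # One row per clip: the base-classifier score for each feature type.
        n = len(feats[self.names[0]])
        return [[score(self.base[f], feats[f][i]) for f in self.names] for i in range(n)]

    def detect(self, feats):
        # True where the fused score clears the estimated threshold.
        return [score(self.fusion, z) >= self.threshold for z in self._stack(feats)]


def make_clips(rng, n):
    # Hypothetical synthetic data: 2-D "visual" and "audio" vectors per clip,
    # alternating labels so every split contains positives and negatives.
    y = [i % 2 for i in range(n)]
    feats = {name: [[rng.gauss(1.5 if yi else -1.5, 1.0) for _ in range(2)] for yi in y]
             for name in ("visual", "audio")}
    return feats, y


rng = random.Random(0)
det = LateFusionDetector().fit(make_clips(rng, 60), make_clips(rng, 60))
test_feats, test_y = make_clips(rng, 100)
pred = det.detect(test_feats)
accuracy = sum(p == bool(t) for p, t in zip(pred, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

Training the fusion stage on base scores from clips the base classifiers did not see keeps it from over-trusting overfit base scores; a fuller version would use cross-validation folds for both the fusion weights and the threshold, and would repeat the whole procedure once per event class.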

Original language: English
State: Published - 2011
Event: TREC Video Retrieval Evaluation, TRECVID 2011 - Gaithersburg, MD, United States
Duration: 5 Dec 2011 to 7 Dec 2011

Conference

Conference: TREC Video Retrieval Evaluation, TRECVID 2011
Country/Territory: United States
City: Gaithersburg, MD
Period: 5/12/11 to 7/12/11
