GENIE TRECVID2011 multimedia event detection: Late-fusion approaches to combine multiple audio-visual features

Amitha A.G. Perera, Sangmin Oh, Matthew Leotta, Ilseo Kim, Byungki Byun, Chin-Hui Lee, Scott McCloskey, Jingchen Liu, Ben Miller, Zhi Feng Huang, Arash Vahdat, Weilong Yang, Greg Mori, Kevin Tang, Daphne Koller, L. Fei-Fei, Kang Li, Gang Chen, Jason Corso, Yun Fu, Rohini Srihari

Research output: Contribution to conference › Paper › peer-review

Abstract

For the TRECVID 2011 MED task, the GENIE system incorporated two late-fusion approaches in which multiple discriminative base classifiers are built per feature and then combined through discriminative fusion techniques. All of our fusion and base classifiers are formulated as one-vs-all detectors per event class, with detection thresholds estimated during cross-validation. A total of five types of features, spanning both the audio and visual modalities, were extracted from the data: HOG3D, Object Bank, Gist, MFCC, and acoustic segment models (ASMs). HOG3D and MFCC are low-level features, while Object Bank and ASMs are more semantic. In our work, event-specific feature adaptations and manual annotations were deliberately avoided in order to establish strong baseline results. Overall, the results were competitive in the MED11 evaluation and show that standard machine learning techniques can yield fairly good results even on a challenging dataset.
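The late-fusion scheme the abstract describes can be illustrated with a minimal sketch. Everything below is an assumption-laden illustration rather than the paper's actual code: the linear SVM base and fusion classifiers, the 5-fold out-of-fold score construction, and the F1-maximizing threshold rule are stand-ins for whichever discriminative classifiers and cross-validated threshold-estimation procedure GENIE actually used.

```python
# Minimal late-fusion sketch for one event class (illustrative only).
# Assumes one pre-extracted feature matrix per feature type (e.g. HOG3D,
# Object Bank, Gist, MFCC, ASM) and binary one-vs-all labels y in {0, 1}.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import LinearSVC

def train_late_fusion(feature_mats, y, n_folds=5):
    """Train per-feature base classifiers, then a discriminative fusion
    classifier over their cross-validated scores, for a single event."""
    # 1. Base classifiers: one discriminative detector per feature type.
    base_clfs = [LinearSVC(C=1.0).fit(X, y) for X in feature_mats]

    # 2. Build fusion inputs from out-of-fold base scores, so the fusion
    #    classifier does not train on overfit in-sample scores.
    score_cols = [
        cross_val_predict(LinearSVC(C=1.0), X, y,
                          cv=n_folds, method="decision_function")
        for X in feature_mats
    ]
    Z = np.column_stack(score_cols)

    # 3. Fusion classifier: another one-vs-all discriminative model
    #    trained on the stacked base-classifier scores.
    fusion_clf = LinearSVC(C=1.0).fit(Z, y)

    # 4. Threshold estimation during cross-validation. Here we pick the
    #    threshold maximizing F1 on out-of-fold fused scores (an assumed
    #    criterion; the paper does not specify this rule).
    fused = cross_val_predict(LinearSVC(C=1.0), Z, y,
                              cv=n_folds, method="decision_function")
    threshold = max(np.unique(fused), key=lambda t: _f1(y, fused >= t))
    return base_clfs, fusion_clf, threshold

def _f1(y_true, y_pred):
    tp = np.sum((y_true == 1) & y_pred)
    fp = np.sum((y_true == 0) & y_pred)
    fn = np.sum((y_true == 1) & ~y_pred)
    return 2 * tp / max(2 * tp + fp + fn, 1)
```

At test time, each base classifier scores a clip's corresponding feature, the fusion classifier combines the resulting score vector, and the clip is flagged as containing the event when the fused score clears the estimated threshold.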

Original language: English
State: Published - 2011
Event: TREC Video Retrieval Evaluation, TRECVID 2011 - Gaithersburg, MD, United States
Duration: 5 Dec 2011 – 7 Dec 2011

Conference

Conference: TREC Video Retrieval Evaluation, TRECVID 2011
Country/Territory: United States
City: Gaithersburg, MD
Period: 5/12/11 – 7/12/11
