Robust Object Detection Under Adversarial Patch Attacks in Vision-Based Navigation

Research output: Contribution to journal › Article › peer-review

Abstract

In vision-guided autonomous robots, object detectors play a crucial role in perceiving the environment for path planning and decision-making. However, adaptive adversarial patch attacks undermine the resilience of detector-based systems, so strengthening object detectors against such attacks enhances the robustness of navigation systems. Existing defenses against patch attacks are primarily designed for stationary scenes and struggle against adaptive patch attacks that vary in scale, position, and orientation in dynamic environments. In this paper, we introduce Ad_YOLO+, an efficient and effective plugin that extends Ad_YOLO to defend against white-box patch-based image attacks. Built on YOLOv5x with an additional patch detection layer, Ad_YOLO+ is trained on a specially crafted adversarial dataset (COCO-VisDrone-2019). Unlike conventional methods that rely on redundant image preprocessing, our approach directly detects adversarial patches and the objects they overlay. Experiments on the adversarial training dataset demonstrate that Ad_YOLO+ improves both provable robustness and clean accuracy. Ad_YOLO+ achieves (Formula presented.) top-1 clean accuracy on the COCO dataset and (Formula presented.) top-1 provable robust accuracy against pixel square patches placed anywhere on the image for the COCO-VisDrone-2019 dataset. Moreover, under adaptive attacks in AirSim simulations, Ad_YOLO+ reduces the attack success rate, ensuring tracking resilience in both dynamic and static settings. Additionally, it generalizes well to other patch detection weight configurations.
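The core idea in the abstract — treating the adversarial patch as an extra detection class so that both the patch and the object it covers are reported, instead of preprocessing the image — can be sketched as a post-processing step. This is an illustrative sketch only, not the authors' implementation: the class name `PATCH_CLASS`, the detection dictionary layout, and the IoU threshold are all assumptions made for the example.

```python
# Illustrative sketch (not the Ad_YOLO+ source): post-processing for a detector
# that, like Ad_YOLO+, emits an extra "patch" class alongside regular objects.
# Object detections overlapping a detected patch are flagged rather than
# discarded, so the overlaid object still reaches the navigation stack.

PATCH_CLASS = "patch"  # hypothetical name for the added patch-detection class


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def flag_attacked(detections, iou_thresh=0.1):
    """Mark object detections that overlap a detected adversarial patch.

    `detections` is a list of dicts: {"box": (x1, y1, x2, y2), "cls": str,
    "score": float}. Returns the non-patch detections, each with an added
    "attacked" boolean.
    """
    patches = [d for d in detections if d["cls"] == PATCH_CLASS]
    objects = [d for d in detections if d["cls"] != PATCH_CLASS]
    for obj in objects:
        obj["attacked"] = any(
            iou(obj["box"], p["box"]) > iou_thresh for p in patches
        )
    return objects
```

A downstream tracker could then keep following an `attacked` object while discounting its appearance features, which is one way a detector-level defense can support the tracking resilience the abstract reports.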

Original language: English
Article number: 44
Journal: Automation
Volume: 6
Issue number: 3
DOIs
State: Published - Sep 2025

Keywords

  • adversarial robustness
  • object detection model
  • patch-enabled image attack
  • vision-based object tracking
