Counter-adversarial training data distribution validation

Research output: Contribution to journal › Article › peer-review

Abstract

Many machine learning algorithms lack a procedure for testing or validating the integrity of their training data, so attacks on training data remain a simple yet effective way to derail those algorithms, with potentially devastating consequences in security-sensitive systems. Designing an effective testing procedure is difficult because the distribution of the training data may change naturally over time; the question is therefore how to distinguish a natural change in the distribution from a change caused by an adversary's attack. This work offers a partial answer through a modification of the conditional generative adversarial network (CGAN) framework, in which the generative model aims not only to avoid detection by a discriminator but also to design a poisoning sample that yields the largest prediction error when a classifier is trained on data that includes the sample. Training the validating and generative models simultaneously produces a procedure that detects corrupted data even when it is nearly as damaging as the optimal poisoning sample. The modified CGAN framework is then specialized to support vector machines (SVMs) and to general classification models.
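The sketch below illustrates, in PyTorch, the kind of two-term generator objective the abstract describes: one term to evade the validating discriminator, one term to poison a downstream classifier. It is a minimal hypothetical reconstruction, not the paper's formulation; the network shapes, the weight LAMBDA_POISON, and the one-step poisoning proxy (maximizing the classifier's loss on the generated sample instead of retraining the classifier, which would be a bilevel problem) are all illustrative assumptions.

```python
# Hypothetical sketch of a modified-CGAN training loop with a poisoning term.
# Architectures, dimensions, and LAMBDA_POISON are illustrative assumptions.
import torch
import torch.nn as nn

noise_dim, feat_dim, n_classes = 16, 8, 2
LAMBDA_POISON = 1.0  # assumed weight on the poisoning term

# Generator: maps (noise, class label) to a candidate training sample.
G = nn.Sequential(nn.Linear(noise_dim + n_classes, 32), nn.ReLU(),
                  nn.Linear(32, feat_dim))
# Discriminator/validator: scores plausibility of (sample, label) pairs.
D = nn.Sequential(nn.Linear(feat_dim + n_classes, 32), nn.ReLU(),
                  nn.Linear(32, 1))
# Downstream classifier; kept fixed here for brevity, though in practice it
# would be (re)trained on the possibly poisoned data.
C = nn.Sequential(nn.Linear(feat_dim, n_classes))

bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)

def one_hot(y):
    return nn.functional.one_hot(y, n_classes).float()

for step in range(1000):
    # Clean minibatch (stand-in for the real training-data stream).
    x_real = torch.randn(64, feat_dim)
    y = torch.randint(0, n_classes, (64,))

    # Generator proposes poisoning samples conditioned on labels.
    z = torch.randn(64, noise_dim)
    x_fake = G(torch.cat([z, one_hot(y)], dim=1))

    # Validator update: separate real pairs from generated pairs.
    d_real = D(torch.cat([x_real, one_hot(y)], dim=1))
    d_fake = D(torch.cat([x_fake.detach(), one_hot(y)], dim=1))
    loss_D = (bce(d_real, torch.ones_like(d_real)) +
              bce(d_fake, torch.zeros_like(d_fake)))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator update: look like clean data to the validator AND act as a
    # poisoning sample. The second term is a one-step proxy: raising the
    # classifier's loss on (x_fake, y) stands in for retraining the
    # classifier on the contaminated data and measuring its error.
    d_fake = D(torch.cat([x_fake, one_hot(y)], dim=1))
    evade = bce(d_fake, torch.ones_like(d_fake))
    poison = -ce(C(x_fake), y)
    loss_G = evade + LAMBDA_POISON * poison
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```

After training, D plays the role of the validation procedure: a new batch whose samples score poorly under D would be flagged as potentially corrupted rather than as natural drift. The paper's SVM specialization would replace the generic classifier C and this proxy loss with an SVM-specific formulation.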

Original language: English
Journal: Optimization Letters
DOIs
State: Accepted/In press - 2025

Keywords

  • Adversarial ML
  • GAN
  • Machine learning

