TY - JOUR
T1 - Adversarial and counter-adversarial support vector machines
AU - Indyk, Ihor
AU - Zabarankin, Michael
N1 - Publisher Copyright:
© 2019
PY - 2019/9/3
Y1 - 2019/9/3
N2 - A support vector machine (SVM) is a simple yet powerful classification technique widely used in applications such as handwritten digit classification and face recognition. However, like any linear classification algorithm, it is vulnerable to adversarial attacks on test/training data. The majority of machine learning attacks aim to corrupt online data to affect automated classification in a number of business applications. This work formulates a multistage game between an SVM and an adversary. The SVM aims to maximize classification accuracy on a test dataset and includes a procedure for validating a training dataset, whereas the adversary aims to minimize the accuracy on the test dataset by perturbing the training dataset. At each stage, a new training data batch arrives, and the SVM's validation procedure yields a training set that includes only those batches whose distributions are sufficiently close to that of a validation dataset, which is not known to the adversary. The procedure uses a modified multivariate Cramér test. However, the SVM has access to the test dataset only at the very end, whereas the adversary knows it from the very beginning. The SVM's strategy is a sequence of thresholds used in the validation procedure, whereas the adversary's strategy is a sequence of upper bounds on the perturbation norm. The adversary's optimization problem at each stage is non-convex, and two ways of solving it approximately are suggested. The proposed game is applied to malware detection. Among the several SVM strategies considered, the one that increases the threshold when the SVM's performance on the validation set improves is the most efficient.
AB - A support vector machine (SVM) is a simple yet powerful classification technique widely used in applications such as handwritten digit classification and face recognition. However, like any linear classification algorithm, it is vulnerable to adversarial attacks on test/training data. The majority of machine learning attacks aim to corrupt online data to affect automated classification in a number of business applications. This work formulates a multistage game between an SVM and an adversary. The SVM aims to maximize classification accuracy on a test dataset and includes a procedure for validating a training dataset, whereas the adversary aims to minimize the accuracy on the test dataset by perturbing the training dataset. At each stage, a new training data batch arrives, and the SVM's validation procedure yields a training set that includes only those batches whose distributions are sufficiently close to that of a validation dataset, which is not known to the adversary. The procedure uses a modified multivariate Cramér test. However, the SVM has access to the test dataset only at the very end, whereas the adversary knows it from the very beginning. The SVM's strategy is a sequence of thresholds used in the validation procedure, whereas the adversary's strategy is a sequence of upper bounds on the perturbation norm. The adversary's optimization problem at each stage is non-convex, and two ways of solving it approximately are suggested. The proposed game is applied to malware detection. Among the several SVM strategies considered, the one that increases the threshold when the SVM's performance on the validation set improves is the most efficient.
KW - Adversary
KW - Counter-adversarial strategy
KW - Game theory
KW - Malware detection
KW - Optimization
KW - Support vector machine (SVM)
UR - http://www.scopus.com/inward/record.url?scp=85065609925&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065609925&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2019.04.035
DO - 10.1016/j.neucom.2019.04.035
M3 - Article
AN - SCOPUS:85065609925
SN - 0925-2312
VL - 356
SP - 1
EP - 8
JO - Neurocomputing
JF - Neurocomputing
ER -