TY - GEN
T1 - A Multiversion Programming Inspired Approach to Detecting Audio Adversarial Examples
AU - Zeng, Qiang
AU - Su, Jianhai
AU - Fu, Chenglong
AU - Kayas, Golam
AU - Luo, Lannan
AU - Du, Xiaojiang
AU - Tan, Chiu C.
AU - Wu, Jie
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/6
Y1 - 2019/6
AB - Adversarial examples (AEs) are crafted by adding human-imperceptible perturbations to inputs such that a machine-learning-based classifier incorrectly labels them. They have become a severe threat to the trustworthiness of machine learning. While AEs in the image domain have been well studied, audio AEs are less investigated. Recently, multiple techniques have been proposed to generate audio AEs, which makes countermeasures against them urgent. Our experiments show that, given an audio AE, the transcription results produced by different Automatic Speech Recognition (ASR) systems differ significantly (that is, the AE has poor transferability), as different ASR systems use different architectures, parameters, and training datasets. Based on this fact and inspired by Multiversion Programming, we propose a novel audio AE detection approach, MVP-Ears, which utilizes diverse off-the-shelf ASRs to determine whether an audio input is an AE. We build the largest audio AE dataset to our knowledge, and the evaluation shows that the detection accuracy reaches 99.88%. While transferable audio AEs are difficult to generate at this moment, they may become a reality in the future. We further adapt the idea above to proactively train the detection system to cope with transferable audio AEs. Thus, the proactive detection system is one giant step ahead of attackers working on transferable AEs.
KW - Adversarial Example
KW - Automatic Speech Recognition
KW - DNN
KW - transferability
UR - http://www.scopus.com/inward/record.url?scp=85072125401&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072125401&partnerID=8YFLogxK
U2 - 10.1109/DSN.2019.00019
DO - 10.1109/DSN.2019.00019
M3 - Conference contribution
AN - SCOPUS:85072125401
T3 - Proceedings - 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2019
SP - 39
EP - 51
BT - Proceedings - 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2019
T2 - 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2019
Y2 - 24 June 2019 through 27 June 2019
ER -