This list contains published white-box defenses against adversarial examples that have been open-sourced, along with open-sourced third-party analyses and security evaluations of those defenses.

Submit a new defense or analysis.
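
The Threat Model column specifies the perturbation set against which each defense claims robustness. As a rough guide (the exact convention can vary by paper), an entry of the form $$\ell_p (\epsilon)$$ means the adversary may replace an input $$x$$ with any perturbed input $$x'$$ satisfying

$$\|x' - x\|_p \le \epsilon$$

with pixel values typically scaled to $$[0, 1]$$ (so $$\epsilon = 8/255$$ allows each pixel to change by at most 8 grayscale levels); for $$\ell_0$$, the bound limits the number of modified components.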

| Defense | Venue | Dataset | Threat Model | Natural Accuracy | Claims | Analyses |
| --- | --- | --- | --- | --- | --- | --- |
| Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks (Papernot et al.) (code) | S&P 2016 | MNIST | $$\ell_0 (\epsilon = 112)$$ | 99.51% accuracy | 0.45% adversary success rate in changing the classifier's prediction | |
| Deflecting Adversarial Attacks with Pixel Deflection (Prakash et al.) (code) | CVPR 2018 | ImageNet | $$\ell_2 (\epsilon = 0.05)$$ | 98.9% accuracy (on images originally classified correctly by the underlying model) | 81% accuracy (on images originally classified correctly) | |
| Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser (Liao et al.) (code) | CVPR 2018 | ImageNet | $$\ell_\infty (\epsilon = 4/255)$$ | 75% accuracy | 75% accuracy | |
| Towards Deep Learning Models Resistant to Adversarial Attacks (Madry et al.) (code) | ICLR 2018 | CIFAR-10 | $$\ell_\infty (\epsilon = 8/255)$$ | 87% accuracy | 46% accuracy | |
| Provable defenses against adversarial examples via the convex outer adversarial polytope (Wong & Kolter) (code) | ICML 2018 | MNIST | $$\ell_\infty (\epsilon = 0.1)$$ | 98.2% accuracy | 94.2% accuracy | |
| Mitigating Adversarial Effects Through Randomization (Xie et al.) (code) | ICLR 2018 | ImageNet | $$\ell_\infty (\epsilon = 10/255)$$ | 99.2% accuracy (on images originally classified correctly by the underlying model) | 86% accuracy (on images originally classified correctly) | |
| Thermometer Encoding: One Hot Way To Resist Adversarial Examples (Buckman et al.) (code) | ICLR 2018 | CIFAR-10 | $$\ell_\infty (\epsilon = 8/255)$$ | 90% accuracy | 79% accuracy | |
| Countering Adversarial Images using Input Transformations (Guo et al.) (code) | ICLR 2018 | ImageNet | $$\ell_2 (\epsilon = 0.06)$$ | 75% accuracy | 70% accuracy (average normalized $$\ell_2$$ perturbation of 0.06) | |
| Stochastic Activation Pruning for Robust Adversarial Defense (Dhillon et al.) (code) | ICLR 2018 | CIFAR-10 | $$\ell_\infty (\epsilon = 4/255)$$ | 83% accuracy | 51% accuracy | |
| PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples (Song et al.) (code) | ICLR 2018 | CIFAR-10 | $$\ell_\infty (\epsilon = 8/255)$$ | 90% accuracy | 70% accuracy | |
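
The Claims column reports each paper's claimed accuracy on worst-case (adversarial) inputs within the stated threat model, while Natural Accuracy is accuracy on unperturbed test data. As a minimal sketch of how such a robustness claim is commonly measured (not the evaluation procedure of any particular paper above), the snippet below estimates $$\ell_\infty$$ robust accuracy with a PGD attack; `model` and `loader` are placeholders for a trained classifier and a test DataLoader with inputs scaled to $$[0, 1]$$.

```python
# Minimal sketch: estimating l_inf robust accuracy with a PGD attack.
# `model` and `loader` are placeholders (trained classifier, test DataLoader);
# inputs are assumed to be images in [0, 1]. This is a generic evaluation,
# not the attack used by any specific paper listed above.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=20):
    """Iterated gradient ascent on the loss, projected onto the l_inf ball of radius eps."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascend the loss
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # project onto the l_inf ball
            x_adv = x_adv.clamp(0, 1)                              # stay a valid image
    return x_adv.detach()

def robust_accuracy(model, loader, eps=8/255):
    """Fraction of test points still classified correctly after the attack."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps=eps)
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total
```

Note that several entries above report accuracy only on images the underlying model already classifies correctly, which uses a different (and more favorable) denominator than accuracy over the full test set.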