This list contains published white-box defenses to adversarial examples whose code has been open-sourced, along with open-sourced third-party analyses and security evaluations of those defenses.


| Defense | Venue | Dataset | Threat Model | Natural Accuracy | Claims | Analyses |
|---|---|---|---|---|---|---|
| Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks (Papernot et al.) (code) | S&P 2016 | MNIST | $$\ell_0 (\epsilon = 112)$$ | 99.51% accuracy | 0.45% adversary success rate in changing the classifier’s prediction | |
| Deflecting Adversarial Attacks with Pixel Deflection (Prakash et al.) (code) | CVPR 2018 | ImageNet | $$\ell_2 (\epsilon = 0.05)$$ | 98.9% accuracy (on images originally classified correctly by the underlying model) | 81% accuracy (on images originally classified correctly) | |
| Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser (Liao et al.) (code) | CVPR 2018 | ImageNet | $$\ell_\infty (\epsilon = 4/255)$$ | 75% accuracy | 75% accuracy | |
| Towards Deep Learning Models Resistant to Adversarial Attacks (Madry et al.) (code) | ICLR 2018 | CIFAR‑10 | $$\ell_\infty (\epsilon = 8/255)$$ | 87% accuracy | 46% accuracy | |
| Provable defenses against adversarial examples via the convex outer adversarial polytope (Wong & Kolter) (code) | ICML 2018 | MNIST | $$\ell_\infty (\epsilon = 0.1)$$ | 98.2% accuracy | 94.2% accuracy | |
| Mitigating Adversarial Effects Through Randomization (Xie et al.) (code) | ICLR 2018 | ImageNet | $$\ell_\infty (\epsilon = 10/255)$$ | 99.2% accuracy (on images originally classified correctly by the underlying model) | 86% accuracy (on images originally classified correctly) | |
| Thermometer Encoding: One Hot Way To Resist Adversarial Examples (Buckman et al.) (code) | ICLR 2018 | CIFAR‑10 | $$\ell_\infty (\epsilon = 8/255)$$ | 90% accuracy | 79% accuracy | |
| Countering Adversarial Images using Input Transformations (Guo et al.) (code) | ICLR 2018 | ImageNet | $$\ell_2 (\epsilon = 0.06)$$ | 75% accuracy | 70% accuracy with an average normalized perturbation of 0.06 | |
| Stochastic Activation Pruning for Robust Adversarial Defense (Dhillon et al.) (code) | ICLR 2018 | CIFAR‑10 | $$\ell_\infty (\epsilon = 4/255)$$ | 83% accuracy | 51% accuracy | |
| PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples (Song et al.) (code) | ICLR 2018 | CIFAR‑10 | $$\ell_\infty (\epsilon = 8/255)$$ | 90% accuracy | 70% accuracy | |
| Towards the first adversarially robust neural network model on MNIST (Schott et al.) (code) | NeurIPS SECML 2018 | MNIST | $$\ell_2 (\epsilon = 1.5)$$ | 99% accuracy | 80% accuracy | |
| Provable Robustness of ReLU networks via Maximization of Linear Regions (Croce et al.) (code) | AISTATS 2019 | MNIST | $$\ell_\infty (\epsilon = 0.1)$$ | 98.81% accuracy | 96.42% accuracy (empirical), 96.37% accuracy (certified) | |
| Provable Robustness of ReLU networks via Maximization of Linear Regions (Croce et al.) (code) | AISTATS 2019 | FMNIST | $$\ell_\infty (\epsilon = 0.1)$$ | 85.50% accuracy | 73.4% accuracy (empirical), 69.3% accuracy (certified), on the first 1000 test points | |
| Provable Robustness of ReLU networks via Maximization of Linear Regions (Croce et al.) (code) | AISTATS 2019 | GTS | $$\ell_2 (\epsilon = 0.2)$$ | 84.65% accuracy | 67.9% accuracy (empirical), 66.8% accuracy (certified), on the first 1000 test points | |
| Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models (Sinha et al.) (code) | IJCAI 2019 | CIFAR‑10 | $$\ell_\infty (\epsilon = 0.03)$$ | 87.8% accuracy | 53.82% accuracy | |
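Each threat model in the table is an $$\ell_p$$ ball of radius $$\epsilon$$ around the clean input. Most claims are robust accuracy: the fraction of attacked inputs still classified correctly. Some rows restrict this to images the underlying model originally classified correctly. The following is a minimal NumPy sketch of both ideas, not code from any listed defense. The helper names are hypothetical. The sketch assumes inputs are arrays with pixels scaled to [0, 1], which is the usual convention behind values such as 8/255, though each paper's exact preprocessing may differ.

```python
import numpy as np


def perturbation_norm(x, x_adv, p):
    """Return the l_p norm of the flattened perturbation x_adv - x.

    p = 0 counts changed coordinates (e.g. pixels); p = 2 or np.inf
    gives the Euclidean or max-absolute-value norm.
    """
    delta = (np.asarray(x_adv, dtype=np.float64)
             - np.asarray(x, dtype=np.float64)).ravel()
    if p == 0:
        return float(np.count_nonzero(delta))
    return float(np.linalg.norm(delta, ord=p))


def within_threat_model(x, x_adv, p, eps, tol=1e-9):
    """True if x_adv lies inside the l_p ball of radius eps around x.

    Assumes x and x_adv use the same pixel scale as eps (hypothetically
    [0, 1] for eps values like 8/255). tol absorbs float rounding.
    """
    return perturbation_norm(x, x_adv, p) <= eps + tol


def robust_accuracy(labels, adv_preds, clean_preds=None):
    """Fraction of adversarial inputs that are still classified correctly.

    If clean_preds (the undefended model's predictions on clean inputs) is
    given, only examples it classified correctly are counted, mirroring the
    "on images originally classified correctly" convention.
    """
    labels = np.asarray(labels)
    adv_preds = np.asarray(adv_preds)
    mask = np.ones(labels.shape, dtype=bool)
    if clean_preds is not None:
        mask = np.asarray(clean_preds) == labels
    if not mask.any():
        return float("nan")  # no eligible examples to score
    return float(np.mean(adv_preds[mask] == labels[mask]))


# Usage: perturb five pixels of a blank 28x28 (MNIST-sized) image by 0.1.
x = np.zeros((28, 28))
x_adv = x.copy()
x_adv[0, :5] = 0.1
print(perturbation_norm(x, x_adv, 0))               # 5.0 pixels changed
print(within_threat_model(x, x_adv, np.inf, 0.1))   # True
print(robust_accuracy([0, 1, 2, 3], [0, 1, 0, 0]))  # 0.5
```

The `tol` argument is a design choice to keep boundary cases from failing on rounding error. A real evaluation would also clip `x_adv` to the valid pixel range and run a strong adaptive attack. That attack is what produces `adv_preds` in the first place.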