The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples

Minds and Machines 32 (1):1-33 (2021)

Abstract

The same method that creates adversarial examples (AEs) to fool image classifiers can be used to generate counterfactual explanations (CEs) that explain algorithmic decisions. This observation has led researchers to regard CEs as AEs by another name. We argue that two properties formally distinguish CEs from AEs: their relationship to the true label and their tolerance with respect to proximity. Based on these arguments, we introduce CEs, AEs, and related concepts mathematically in a common framework. Furthermore, we show connections between current methods for generating CEs and AEs, and expect the two fields to merge increasingly as the number of common use cases grows.
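The opening claim, that one search procedure can produce both kinds of object, can be illustrated with a minimal sketch. It assumes a hypothetical two-feature logistic-regression model with hand-picked weights and runs gradient descent on a Wachter-style objective: a prediction-loss term pulling the model output toward a target class, plus a squared-Euclidean distance penalty (used here in place of the weighted Manhattan distance of Wachter et al.) that keeps the new input close to the original. The names `predict_proba` and `flip_search`, and the values of `lam`, `lr` and `steps`, are illustrative assumptions. This is not the paper's framework: the code only finds a nearby input with a different prediction, and whether that input should be read as an adversarial example or a counterfactual explanation depends on the properties the abstract names (relation to the true label, tolerated distance), which the code does not decide.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Probability of class 1 under a fixed logistic-regression model (hypothetical)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def flip_search(
    w: list[float],
    b: float,
    x: list[float],
    target: float,
    lam: float = 10.0,
    lr: float = 0.05,
    steps: int = 2000,
) -> list[float]:
    """Gradient descent on lam * (p(x') - target)^2 + ||x' - x||^2.

    The same trade-off (change the prediction, stay close to x) underlies
    many counterfactual-explanation and adversarial-attack formulations.
    """
    xp = list(x)  # start the search at the original input
    for _ in range(steps):
        p = predict_proba(w, b, xp)
        # Chain rule: dp/dx' = p * (1 - p) * w for a logistic model.
        g_pred = 2.0 * lam * (p - target) * p * (1.0 - p)
        # Gradient of the prediction term plus gradient of the distance term.
        grad = [g_pred * wi + 2.0 * (xpi - xi) for wi, xpi, xi in zip(w, xp, x)]
        xp = [xpi - lr * gi for xpi, gi in zip(xp, grad)]
    return xp


if __name__ == "__main__":
    w, b = [2.0, -1.0], -1.0  # hypothetical model parameters
    x = [0.0, 0.0]  # original input, predicted as class 0
    x_new = flip_search(w, b, x, target=1.0)
    print("before:", predict_proba(w, b, x))
    print("after: ", predict_proba(w, b, x_new), "distance:", math.dist(x, x_new))
```

Raising `lam` pushes the result further past the decision boundary at the cost of a larger change to the input; lowering it keeps the input closer but may fail to flip the prediction. Both research communities tune essentially this trade-off.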

Similar books and articles

Against Adversarial Discussion. Maarten Steenhagen - 2016 - Collingwood and British Idealism Studies 22 (1):87-112.
Explanation, invariance, and intervention. James Woodward - 1997 - Philosophy of Science 64 (4):41.
Are There Non-Causal Explanations (of Particular Events)? Bradford Skow - 2013 - British Journal for the Philosophy of Science (3):axs047.
Moral responsibility and omissions. Jeremy Byrd - 2007 - Philosophical Quarterly 57 (226):56–67.

Analytics

Added to PhilPapers
2021-10-30
