The explanation game: a formal framework for interpretable machine learning

Synthese 198 (10):1–32 (2020)


We propose a formal framework for interpretable machine learning. Combining elements from statistical learning, causal interventionism, and decision theory, we design an idealised explanation game in which players collaborate to find the best explanation for a given algorithmic prediction. Through an iterative procedure of questions and answers, the players establish a three-dimensional Pareto frontier that describes the optimal trade-offs between explanatory accuracy, simplicity, and relevance. Multiple rounds are played at different levels of abstraction, allowing the players to explore overlapping causal patterns of variable granularity and scope. We characterise the conditions under which such a game is almost surely guaranteed to converge on an optimal explanation surface in polynomial time, and highlight obstacles that will tend to prevent the players from advancing beyond certain explanatory thresholds. The game serves a descriptive and a normative function, establishing a conceptual space in which to analyse and compare existing proposals, as well as design new and improved solutions.


Author Profiles

Luciano Floridi
Oxford University
David Watson
University College London
