Synthese 199 (5-6):13689-13748 (2021)
Abstract
The debates between Bayesian, frequentist, and other methodologies of statistics have tended to focus on conceptual justifications, sociological arguments, or mathematical proofs of their long run properties. Both Bayesian statistics and frequentist (“classical”) statistics have strong cases on these grounds. In this article, we instead approach the debates in the “Statistics Wars” from a largely unexplored angle: simulations of different methodologies’ performance in the short to medium run.
We conducted a large number of simulations using a straightforward decision problem based on tossing a coin with unknown bias and then placing bets. In this simulation, we programmed four players, inspired by Bayesian statistics, frequentist statistics, Jon Williamson’s version of Objective Bayesianism, and a player who simply extrapolates from observed frequencies to general frequencies. The last player functions as a benchmark: a statistical methodology should at least outperform a crude form of induction. We focused on how well these methodologies guided the players towards good decisions. Unlike an earlier simulation study of this type, we found no systematic difference in performance between the Bayesian and frequentist players, provided the Bayesian used a flat prior and the frequentist used a low confidence level. Unlike that study, we were able to use Big Data methods to mitigate problems of random error in the simulation results. The Williamsonian player, a novel element of our study, likewise showed no systematic difference in performance, provided they used a low confidence level. These players performed similarly even in the very short run, when they were making different decisions. Our study indicates that all three methodologies should be taken seriously by philosophers and practitioners of statistics. However, the frequentist and Williamsonian players performed poorly when their confidence levels were high, and the Bayesian was surprisingly harmed by biased priors, providing some unexpected lessons for these methodologies when facing this type of decision problem.
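To make the setup concrete, the following is a minimal sketch of the kind of estimation step each player might perform in the coin-tossing decision problem. This is an illustrative reconstruction, not the authors' actual code: the function names, the normal-approximation confidence interval, and the particular confidence level are all assumptions made for the example.

```python
# Hedged sketch of the three estimation strategies described in the abstract,
# applied to a coin with unknown bias. All details here are illustrative.
import math
import random

def bayesian_estimate(heads, n):
    # Flat Beta(1, 1) prior; return the posterior mean of the bias.
    return (heads + 1) / (n + 2)

def frequentist_interval(heads, n, z=1.0):
    # Normal-approximation confidence interval. A small z (z = 1.0 is
    # roughly a 68% level) echoes the paper's finding that low confidence
    # levels performed well in this decision problem.
    p = heads / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def naive_estimate(heads, n):
    # The benchmark player: extrapolate the observed frequency directly.
    return heads / n

if __name__ == "__main__":
    random.seed(0)
    true_bias = 0.7  # unknown to the players
    flips = [random.random() < true_bias for _ in range(50)]
    heads, n = sum(flips), len(flips)

    print("Bayesian posterior mean:", round(bayesian_estimate(heads, n), 3))
    print("Frequentist interval:", frequentist_interval(heads, n))
    print("Naive frequency:", naive_estimate(heads, n))
```

Each player would then translate its estimate (or interval) into a bet, for instance betting on heads only when its estimated bias exceeds the odds on offer; the paper's simulations compare how well such rules perform in the short to medium run.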