Inflated effect sizes and underpowered tests: how the severity measure of evidence is affected by the winner’s curse

Philosophical Studies 178 (1):133-145 (2021)

Abstract

My aim in this paper is to show how the problem of inflated effect sizes corrupts the severity measure of evidence. This has never been done. In fact, the Winner’s Curse is barely mentioned in the philosophical literature. Since the severity score is the predominant measure of evidence for frequentist tests in the philosophical literature, it is important to underscore its flaws. It is also crucial to bring the philosophical literature up to speed with the limits of classical testing, of which the Winner’s Curse is one. The problem is that when a significant result is obtained by using an underpowered test, the severity score becomes particularly high for large discrepancies from the null hypothesis. This means that such discrepancies are very well supported by the evidence according to that measure. However, it is now well documented that significant tests with low power display inflated effect sizes: they systematically show departures from the null hypothesis H0 that are much greater than the true departures. From an epistemological point of view, this means that a significant result produced by an underpowered test does not provide evidence for large discrepancies from H0. Therefore, the severity score is an inadequate measure of evidence. Given that we are now aware of the phenomenon of inflated effect sizes, it would be irresponsible to rely on the severity score to measure the strength of the evidence against the null. Instead, one must take appropriate measures to avoid using underpowered tests, either by setting a threshold for the sample size or by replicating the results of the experiment.
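The mechanism described in the abstract can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not the paper's own analysis: it assumes a one-sided z-test of H0: mu = 0 against mu > 0 with known sigma, and it takes the severity of the claim "mu > mu1" after a rejection to be P(sample mean <= observed mean; mu = mu1), the standard form for this test. The true effect (0.1), the sample size (10), the discrepancy of interest (0.3), and the function names are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cutoffs and severity


def severity(xbar_obs, mu1, sigma, n):
    """Severity of the claim 'mu > mu1' after rejecting H0 (assumed form):
    P(sample mean <= xbar_obs) computed under mu = mu1."""
    se = sigma / math.sqrt(n)
    return STD_NORMAL.cdf((xbar_obs - mu1) / se)


def simulate(true_mu=0.1, sigma=1.0, n=10, alpha=0.05, reps=100_000, seed=0):
    """Draw sample means from an underpowered design and keep the significant ones.

    Returns the estimated power and the average sample mean among
    significant results (the effect size a reader of such results would see).
    """
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    cutoff = STD_NORMAL.inv_cdf(1 - alpha) * se  # smallest significant sample mean
    draws = (rng.gauss(true_mu, se) for _ in range(reps))
    significant = [x for x in draws if x > cutoff]
    power = len(significant) / reps
    mean_significant = sum(significant) / len(significant)
    return power, mean_significant


power, mean_sig = simulate()
# Severity for a discrepancy three times the (hypothetical) true effect,
# evaluated at the typical significant result.
sev = severity(mean_sig, mu1=0.3, sigma=1.0, n=10)

print(f"estimated power:                  {power:.3f}")
print(f"true effect:                      0.100")
print(f"mean effect among significant:    {mean_sig:.3f}")
print(f"severity of 'mu > 0.3':           {sev:.3f}")
```

Under these assumptions the test has low power, the significant results overstate the true effect several times over, and the severity score assigned to a discrepancy well above the true effect is nonetheless high. Raising n in `simulate` shrinks both the inflation and the misleading severity, which is in line with the abstract's recommendation to avoid underpowered tests.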


