Significance Tests: Their Logic and Early History

Dissertation, Stanford University (1981)

Abstract

Significance tests are the mainstay of much experimental analysis. They formalize reasoning of the following sort: assuming hypothesis h, evidence e is improbable; if e is observed, reject h. Most philosophical work on induction concerns either a simpler form of reasoning, such as Reichenbach's straight rule, or a more powerful one, such as Neyman-Pearson confidence intervals, Bayesian posterior densities, or Fisher's fiducial probabilities. By focusing on this simple, common method of inductive inference, many of the subtleties of the problem of scientific induction become apparent.

Three features of the logic of significance tests are isolated. The test statistic must single out the correct aspect of the evidence for inference. The stringency measure must correctly formalize the improbability of the evidence. Composite hypotheses pose additional problems, since they do not stipulate exact probabilities for all outcomes.

Until Karl Pearson's 1895 paper on skew frequency curves, statisticians chiefly used the Normal curve. With many different frequency curves available, the question of goodness of fit became crucial. Pearson proposed the Chi-Squared statistic as a measure of fit. He justified it by its relation to correlation, a category which replaced causation within Pearson's positivist philosophy. In fact, Chi-Squared works well only when correlation models adequately describe the phenomena of concern.

Levels of significance measure test stringency as the probability that the test statistic falls in the tails of its density. This practice stems from the theory of errors of observation. Probable error, defined in terms of tail areas under the Normal density, measures the precision of a series of observations. Early significance tests using the Normal density measured stringency in multiples of the probable error. Nowadays many densities are used in significance testing, but stringency is still measured by tail areas: rejection occurs if and only if the test statistic takes a value in the tails. No completely satisfactory analysis of this practice now exists.

In 1904 Pearson extended Chi-Squared to test the composite hypothesis of statistical independence. This extension yielded inferences that conflicted with those based on other tests of independence. In 1922, by means of his new concept of degrees of freedom, R. A. Fisher proposed a solution. Degrees of freedom measure the informativeness of an hypothesis; since their introduction, significance tests have tested not only the putative truth of an hypothesis but also its informativeness. Fisher's solution violates a rule of implication: if h implies i, then evidence sufficient to reject i is sufficient to reject h. This rule is widely endorsed by philosophers; indeed, Hempel calls it a condition of adequacy for any theory of confirmation. But if h implies i, h is more informative than i. Consequently, once tests are sensitive to informativeness, evidence sufficient to reject i need not be sufficient to reject the more informative h. This examination of significance testing reveals aspects of induction missed by analyses from first principles.
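The tail-area notion of stringency discussed in the abstract can be sketched numerically. The following minimal illustration (not from the dissertation itself) shows the early practice of grading stringency in multiples of the probable error under the standard Normal density; the constant 0.6745 and the two-tailed convention are standard textbook values, assumed here purely for illustration.

```python
import math

def two_tailed_p(z):
    """Tail-area stringency: probability, under the standard Normal
    density, of a deviation at least as large as |z| in either tail."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# The probable error is the deviation exceeded (in either direction)
# exactly half the time: roughly 0.6745 standard deviations.
PROBABLE_ERROR = 0.674489750196082

# Early practice: report how many probable errors an observation lies
# from its expected value; larger multiples mean smaller tail areas.
for k in (1, 2, 3, 4):
    print(f"{k} probable error(s): tail area = {two_tailed_p(k * PROBABLE_ERROR):.4f}")
```

By construction, the tail area at one probable error is exactly 0.5; modern practice replaces probable-error multiples with fixed tail-area thresholds such as 0.05.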

Links

PhilArchive





Similar books and articles

Tests of Significance Violate the Rule of Implication. Davis Baird - 1984 - PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1984:81-92.
Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction. Deborah G. Mayo & Aris Spanos - 2006 - British Journal for the Philosophy of Science 57 (2):323-357.
Significance Testing, P-Values and the Principle of Total Evidence. Bengt Autzen - 2016 - European Journal for Philosophy of Science 6 (2):281-295.
Novel Evidence and Severe Tests. Deborah G. Mayo - 1991 - Philosophy of Science 58 (4):523-552.
How Strong is the Confirmation of a Hypothesis by Significant Data? Thomas Bartelborth - 2016 - Journal for General Philosophy of Science / Zeitschrift für Allgemeine Wissenschaftstheorie 47 (2):277-291.

Analytics

Added to PP
2015-02-06



Author's Profile

Davis Baird
Clark University

Citations of this work

No citations found.


References found in this work

No references found.
