Clark Glymour, Richard Scheines, Peter Spirtes and Kevin Kelly. Discovering Causal Structure: Artificial Intelligence, Philosophy of Science and Statistical Modeling.
Many philosophers have worried about what philosophy is. Often they have looked for answers by considering what it is that philosophers do. Given the diversity of topics and methods found in philosophy, however, we propose a different approach. In this article we consider the philosophical temperament, asking an alternative question: what are philosophers like? Our answer is that one important aspect of the philosophical temperament is that philosophers are especially reflective: they are less likely than their peers to embrace what seems obvious without questioning it. This claim is supported by a study of more than 4,000 philosophers and non-philosophers, the results of which indicate that even when we control for overall education level, philosophers tend to be significantly more reflective than their peers. We then illustrate this tendency by considering what we know about the philosophizing of a few prominent philosophers. Recognizing this aspect of the philosophical temperament, it is natural to wonder how philosophers came to be this way: does philosophical training teach reflectivity or do more reflective people tend to gravitate to philosophy? We consider the limitations of our data with respect to this question and suggest that a longitudinal study be conducted.
We argue that current discussions of criteria for actual causation are ill-posed in several respects. (1) The methodology of current discussions is by induction from intuitions about an infinitesimal fraction of the possible examples and counterexamples; (2) cases with larger numbers of causes generate novel puzzles; (3) "neuron" and causal Bayes net diagrams are, as deployed in discussions of actual causation, almost always ambiguous; (4) actual causation is (intuitively) relative to an initial system state since state changes are relevant, but most current accounts ignore state changes through time; (5) more generally, there is no reason to think that philosophical judgements about these sorts of cases are normative; but (6) there is a dearth of relevant psychological research that bears on whether various philosophical accounts are descriptive. Our skepticism is not directed towards the possibility of a correct account of actual causation; rather, we argue that standard methods will not lead to such an account. A different approach is required.
Coherentism maintains that coherent beliefs are more likely to be true than incoherent beliefs, and that coherent evidence provides more confirmation of a hypothesis when the evidence is made coherent by the explanation provided by that hypothesis. Although probabilistic models of credence ought to be well-suited to justifying such claims, negative results from Bayesian epistemology have suggested otherwise. In this essay we argue that the connection between coherence and confirmation should be understood as a relation mediated by the causal relationships among the evidence and a hypothesis, and we offer a framework for doing so by fitting together probabilistic models of coherence, confirmation, and causation. We show that the causal structure among the evidence and hypothesis is sometimes enough to determine whether the coherence of the evidence boosts confirmation of the hypothesis, makes no difference to it, or even reduces it. We also show that, ceteris paribus, it is not the coherence of the evidence that boosts confirmation, but rather the ratio of the coherence of the evidence to the coherence of the evidence conditional on a hypothesis.
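A compact way to see how such a ratio can mediate confirmation, for two evidence propositions and a deviation-from-independence measure of coherence (a sketch of the underlying algebra in our own notation, not a formula quoted from the paper): writing C(E_1, E_2) = P(E_1 \wedge E_2)/[P(E_1)P(E_2)] and C_H for the same quantity computed under P(\cdot \mid H),

\[
\frac{P(H \mid E_1, E_2)}{P(H)} \;=\; \frac{C_H(E_1,E_2)}{C(E_1,E_2)}\cdot\frac{P(H\mid E_1)}{P(H)}\cdot\frac{P(H\mid E_2)}{P(H)} .
\]

Whether the combined evidence confirms H more than its pieces do individually therefore turns entirely on how conditioning on H changes the coherence of the evidence, and in a causal Bayes net that change is fixed by the causal structure connecting H, E_1 and E_2.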
The literature on causal discovery has focused on interventions that involve randomly assigning values to a single variable. But such a randomized intervention is not the only possibility, nor is it always optimal. In some cases it is impossible or it would be unethical to perform such an intervention. We provide an account of 'hard' and 'soft' interventions and discuss what they can contribute to causal discovery. We also describe how the choice of the optimal intervention(s) depends heavily on the particular experimental setup and the assumptions that can be made.
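The distinction can be made concrete with a minimal simulation, assuming a toy linear structural model X → Y whose names and coefficients are our own (nothing below is taken from the paper): a 'hard' intervention replaces Y's structural equation outright and so severs its dependence on X, while a 'soft' intervention adds an exogenous input to Y but leaves the X → Y mechanism intact.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def simulate(intervention=None):
    """Toy linear SEM: X := U_x;  Y := 2*X + U_y (illustrative coefficients)."""
    x = rng.normal(size=n)
    u_y = rng.normal(size=n)
    if intervention == "hard":
        # Hard intervention: Y is set by the experimenter alone;
        # its dependence on X is severed.
        y = rng.normal(size=n)
    elif intervention == "soft":
        # Soft intervention: an independent exogenous input is added,
        # but Y still listens to X.
        y = 2 * x + u_y + rng.normal(size=n)
    else:
        y = 2 * x + u_y
    return x, y

for kind in (None, "hard", "soft"):
    x, y = simulate(kind)
    print(kind, round(float(np.corrcoef(x, y)[0, 1]), 3))
# Passive observation and the soft intervention leave X and Y correlated;
# the hard intervention drives the correlation to ~0, which is what makes
# it informative about the direction X -> Y.
```

The severed mechanism is what makes the hard intervention informative about direction; the soft intervention preserves the mechanism, which is one reason it can be the better choice when hard randomization is infeasible or unethical.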
Over the last two decades, a fundamental outline of a theory of causal inference has emerged. However, this theory does not consider the following problem. Sometimes two or more measured variables are deterministic functions of one another, not deliberately, but because of redundant measurements. In these cases, manipulation of an observed defined variable may actually be an ambiguous description of a manipulation of some underlying variables, although the manipulator does not know that this is the case. In this article we revisit the question of precisely characterizing conditions and assumptions under which reliable inference about the effects of manipulations is possible, even when the possibility of "ambiguous manipulations" is allowed.
For nearly as long as the word "correlation" has been part of statistical parlance, students have been warned that correlation does not prove causation, and that only experimental studies, e.g., randomized clinical trials, can establish the existence of a causal relationship. Over the last few decades, something of a consensus has emerged among statisticians, computer scientists, and philosophers on how to represent causal claims and connect them to probabilistic relations. One strand of this work studies the conditions under which evidence accumulated from non-experimental (observational) studies can be used to infer a causal relationship. In this paper, I compare the typical conditions required to infer that one variable is a direct cause of another in observational and experimental studies. I argue that they are essentially the same.
There is a long tradition of representing causal relationships by directed acyclic graphs (Wright, 1934). Spirtes (1994), Spirtes et al. (1993) and Pearl & Verma (1991) describe procedures for inferring the presence or absence of causal arrows in the graph, even when there may be unobserved confounding variables and/or an unknown time order; under weak conditions, for certain combinations of directed acyclic graphs and probability distributions, these procedures are asymptotically consistent in sample size. These results are surprising since they seem to contradict the standard statistical wisdom that consistent estimators of causal effects do not exist for nonrandomised studies if there are potentially unobserved confounding variables. We resolve the apparent incompatibility of these views by closely examining the asymptotic properties of these causal inference procedures. We show that the asymptotically consistent procedures are 'pointwise consistent', but 'uniformly consistent' tests do not exist. Thus, no finite sample size can ever be guaranteed to approximate the asymptotic results. We also show the nonexistence of valid, consistent confidence intervals for causal effects and the nonexistence of uniformly consistent point estimators. Our results make no assumption about the form of the tests or estimators. In particular, the tests could be classical independence tests, they could be Bayes tests, or they could be tests based on scoring methods. The implications of our results for observational studies are controversial and are discussed briefly in the last section of the paper. The results hinge on the following fact: it is possible to find, for each sample size n, distributions P and Q such that P and Q are empirically indistinguishable and yet P and Q correspond to different causal effects.
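The distinction the abstract turns on is standard and worth stating (a textbook-style gloss in our notation, not the paper's). Let \(\psi_n\) be a decision procedure based on n samples, and let P range over all distributions compatible with the assumptions:

\[
\text{pointwise consistency:}\quad \forall P:\; \lim_{n\to\infty}\Pr_P(\psi_n\ \text{errs}) = 0,
\qquad
\text{uniform consistency:}\quad \lim_{n\to\infty}\,\sup_{P}\,\Pr_P(\psi_n\ \text{errs}) = 0 .
\]

Pointwise consistency allows the sample size needed to reach a given error rate to depend on the unknown distribution; uniform consistency demands a single sample size that works across all of them at once. The paper's negative result is that only the former holds for these causal inference procedures, which is why no finite sample size comes with a guarantee.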
Recent research finds that people respond more generously to individual victims described in detail than to equivalent statistical victims described in general terms. We propose that this "identified victim effect" is one manifestation of a more general phenomenon: a positive influence of tangible information on generosity. In three experiments, we find evidence for an "identified intervention effect": providing tangible details about a charity's interventions significantly increases donations to that charity. Although previous work described sympathy as the primary mediator between tangible information and giving, the current mediational analyses show that the influence of tangible details can operate through donors' perception that their contribution will have impact. Taken together with past work, the results suggest that tangible information of many types promotes generosity and can do so either via sympathy or via perceived impact. The ability of tangible information to increase impact points to new ways for charities to encourage generosity.
Many philosophers of science have argued that a set of evidence that is "coherent" confirms a hypothesis which explains such coherence. In this paper, we examine the relationships between probabilistic models of all three of these concepts: coherence, confirmation, and explanation. For coherence, we consider Shogenji's measure of association (deviation from independence). For confirmation, we consider several measures in the literature, and for explanation, we turn to Causal Bayes Nets and rely on causal structure and the constraints it places on probability. All else equal, we show that focused correlation, which is the ratio of the coherence of the evidence to the coherence of the evidence conditional on a hypothesis, tracks confirmation. We then show that the causal structure of the evidence and hypothesis can put strong constraints on how coherence in the evidence does or does not translate into confirmation of the hypothesis.
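One concrete instance of how causal structure can settle the question (an illustrative special case in our own notation, not an example reproduced from the paper): write C(E_1, E_2) = P(E_1 \wedge E_2)/[P(E_1)P(E_2)] for Shogenji's deviation-from-independence measure. If the causal structure makes H a common cause of E_1 and E_2 that screens them off from one another (E_1 \perp E_2 \mid H), then Bayes' theorem gives

\[
\frac{P(H \mid E_1, E_2)}{P(H)} \;=\; \frac{1}{C(E_1,E_2)}\cdot\frac{P(H \mid E_1)}{P(H)}\cdot\frac{P(H \mid E_2)}{P(H)},
\]

so, holding the two individual confirmations fixed, more coherent evidence yields a smaller boost from the combined evidence. Under other structures the coherence of the evidence conditional on H need not equal 1 and the comparison can come out differently; that is the sense in which the structure, rather than coherence by itself, does the work.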
In Causation, Prediction, and Search (CPS hereafter), Peter Spirtes, Clark Glymour and I developed a theory of statistical causal inference. In his presentation at the Notre Dame conference (and in his paper, this volume), Glymour discussed the assumptions on which this theory is built, traced some of the mathematical consequences of the assumptions, and pointed to situations in which the assumptions might fail. Nevertheless, many at Notre Dame found the theory difficult to understand and/or assess. As a result I was asked to write this paper to provide a more intuitive introduction to the theory. In what follows I shun almost all formality and avoid the numerous and complicated qualifiers that typically accompany definitions of important philosophical concepts. They can all be found in Glymour's paper or in CPS, which are clear although sometimes dense. Here I attempt to fix intuitions by highlighting a few of the essential ideas and by providing extremely simple examples throughout.
The Carnegie Mellon Proof Tutor project was motivated by pedagogical concerns: we wanted to use a "mechanical" (i.e., computerized) tutor for teaching students...
The Gibbs sampler can be used to obtain samples of arbitrary size from the posterior distribution over the parameters of a structural equation model (SEM) given covariance data and a prior distribution over the parameters. Point estimates, standard deviations and interval estimates for the parameters can be computed from these samples. If the prior distribution over the parameters is uninformative, the posterior is proportional to the likelihood, and asymptotically the inferences based on the Gibbs sample are the same as those based on the maximum likelihood solution, e.g., output from LISREL or EQS. In small samples, however, the likelihood surface is not Gaussian and in some cases contains local maxima. Nevertheless, the Gibbs sample comes from the correct posterior distribution over the parameters regardless of the sample size and the shape of the likelihood surface. With an informative prior distribution over the parameters, the posterior can be used to make inferences about the parameters of underidentified models, as we illustrate on a simple errors-in-variables model.
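A stripped-down sketch of this kind of sampler, for a toy errors-in-variables model ξ ~ N(0, τ²), x = ξ + δ, y = βξ + ε, with τ² and the measurement-error variance held fixed so the full conditionals stay short (the symbols, priors and simplifications are ours, not the paper's; the paper's point about informative priors and underidentified models concerns the fuller version in which those variances are also unknown):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate toy data: x is an error-laden measurement of the latent xi, y depends on xi.
n, beta_true = 200, 1.5
tau2, delta2 = 1.0, 0.25          # Var(xi) and measurement-error variance, held fixed here
xi = rng.normal(0.0, np.sqrt(tau2), n)
x = xi + rng.normal(0.0, np.sqrt(delta2), n)
y = beta_true * xi + rng.normal(0.0, 0.5, n)

# Priors: beta ~ N(0, v0),  sigma_eps^2 ~ Inverse-Gamma(a0, b0)
v0, a0, b0 = 10.0, 2.0, 1.0

beta, sig2 = 0.0, 1.0
draws = []
for t in range(5000):
    # 1) latent scores xi_i | rest: Normal, by conjugacy
    prec = 1.0 / tau2 + 1.0 / delta2 + beta**2 / sig2
    mean = (x / delta2 + beta * y / sig2) / prec
    xi_draw = rng.normal(mean, np.sqrt(1.0 / prec))

    # 2) structural coefficient beta | rest: Normal
    prec_b = 1.0 / v0 + np.sum(xi_draw**2) / sig2
    mean_b = (np.sum(xi_draw * y) / sig2) / prec_b
    beta = rng.normal(mean_b, np.sqrt(1.0 / prec_b))

    # 3) error variance sigma_eps^2 | rest: Inverse-Gamma
    resid = y - beta * xi_draw
    sig2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / (b0 + 0.5 * np.sum(resid**2)))

    if t >= 1000:                  # discard burn-in
        draws.append(beta)

print("posterior mean of beta:", round(float(np.mean(draws)), 3))
```

Summaries such as posterior means, standard deviations and interval estimates are then computed directly from the retained draws, as the abstract describes.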
Data analysis that merely fits an empirical covariance matrix or that finds the best least squares linear estimator of a variable is not of itself a reliable guide to judgements about policy, which inevitably involve causal conclusions. The policy implications of empirical data can be completely reversed by alternative hypotheses about the causal relations of variables, and the estimates of a particular causal influence can be radically altered by changes in the assumptions made about other dependencies. For these reasons, one of the common aims of empirical research in the...
By combining experimental interventions with search procedures for graphical causal models, we show that under familiar assumptions, with perfect data, N - 1 experiments suffice to determine the causal relations among N > 2 variables when each experiment randomizes at most one variable. We show the same bound holds for adaptive learners, but does not hold for N > 4 when each experiment can simultaneously randomize more than one variable. This bound provides a type of ideal for the measure of success of heuristic approaches in active learning methods of causal discovery, which currently use less informative measures.
We show that if any number of variables are allowed to be simultaneously and independently randomized in any one experiment, log2(N) + 1 experiments are sufficient and in the worst case necessary to determine the causal relations among N ≥ 2 variables when no latent variables, no sample selection bias and no feedback cycles are present. For all K, 0 < K <...
The statistical community has brought logical rigor and mathematical precision to the problem of using data to make inferences about a model's parameter values. The TETRAD project, and related work in computer science and statistics, aims to apply those standards to the problem of using data and background knowledge to make inferences about a model's specification. We begin by drawing the analogy between parameter estimation and model specification search. We then describe how the specification of a structural equation model entails familiar constraints on the covariance matrix for all admissible values of its parameters; we survey results on the equivalence of structural equation models, and we discuss search strategies for model specification. We end by presenting several algorithms that are implemented in the TETRAD II program.
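An example of the kind of constraint meant here (a standard illustration, not an excerpt from the paper): a one-factor measurement model x_i = λ_i ξ + ε_i with mutually uncorrelated errors entails, for any four indicators, the vanishing tetrad differences

\[
\sigma_{12}\sigma_{34} - \sigma_{13}\sigma_{24} = 0,
\qquad
\sigma_{12}\sigma_{34} - \sigma_{14}\sigma_{23} = 0 ,
\]

for all admissible values of the loadings and error variances, since each covariance factors as \(\sigma_{ij} = \lambda_i \lambda_j \operatorname{Var}(\xi)\) for i ≠ j. Constraints of this sort, together with vanishing partial correlations, are what a specification search can test directly against the sample covariance matrix, and they are the constraints from which the TETRAD programs take their name.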
We present an algorithm to infer causal relations between a set of measured variables on the basis of experiments on these variables. The algorithm assumes that the causal relations are linear, but is otherwise completely general: It provides consistent estimates when the true causal structure contains feedback loops and latent variables, while the experiments can involve surgical or 'soft' interventions on one or multiple variables at a time. The algorithm is 'online' in the sense that it combines the results from any set of available experiments, can incorporate background knowledge and resolves conflicts that arise from combining results from different experiments. In addition we provide a necessary and sufficient condition that determines when the algorithm can uniquely return the true graph, and can be used to select the next best experiment until this condition is satisfied. We demonstrate the method by applying it to simulated data and the flow cytometry data of Sachs et al.
Linear structural equation models (SEMs) are widely used in sociology, econometrics, biology, and other sciences. A SEM (without free parameters) has two parts: a probability distribution (in the Normal case specified by a set of linear structural equations and a covariance matrix among the "error" or "disturbance" terms), and an associated path diagram corresponding to the functional composition of variables specified by the structural equations and the correlations among the error terms. It is often thought that the path diagram is nothing more than a heuristic device for illustrating the assumptions of the model. However, in this paper, we will show how path diagrams can be used to solve a number of important problems in structural equation modelling.
In Causation, Prediction, and Search, we undertook a three-part project. First, we characterized when causal models are indistinguishable by population conditional independence relations under several different assumptions relating causality to probability. Second, we proposed a number of algorithms that take sample data and optional background knowledge as input, and output a class of causal models compatible with the data and the background knowledge; the algorithms were accompanied by proofs of their correctness given assumptions that were clearly stated in CPS, and that we will restate below. Finally, we offered a theory of how to predict the effects of interventions in causal structures, given only partial knowledge of causal structure. Freedman's objections are all directed against the causal inference algorithms we proposed. We do not have room here to discuss all of his criticisms, but we have answered his major points. With regard to the points we do not have room to discuss, the reader should be warned that Freedman is an unreliable interpreter of what we have written. For convenience, we have divided Freedman's objections into the following categories. 1) Freedman questions some of the assumptions on which our correctness theorems are based. Some of his criticisms are based on covariance matrices that he constructed. None of the examples he constructed in sections 11.2, 11.3, or 12.3 are counterexamples to any theorem that we stated, nor are they even germane to the question of how probable the assumptions we make are. His examples only illustrate points discussed in detail in our book, in which we give similar examples. 2) The most serious charge that Freedman makes is that the algorithms do not compute what we say they do.
We present an algorithm to infer causal relations between a set of measured variables on the basis of experiments on these variables. The algorithm assumes that the causal relations are linear, but is otherwise completely general: It provides consistent estimates when the true causal structure contains feedback loops and latent variables, while the experiments can involve surgical or 'soft' interventions on one or multiple variables at a time. The algorithm is 'online' in the sense that it combines the results from any set of available experiments, can incorporate background knowledge and resolves conflicts that arise from combining results from different experiments. In addition we provide a necessary and sufficient condition that (i) determines when the algorithm can uniquely return the true graph, and (ii) can be used to select the next best experiment until this condition is satisfied. We demonstrate the method by applying it to simulated data and the flow cytometry data of Sachs et al. (2005).
Peter Spirtes, Clark Glymour, Richard Scheines, Christopher Meek, S. Fienberg, E. Slate. Prediction and Experimental Design with Graphical Causal Models.
…nature of modern data collection and storage techniques, and the increases in the speed and storage capacities of computers. Statistics books from 30 years ago often presented examples with fewer than 10 variables, in domains where some background knowledge was plausible. In contrast, in new domains, such as climate research where satellite data now provide daily quantities of data unthinkable a few decades ago, fMRI brain imaging, and microarray measurements of gene expression, the number of variables can range into the tens of thousands, and there is often limited background knowledge to reduce the space of alternative causal hypotheses. In such domains, non-automated causal discovery techniques appear to be hopeless, while the availability of faster computers with larger memories and disc space allows for the practical implementation of computationally intensive automated search algorithms over large search spaces. Contemporary science is not your grandfather's science, or Karl Popper's. Causal inference without experimental controls has long seemed as if it must somehow be capable of being cast as a kind of statistical inference involving estimators with some kind of convergence and accuracy properties under some kind of assumptions. Until recently, the statistical literature said not. While parameter estimation and experimental design for the effective use of data developed throughout the 20th century, as recently as 20 years ago the methodology of causal inference without experimental controls remained relatively primitive. Besides a cessation of hostilities from the majority of the statistical and philosophical communities (which has still only partially happened), several things were needed for theories of causal estimation to appear and to flower: well defined mathematical objects to represent causal relations; well defined connections between aspects of these objects and sample data; and a way to compute those connections. A sequence of studies beginning with Dempster's work on the factorization of probability distributions [Dempster 1972] and culminating with Kiiveri and Speed's [Kiiveri & Speed 1982] study of linear structural equation models provided the first, in the form of directed acyclic graphs, and the second, in the form of the "local" Markov condition...
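For reference, the "local" Markov condition mentioned here is the standard one (stated in our own words): for a directed acyclic graph G over a variable set V and a joint distribution P,

\[
\text{for every } X \in V:\qquad X \;\perp\!\!\!\perp\; \bigl(\mathrm{NonDescendants}(X)\setminus\mathrm{Parents}(X)\bigr) \;\bigm|\; \mathrm{Parents}(X),
\]

equivalently, P factors as \(P(V) = \prod_{X \in V} P\bigl(X \mid \mathrm{Parents}(X)\bigr)\). It is this condition that connects a graph, read causally, to independence constraints that can be checked against sample data.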
Students can use an educational system's help in unexpected ways. For example, they may bypass abstract hints in search of a concrete solution. This behavior has traditionally been labeled as a form of gaming or help abuse. We propose that some examples of this behavior are not abusive and that bottom-out hints can act as worked examples. We create a model for distinguishing good student use of bottom-out hints from bad student use of bottom-out hints by means of logged response times. We show that this model not only predicts learning, but captures behaviors related to self-explanation.
Deciding matters of legal liability, in torts and other civil actions, requires deciding causation. The injury suffered by a plaintiff must be caused by an event or condition due to the defendant. The courts distinguish between cause-in-fact and proximate causation, where cause-in-fact is determined by the "but-for" test: the effect would not have happened, "but for" the cause. Proximate causation is a set of legal limitations on cause-in-fact.
It has been shown in Spirtes (1995) that X and Y are d-separated given Z in a directed graph associated with a recursive or non-recursive linear model without correlated errors if and only if the model entails that ρXY.Z = 0. This result cannot be directly applied to a linear model with correlated errors, however, because the standard graphical representation of a linear model with correlated errors is not a directed graph. The main result of this paper is to show how to associate a directed graph with a linear model L with correlated errors, and then use d-separation in the associated directed graph to determine whether L entails that a particular partial correlation is zero.
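A small numerical illustration of the equivalence being extended, on a toy recursive linear chain X → Z → Y with independent errors (variable names and coefficients are ours, not the paper's): Z d-separates X from Y, so the model entails ρXY.Z = 0, and the sample partial correlation computed from simulated data is correspondingly near zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Toy recursive linear model without correlated errors: X -> Z -> Y
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)
y = 0.5 * z + rng.normal(size=n)

def partial_corr(a, b, c):
    """rho_{ab.c} via the standard recursion on ordinary correlations."""
    r = np.corrcoef([a, b, c])
    r_ab, r_ac, r_bc = r[0, 1], r[0, 2], r[1, 2]
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

print("rho_XY  :", round(float(np.corrcoef(x, y)[0, 1]), 3))   # clearly nonzero
print("rho_XY.Z:", round(float(partial_corr(x, y, z)), 3))     # ~0: Z d-separates X and Y
```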
Practically, causation matters. Juries must decide, for example, whether a pregnant mother's refusal to give birth by caesarean section was the cause of the death of one of her twins. Policy makers must decide whether violence on TV causes violence in life. Neither question can be coherently debated without some theory of causation. Fortunately (or not, depending on where one sits), a virtual plethora of theories of causation have been championed in the third of a century between 1970 and 2004.
There is now substantial agreement about the representational component of a normative theory of causal reasoning: Causal Bayes Nets. There is less agreement about a normative theory of causal discovery from data, either computationally or cognitively, and almost no work investigating how teaching the Causal Bayes Nets representational apparatus might help individuals faced with a causal learning task. Psychologists working to describe how naïve participants represent and learn causal structure from data have focused primarily on learning from single trials under a variety of conditions. In contrast, one component of the normative theory focuses on learning from a sample drawn from a population under some experimental or observational study regime. Through a virtual Causality Lab that embodies the normative theory of causal reasoning and which allows us to record student behavior, we have begun to systematically explore how best to teach the normative theory. In this paper we explain the overall project and report on pilot studies which suggest that students can quickly be taught to (appear to) be quite rational.
The statistical evidence for the detrimental effect of exposure to low levels of lead on the cognitive capacities of children has been debated for several decades. In this paper I describe how two techniques from artificial intelligence and statistics help make the statistical evidence for the accepted epidemiological conclusion seem decisive. The first is a variable-selection routine in TETRAD III for finding causes, and the second is a Bayesian estimation of the parameter reflecting the causal influence of Actual Lead Exposure, a latent variable, on the measured IQ scores of middle-class suburban children.
More and more, judges and juries are being asked to handle torts and other cases in which establishing liability involves understanding large bodies of complex scientific evidence. When establishing causation is involved, the evidence can be diverse, can involve complicated statistical models, and can seem impenetrable to non-experts. Since the decision in Daubert v. Merrell Dow Pharms., Inc. in 1993, judges cannot simply admit expert testimony and other technical evidence and let jurors decide the verdict. Judges now must rule on which experts are admissible and which are inadmissible, and they must base their ruling at least partly on the status of the scientific evidence about which the expert will testify. This article is intended to provide judges with an accessible methodological overview of causal science.
Students in two classes in the fall of 2004 making extensive use of online courseware were logged as they visited over 500 different "learning pages" which varied in length and in difficulty. We computed the time spent on each page by each student during each session in which they were logged in. We then modeled the time spent on a particular visit as a function of the page itself, the session, and the student. Surprisingly, the average time a student spent on learning pages was of almost no value in predicting how long they would spend on a given page, even controlling for the session and page difficulty. The page itself was highly predictive, but so was the average time spent on learning pages in a given session. This indicates that local considerations, e.g., mood, deadline proximity, etc., play a much greater role in determining student pace and attention than do intrinsic student traits. We also consider the average time spent on learning pages as a function of the time of semester. Students spent less time on pages later in the semester, even for more demanding material.
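A schematic of the modeling strategy described above, on a hypothetical toy log (the dataframe, column names, and numbers are invented for illustration, not drawn from the course data): regress the time spent on a visit on page, session, and student factors and compare how much each explains.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical visit log: one row per (student, session, page) visit with minutes spent.
visits = pd.DataFrame({
    "student": ["s1", "s1", "s2", "s2", "s3", "s3"],
    "session": ["a",  "b",  "a",  "b",  "a",  "b"],
    "page":    ["p1", "p2", "p2", "p1", "p1", "p2"],
    "minutes": [4.0,  9.5,  8.0,  5.0,  3.0,  10.5],
})

# Model time spent on a visit as a function of page, session, and student,
# mirroring the strategy described above.
model = smf.ols("minutes ~ C(page) + C(session) + C(student)", data=visits).fit()
print(model.params)
```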
The past two decades have seen a dramatic growth in the use of statisticians and economists for the presentation of expert testimony in legal proceedings. In this paper, we describe a hypothetical case modeled on real ones and involving statistical testimony regarding the causal effect of lead on lowering the IQs of children who ingest lead paint chips. The data we use come from a well-known pioneering study on the topic and the analyses we describe as the expert testimony are similar to ones that can be found in major scientific journals. The battle of the experts in this hypothetical case resembles that which many encounter as expert witnesses. The paper concludes with some observations and advice.
Researchers routinely face the problem of inferring causal relationships from large amounts of data, sometimes involving hundreds of variables. Often, it is the causal relationships between "latent" (unmeasured) variables that are of primary interest. The problem is how causal relationships between unmeasured variables can be inferred from measured data. For example, naval manpower researchers have been asked to infer the causal relations among psychological traits such as job satisfaction and job challenge from a data base in which neither trait is measured directly, but in which answers to interview questions are plausibly associated with each trait. By combining background knowledge with an algorithm that searches for causal structure among the unobserved variables, we have created a tool that can reliably extract useful causal information about latent variables from large data bases. In what follows we describe the class of causal models to which our...
DNA microarrays are perfectly suited for comparing gene expression in different populations of cells. An important application of microarray techniques is identifying genes which are activated by a particular drug of interest. This process will allow biologists to identify therapies targeted to particular diseases and, eventually, to gain more knowledge about the biological processes in organisms. Such an application is described in this paper. It is focused on diabetes and obesity, which are genetically heterogeneous diseases, meaning that multiple defective genes are responsible for them. The paper is divided into three parts, each dealing with a different problem addressed in our study. First, we validate the data from our microarray experiment; we identified significant systematic sources of variability which are potential issues for other microarray datasets. Second, we applied multiple hypothesis testing to identify differentially expressed genes. We found a set of genes which appear to change in expression level over time in response to a drug treatment. Third, we tried to address the problem of identifying co-expressed genes using cluster analysis. This last problem is still under discussion.
Conceptual understanding of representations and fluency in using representations are important aspects of expertise. However, little is known about how these competencies interact: does representational understanding facilitate learning of fluency, or does fluency enhance learning of representational understanding? We analyze log data obtained from an experiment that investigates the effects of intelligent tutoring system (ITS) support for understanding and fluency in connection-making between fraction representations. The experiment shows that instructional support for both representational understanding and fluency is needed for students to benefit from the ITS. In analyzing the ITS log data, we contrast the understanding-first hypothesis and the fluency-first hypothesis, testing whether errors made during the learning phase mediate the effect of experimental condition. Finding that a simple statistical model does not fit the data, we searched over all plausible causal path analysis models. Our results support the understanding-first hypothesis but not the fluency-first hypothesis.
In a series of 5 experiments in 2000 and 2001, several hundred students at two different universities with three different professors and six different teaching assistants took a semester-long course on causal and statistical reasoning in either traditional lecture/recitation or online/recitation format. In this paper we compare the pre-test to post-test gains of these students, we identify features of the online experience that were helpful and features that were not, and we identify student learning strategies that were effective and those that were not. Students who entirely replaced going to lecture with doing online modules did as well as, and usually better than, those who went to lecture. Simple strategies like incorporating frequent interactive comprehension checks into the online material proved effective, but online students attended face-to-face recitations less often than lecture students and suffered because of it. Supporting the idea that small, interactive recitations are more effective than large, passive lectures, recitation attendance was three times as important as lecture attendance for predicting pre-test to post-test gains. For the online student, embracing the online environment as opposed to trying to convert it into a traditional print-based one was an important strategy, but simple diligence in attempting "voluntary" exercises was by far the most important factor in student success.
Drawing substantive conclusions from linear causal models that perform acceptably on statistical tests is unreasonable if it is not known how alternatives fare on these same tests. We describe a computer program, TETRAD, that helps to search rapidly for plausible alternatives to a given causal structure. The program is based on principles from statistics, graph theory, philosophy of science, and artificial intelligence. We describe these principles, discuss how TETRAD employs them, and argue that these principles make TETRAD an effective tool. Finally, we illustrate TETRAD's effectiveness by applying it to a multiple indicator model of political and industrial development. A pilot version of the TETRAD program is described in this paper. The current version is described in our forthcoming Discovering Causal Structure: Artificial Intelligence for Statistical Modeling.