It is time to escape the constraints of the Systematics Wars narrative and pursue new questions that are better positioned to establish the relevance of the field in this time period to broader issues in the history of biology and history of science. To date, the underlying assumptions of the Systematics Wars narrative have led historians to prioritize theory over practice and the conflicts of a few leading theorists over the less-polarized interactions of systematists at large. We show how shifting to a practice-oriented view of methodology, centered on the trajectory of mathematization in systematics, demonstrates problems with the common view that one camp straightforwardly “won” over the other. In particular, we critique David Hull’s historical account in Science as a Process by demonstrating exactly the sort of intermediate level of positive sharing between phenetic and cladistic theories that undermines their mutually exclusive individuality as conceptual systems over time. It is misleading, or at least inadequate, to treat them simply as holistically opposed theories that can only interact by competition to the death. Looking to the future, we suggest that the concept of workflow provides an important new perspective on the history of mathematization and computerization in biology after World War II.
We introduce a new type of pluralism about biological function that, in contrast to existing accounts, demonstrates a practical integration among the term’s different meanings. In particular, we show how to generalize Sandra Mitchell’s notion of integrative pluralism to circumstances where multiple epistemic tools of the same type are jointly necessary to solve scientific problems. We argue that the multiple definitions of biological function operate jointly in this way based on how biologists explain the evolution of protein function. To clarify how our account relates to existing views, we introduce a general typology for monist and pluralist accounts along with standardized criteria for judging which is best supported by evidence.
What are the prospects for a monistic view of biological individuality given the multiple epistemic roles the concept must satisfy? In this paper, I examine the epistemic adequacy of two recent accounts based on the capacity to undergo natural selection: one from Ellen Clarke and the other from Peter Godfrey-Smith. Clarke’s position reflects a strong monism, in that she aims to characterize individuality in purely functional terms and refrains from privileging any specific material properties as important in their own right. I argue that Clarke’s functionalism impairs the epistemic adequacy of her account compared to a middle-ground position taken by Godfrey-Smith. In comparing Clarke’s and Godfrey-Smith’s accounts, two pathways to pluralism about biological individuality emerge. The first develops from the contrast between functionalist and materialist approaches, and the second from an underlying temporal structure involved in using evolutionary processes to define individuality.
Contemporary biology has inherited two key assumptions from the Modern Synthesis about the nature of population lineages: sexual reproduction is the exemplar for how individuals in population lineages inherit traits from their parents, and random mating is the exemplar for reproductive interaction. While these assumptions have been extremely fruitful for a number of fields, such as population genetics and phylogenetics, they are increasingly unviable for studying the full diversity and evolution of life. I introduce the “mixture” account of population lineages that escapes these assumptions by dissolving the Modern Synthesis’s sharp line separating reproduction and development and by characterizing reproductive integration in population lineages by the ephemerality of isolated subgroups rather than random mating. The mixture account provides a single criterion for reproductive integration that accommodates both sexual and asexual reproduction, unifying their treatment under Kevin de Queiroz’s generalized lineage concept of species. The account also provides a new basis for assessing how random mating, as an idealization, affects the empirical adequacy of population genetic models.
Criticism of big data has focused on showing that more is not necessarily better, in the sense that data may lose their value when taken out of context and aggregated together. The next step is to incorporate an awareness of pitfalls for aggregation into the design of data infrastructure and institutions. A common strategy minimizes aggregation errors by increasing the precision of our conventions for identifying and classifying data. As a counterpoint, we argue that there are pragmatic trade-offs between precision and ambiguity that are key to designing effective solutions for generating big data about biodiversity. We focus on the importance of theory-dependence as a source of ambiguity in taxonomic nomenclature and hence a persistent challenge for implementing a single, long-term solution to storing and accessing meaningful sets of biological specimens. We argue that ambiguity does have a positive role to play in scientific progress as a tool for efficiently symbolizing multiple aspects of taxa and mediating between conflicting hypotheses about their nature. Pursuing a deeper understanding of the trade-offs and synthesis between precision and ambiguity as virtues of scientific language and communication systems offers a productive next step for realizing sound services for big biodiversity data.
The collection and classification of data into meaningful categories is a key step in the process of knowledge making. In the life sciences, the design of data discovery and integration tools has relied on the premise that a formal classificatory system for expressing a body of data should be grounded in consensus definitions for classifications. On this approach, exemplified by the realist program of the Open Biomedical Ontologies Foundry, progress is maximized by grounding the representation and aggregation of data in settled knowledge. We argue that historical practices in systematic biology provide an important and overlooked alternative approach to classifying and disseminating data, based on a principle of coordinative rather than definitional consensus. Systematists have developed a robust system for referring to taxonomic entities that can deliver high quality data discovery and integration without invoking consensus about reality or “settled” science.
What does it look like when a group of scientists sets out to re-envision an entire field of biology in symbolic and formal terms? I analyze the founding and articulation of Numerical Taxonomy between 1950 and 1970, the period when it set out a radical new approach to classification and inaugurated a tradition of mathematics in systematic biology. I argue that introducing mathematics in a comprehensive way also requires re-organizing the daily work of scientists in the field. Numerical taxonomists sought to establish a mathematical method for classification that was universal to every type of organism, and I argue that this intrinsically implicated them in a qualitative re-organization of the work of all systematists. I also discuss how Numerical Taxonomy’s re-organization of practice became entrenched across systematic biology even as opposing schools produced their own competing mathematical methods. In this way, the structure of the work process became more fundamental than the methodological theories that motivated it.
We argue that the mathematization of science should be understood as a normative activity of advocating for a particular methodology with its own criteria for evaluating good research. As a case study, we examine the mathematization of taxonomic classification in systematic biology. We show how mathematization is a normative activity by contrasting its distinctive features in numerical taxonomy in the 1960s with an earlier reform advocated by Ernst Mayr starting in the 1940s. Both Mayr and the numerical taxonomists sought to formalize the work of classification, but Mayr introduced a qualitative formalism based on human judgment for determining the taxonomic rank of populations, while the numerical taxonomists introduced a quantitative formalism based on automated procedures for computing classifications. The key contrast between Mayr and the numerical taxonomists is how they conceptualized the temporal structure of the workflow of classification, specifically where they allowed meta-level discourse about difficulties in producing the classification.
The idea that ambiguity can be productive in data science remains controversial. Efforts to make scientific publications and data intelligible to computers generally assume that accommodating multiple meanings for words, known as polysemy, undermines reasoning and communication. This assumption has nonetheless been contested by historians, philosophers, and social scientists, who have applied qualitative research methods to demonstrate the generative and strategic value of polysemy. Recent quantitative results from linguistics have also shown how polysemy can actually improve the efficiency of human communication. I present a new conceptual typology based on a synthesis of prior research about the aims, norms, and circumstances under which polysemy arises and is evaluated. The typology supports a contextual pluralist view of polysemy’s value for scientific research practices: polysemy does substantial positive and negative work in science, but its utility is context-sensitive in ways that are often overlooked by the norms people, including prior scholars researching polysemy, have formulated to regulate its use. I also propose that historical patterns in the use of partial synonyms, i.e., terms with overlapping meanings, provide an especially promising phenomenon for integrative research addressing these issues.
We critique the organizational account of biological functions by showing how its basis in the closure of constraints fails to be objective. While the account treats constraints as objective features of physical systems, the number and relationships of potential constraints are subject to potentially arbitrary redescription by investigators. For example, we show that, on a more thorough analysis of the case, self-maintaining systems such as candle flames can realize closure, contradicting the claim that these “simple” systems lack functional organization. This also raises problems for Moreno and Mossio’s associated theory of biological autonomy, which asserts that living beings are distinguished by their possession of a closed system of constraints that channel and regulate their metabolic processes.
Evolutionary conceptions of species place special weight on each species having dynamic independence as a unit of evolution. However, the idea that species have their own historical fates, tendencies, or roles has resisted systematic analysis. Growing evidence from population genomics shows that many paradigm species regularly engage in hybridization. How can species be defined in terms of independent evolutionary identities if their genomes are dynamically coupled through lateral exchange? I introduce the concept of a “composite lineage” to distinguish species and subspecies on the basis of the proportion of a group’s heritable traits that are uncoupled from reproductive exchange.
The growing range of methods for statistical model selection is inspiring new debates about how to handle the potential for conflicting results when different methods are applied to the same data. While many factors enter into choosing a model selection method, we focus on the implications of disagreements among scientists about whether, and in what sense, the true probability distribution is included in the candidate set of models. While this question can be addressed empirically, the data often provide inconclusive results in practice. In such cases, we argue that differences in prior metaphysical views about the local adequacy of the models can produce underdetermination of results, even for the same data and candidate models. As a result, data alone are sometimes insufficient to settle rational beliefs about nature.
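To make the phenomenon concrete, here is a minimal sketch (my own illustration, not drawn from the paper) in which two standard selection criteria, AIC and BIC, score the same candidate models on the same simulated data; because BIC penalizes extra parameters more heavily, the two criteria can disagree about whether the richer model is worth its additional term. The data-generating curve, sample size, and noise level are all illustrative assumptions.

```python
# Hedged illustration: AIC vs. BIC on the same data and candidate set.
# All numerical choices below are assumptions made for the example.
import numpy as np

rng = np.random.default_rng(0)
n = 40
x = np.linspace(0, 1, n)
# A weak quadratic term: the "true" model may or may not be worth
# its extra parameter at this sample size.
y = 1.0 + 2.0 * x + 0.4 * x**2 + rng.normal(0.0, 0.5, n)

def max_gaussian_loglik(resid):
    """Maximized Gaussian log-likelihood for the given residuals."""
    sigma2 = np.mean(resid**2)
    return -0.5 * len(resid) * (np.log(2 * np.pi * sigma2) + 1)

for name, degree in [("linear", 1), ("quadratic", 2)]:
    coeffs = np.polyfit(x, y, degree)          # fit the candidate model
    resid = y - np.polyval(coeffs, x)
    k = degree + 2  # polynomial coefficients plus the error variance
    ll = max_gaussian_loglik(resid)
    aic = 2 * k - 2 * ll                       # lighter penalty on k
    bic = k * np.log(n) - 2 * ll               # heavier penalty on k
    print(f"{name:9s} AIC={aic:8.2f} BIC={bic:8.2f}")
```

Whether the criteria actually disagree depends on the sample; the point relevant to the abstract is that nothing in the data alone dictates which penalty, and hence which verdict, is the right one.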
Many philosophers are skeptical about the scientific value of the concept of biological information. However, several have recently proposed a more positive view of ascribing information as an exercise in scientific modeling. I argue for an alternative role: guiding empirical data collection for the sake of theorizing about the evolution of semantics. I clarify and expand on Bergstrom and Rosvall’s suggestion of taking a “diagnostic” approach that defines biological information operationally as a procedure for collecting empirical cases. The more recent modeling-based accounts still perpetuate a theory-centric view of scientific concepts, which motivated philosophers’ misplaced skepticism in the first place.
Biodiversity science is in a pivotal period when diverse groups of actors—including researchers, businesses, national governments, and Indigenous Peoples—are negotiating wide-ranging norms for governing and managing biodiversity data in digital repositories. These repositories, often called biodiversity data portals, are a type of organization whose governance can address or perpetuate the colonial history of biodiversity science and current inequities. Researchers and Indigenous Peoples are developing and implementing new strategies to examine and change assumptions about which agents should count as salient participants in scientific projects, especially in projects that build and manage large digital data portals. Two notable efforts are the FAIR (Findable, Accessible, Interoperable, Reusable) and CARE (Collective benefit, Authority, Responsibility, Ethics) Principles for scientific data management and governance. To characterize how these principles influence the governance of biodiversity data portals, we develop an account of fit-for-use data that makes explicit its social as well as technical conditions in relation to agents and purposes. The FAIR Principles, already widely adopted by biodiversity researchers, prioritize machine agents and efficient computation, while the CARE Principles prioritize Indigenous Peoples and their data sovereignty. Both illustrate the potency of an emerging general strategy by which groups of actors craft and implement governance principles for data fitness-for-use to change assumptions about who counts as a salient participant in data science.
Biodiversity scientists often describe their field as aiming to save life and humanity, but the field has yet to reckon with the history and contemporary practices of colonialism and systematic racism inherited from natural history. The online data portals scientists use to store and share biodiversity data are a growing class of organizations whose governance can address or perpetuate and further institutionalize the implicit assumptions and inequitable social impacts of this extensive history. In this context, researchers and Indigenous Peoples are developing and implementing new strategies to examine and change assumptions about which agents should count as salient participants in scientific projects, especially projects that build and manage large digital data portals. Two notable efforts are the FAIR and CARE Principles for scientific data management and governance. To characterize how these principles influence the governance of biodiversity data portals, we develop an account of fitness-for-use that makes explicit its social as well as technical conditions in relation to agents and purposes. The FAIR Principles, already widely adopted by biodiversity data projects, prioritize machine agents and efficient computation, while the CARE Principles prioritize Indigenous Peoples and their data sovereignty. Both illustrate the potency of an emerging general strategy by which groups of actors craft and implement governance principles for data fitness-for-use to change assumptions about salient participants in data science.
Among the many causes of an event, how do we distinguish the important ones? Are there ways to distinguish among causes on principled grounds that integrate both practical aims and objective knowledge? Psychologist Tania Lombrozo has suggested that causal explanations “identify factors that are ‘exportable’ in the sense that they are likely to subserve future prediction and intervention” (Lombrozo 2010, 327). Hence portable causes are more important precisely because they provide objective information for prediction and intervention as practical aims. However, I argue that this is only part of the epistemology of causal selection. Recent work on portable causes has implicitly assumed them to be portable within the same causal system at a later time. As a result, it has appeared that the objective content of causal selection includes only facts about the causal structure of that single system. In contrast, I present a case study from systems biology in which scientists are searching for causal factors that are portable across rather than within causal systems. By paying careful attention to how these biologists find portable causes, I show that the objective content of causal selection can extend beyond the immediate systems of interest. In particular, knowledge of the evolutionary history of gene networks is necessary for correctly identifying causal patterns in these networks that explain cellular behavior in a portable way.
A classic analytic approach to biological phenomena seeks to refine definitions until classes are sufficiently homogenous to support prediction and explanation, but this approach founders on cases where a single process produces objects with similar forms but heterogeneous behaviors. I introduce object spaces as a tool to tackle this challenging diversity of biological objects in terms of causal processes with well-defined formal properties. Object spaces have three primary components: (1) a combinatorial biological process such as protein synthesis that generates objects with parts that are modular, independent, and organized according to an invariant syntax; (2) a notion of “distance” that relates the objects according to rules of change over time as found in nature or useful for algorithms; (3) mapping functions defined on the space that map its objects to other spaces or apply an evaluative criterion to measure an important quality, such as parsimony or biochemical function. Once defined, an object space can be used to represent and simulate the dynamics of phenomena on multiple scales; it can also be used as a tool for predicting higher-order properties of the objects, including by stitching together series of causal processes. Object spaces are the basis for a strategy of theorizing, discovery, and analysis in biology: as heuristic idealizations of biology, they help us transform inchoate, intractable problems into articulated, well-structured ones. Developing an object space is a research strategy with a long, successful history under many other names, and it offers a unifying but not overreaching approach to biological theory.
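As a toy illustration of the three components (my own sketch under assumed choices, not the paper’s formalism), the following Python snippet builds a small object space over short nucleotide sequences: a combinatorial generative process, a distance given by single-site change, and a mapping function that scores each object on a simple criterion.

```python
# Hedged sketch of an object space; the alphabet, distance, and
# scoring function are illustrative assumptions, not the paper's.
from itertools import product

ALPHABET = "ACGT"

def generate_objects(length):
    """(1) Combinatorial process: all sequences of a fixed length built
    from modular, independent parts arranged by an invariant syntax."""
    return ["".join(parts) for parts in product(ALPHABET, repeat=length)]

def hamming(a, b):
    """(2) Distance: the number of single-site changes relating two
    objects, a simple rule of change over time (point mutation)."""
    return sum(x != y for x, y in zip(a, b))

def gc_content(seq):
    """(3) Mapping function: evaluate each object on a criterion of
    interest (here, a crude proxy for biochemical stability)."""
    return (seq.count("G") + seq.count("C")) / len(seq)

space = generate_objects(3)                         # 4**3 = 64 objects
one_step = [s for s in space if hamming(s, "ACG") == 1]
print(len(space), len(one_step), gc_content("ACG"))
```

Even this toy version supports the moves the abstract describes: enumerating one-step neighbors simulates local dynamics, and composing the mapping function with the distance lets one ask how an evaluative quality changes along mutational paths.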
Consensus about a classification is defined as agreement on a set of classes and their relations for use in forming beliefs. While most research on scientific consensus has focused on consensus about a belief as a mark of truth, we highlight the importance of consensus in justifying shared classificatory language. What sort of consensus, if any, is the best basis for communicating and reasoning with scientific classifications? We describe an often-overlooked coordinative role for consensus that leverages agreement on how to disagree, such that the actors involved can still achieve one or more shared aims even when they do not agree on shared beliefs or categories. Looking forward, we suggest that investigating structures and methods for coordinative consensus provides an important new direction for research on the epistemic foundations of knowledge organization.
Big data is opening new angles on old questions about scientific progress. Is scientific knowledge cumulative? If so, how does it make progress? In the life sciences, what we call the Consensus Principle has dominated the design of data discovery and integration tools: the design of a formal classificatory system for expressing a body of data should be grounded in consensus. Based on current approaches in biomedicine and systematic biology, we formulate and compare three types of the Consensus Principle: realist, contextual-best, and coordinative. In contrast to the realist program of the Open Biomedical Ontologies Foundry, we argue that historical practices in systematic biology provide an important and overlooked alternative based on coordinative consensus. Systematists have developed a robust system for referring to taxonomic entities that can deliver high quality data discovery and integration without invoking consensus about reality or “settled” science.
Making sense of why something succeeded or failed is central to scientific practice: it provides an interpretation of what happened, i.e., a hypothesized explanation for the results, that informs scientists’ deliberations over their next steps. In philosophy, the realism debate has dominated the project of making sense of scientists’ success and failure claims, restricting its focus to whether truth or reliability best explains science’s most secure successes. Our aim, in contrast, will be to expand and advance the practice-oriented project sketched by Arthur Fine in his work on the Natural Ontological Attitude. An important obstacle to articulating a positive program, we suggest, has been overlooking how scientists adopt standardized rules and procedures in order to define and operationalize meanings for success and failure relative to their situated goals. To help fill this gap, we introduce two new ideas, design specifications and track records, and show how they advance our ability to make sense of scientific modeling practices while maintaining a deflationary stance toward the realism debate.
What should the best practices be for modeling zoonotic disease risks, e.g. to anticipate the next pandemic, when background assumptions are unsettled or evolving rapidly? This challenge runs deeper than one might expect, all the way into how we model the robustness of contemporary phylogenetic inference and taxonomic classifications. Different and legitimate taxonomic assumptions can destabilize the putative objectivity of zoonotic risk assessments, thus potentially supporting inconsistent and overconfident policy decisions.