Optimization of Scientific Reasoning: a Data-Driven Approach
Dissertation (2019)
Abstract
Scientific reasoning represents complex argumentation patterns that eventually lead to scientific discoveries. Social epistemology of science provides a perspective on the scientific community as a whole and on its collective knowledge acquisition. Different techniques have been employed with the goal of maximizing scientific knowledge on the group level. These techniques include formal models and computer simulations of scientific reasoning and interaction. Still, these models have mainly tested abstract hypothetical scenarios. The present thesis instead presents data-driven approaches in social epistemology of science. A data-driven approach requires data collection and curation for further use, which can include creating empirically calibrated models and simulations of scientific inquiry, performing statistical analyses, or employing data-mining techniques and other procedures.
We present and analyze in detail three co-authored research projects in which the author of the thesis was engaged during her PhD. The first project sought to identify optimal team composition in high energy physics laboratories using data-mining techniques. The results of this project are published in (Perović et al. 2016) and indicate that projects with smaller numbers of teams and team members outperform bigger ones. In the second project, we attempted to determine whether there is an epistemic saturation point in experimentation in high energy physics. The initial results from this project are published in (Sikimić et al. 2018). In the thesis, we expand on this topic by using computer simulations to test for biases that could induce scientists to invest in projects beyond their epistemic saturation point. Finally, in the previous examples of data-driven analyses, citations are used as a measure of the epistemic efficiency of projects in high energy physics. In order to further justify and analyze the use of this parameter in their data-driven research, in the third project Perović & Sikimić (2019) analyzed and compared inductive patterns in experimental physics and biology with the reliability of citation records in these fields. They conclude that while citations are a relatively reliable measure of efficiency in high energy physics research, the same does not hold for the majority of research in experimental biology.
Additionally, the contributions of the author that are published for the first time in this thesis are: (a) an empirically calibrated model of scientific interaction of research groups in biology, (b) a case study of irregular argumentation patterns in some pathogen discoveries, and (c) an introductory discussion of the benefits and limitations of data-driven approaches to the social epistemology of science. Using computer simulations of an empirically calibrated model, we demonstrate that having several levels of hierarchy and division into smaller research sub-teams is epistemically beneficial for researchers in experimental biology. We also show that argumentation analysis in biology represents a good starting point for further data-driven analyses in the field. Finally, we conclude that a data-driven approach is informative and useful for science policy, but requires careful consideration of data collection, curation, and interpretation.