The early days of contemporary philosophy of science: novel insights from machine translation and topic-modeling of non-parallel multilingual corpora

Synthese 200 (3):1-33 (2022)

Abstract

Topic modeling is a well-proven tool for investigating the semantic content of textual corpora. Yet corpora sometimes include texts in several languages, making it impossible to apply language-specific computational approaches to their entire content. This is the problem we encountered when setting out to analyze a philosophy of science corpus spanning over eight decades and including original articles in Dutch, German and French, alongside a large majority of articles in English. To circumvent this multilingual problem, we use machine-translation tools to bulk-translate non-English documents into English. Though largely imperfect, especially syntactically, these translations nevertheless provide correctly translated terms and preserve the semantic proximity of documents with respect to one another. To assess the quality of this translation step, we develop a “semantic topology preservation test” that relies on estimating the extent to which document-to-document distances have been preserved during translation. We then conduct an LDA topic-model analysis over the entire corpus of translated and English original texts, and compare it to a topic model run over the English original texts only. We thereby identify the specific contribution of the translated texts. These studies reveal a more complete picture of the main topics that can be found in the philosophy of science literature, especially during the early days of the discipline, when numerous articles were published in languages other than English.
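The idea of checking whether document-to-document distances survive translation can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes bag-of-words vectors, cosine distance, and a Pearson correlation between the two sets of pairwise distances, and every function name and toy sentence below is hypothetical. A real test would at least remove stopwords and use a much larger sample.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def pairwise_distances(docs):
    """Distances for every unordered pair of documents, in a fixed order."""
    bags = [Counter(d.lower().split()) for d in docs]
    return [
        cosine_distance(bags[i], bags[j])
        for i, j in combinations(range(len(bags)), 2)
    ]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / (sd_x * sd_y)


def topology_preservation(originals, translations):
    """Correlate pairwise distances before and after translation.

    Assumption: originals[i] and translations[i] are the same document,
    and distances are computed within each language separately.
    """
    return pearson(
        pairwise_distances(originals),
        pairwise_distances(translations),
    )


# Hypothetical toy data: three French titles and their English renderings.
originals = [
    "la théorie de la relativité",
    "la théorie des quanta",
    "la logique de la preuve",
]
translations = [
    "the theory of relativity",
    "the theory of quanta",
    "the logic of proof",
]

print(topology_preservation(originals, translations))
```

A score near 1 would suggest that documents close to one another before translation remain close afterwards; with three short titles and no stopword filtering, the toy score itself is not meaningful.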

Similar books and articles

Local context selection for aligning sentences in parallel corpora. Ergun Biçici - 2007 - In D. C. Richardson B. Kokinov (ed.), Modeling and Using Context. Springer. pp. 82-93.
Parallel architectures and mental computation. Andrew Wells - 1993 - British Journal for the Philosophy of Science 44 (3):531-542.
Perspectives on Modeling in Cognitive Science. Richard M. Shiffrin - 2010 - Topics in Cognitive Science 2 (4):736-750.

Analytics

Added to PP
2022-06-01

Author's Profile

Christophe Malaterre
Université du Québec à Montréal (UQAM)
