Transforming large collections of scientific publications to XML

M. Kohlhase; D. Ginev; C. David; B. R. Miller

Abstract

[...] collecting statistics about missing bindings and macros, and other errors. This guides debugging and development efforts, leading to iterative improvements in both the tools and the quality of the converted corpus. The build system thus serves as both a production conversion engine and a software test harness. We have now processed the complete arχiv collection through 2006, consisting of more than 400,000 documents (a complete run is a processor-year-size undertaking), continuously improving our success rate. We are now able to convert more than 90% of these documents to XHTML+MathML. We consider over 60% to be successes, converted with no or minor warnings. While the remaining 30% can also be converted, their quality is doubtful due to unsupported macros or conversion errors.

Author's Profile

Carlos David

Universidad de La Salle - Santafé de Bogotá

Citations of this work

Towards context-based disambiguation of mathematical expressions. Magdalena Wolska - unknown

References found in this work

A Search Engine for Mathematical Formulae. Michael Kohlhase - unknown
