Transforming large collections of scientific publications to XML

Abstract

... collecting statistics about missing bindings and macros, and other errors. This guides debugging and development efforts, leading to iterative improvements in both the tools and the quality of the converted corpus. The build system thus serves as both a production conversion engine and a software test harness. We have now processed the complete arXiv collection through 2006, consisting of more than 400,000 documents (a complete run is a processor-year-size undertaking), and our success rate has improved continuously. We are now able to convert more than 90% of these documents to XHTML+MathML. We consider over 60% to be successes, converted with no or only minor warnings. While the remaining 30% can also be converted, their quality is doubtful, due to unsupported macros or conversion errors.
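
The statistics gathering described above can be illustrated with a minimal sketch. It assumes a hypothetical per-document record with a status label and a list of missing macros; the field names, status labels, document identifiers, and macro names are all invented for illustration and are not taken from the build system itself. The sketch tallies outcomes per status and counts how often each missing macro occurs, which is the kind of summary that could direct development effort toward the most frequent gaps.

```python
from collections import Counter

# Hypothetical per-document conversion records. The field names, status
# labels, and macro names are assumptions made for this illustration only.
RECORDS = [
    {"doc": "doc-0001", "status": "ok", "missing_macros": []},
    {"doc": "doc-0002", "status": "warning", "missing_macros": ["\\foo"]},
    {"doc": "doc-0003", "status": "error", "missing_macros": ["\\foo", "\\bar"]},
    {"doc": "doc-0004", "status": "fatal", "missing_macros": []},
]


def summarize(records):
    """Return per-status counts, per-macro counts, and per-status rates."""
    status_counts = Counter(r["status"] for r in records)
    # Count every occurrence of a missing macro across all documents.
    macro_counts = Counter(m for r in records for m in r["missing_macros"])
    total = len(records)
    # Fraction of the corpus in each status; empty if there are no records.
    rates = {s: n / total for s, n in status_counts.items()} if total else {}
    return status_counts, macro_counts, rates


status_counts, macro_counts, rates = summarize(RECORDS)

# List the most frequently missing macros first.
for macro, count in macro_counts.most_common():
    print(f"{macro}: missing in {count} document(s)")
```

Sorting missing macros by frequency is one plausible way to prioritise which bindings to write next; the abstract does not say how the authors actually rank them.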


Analytics

Added to PP
2010-12-22

Author's Profile

Carlos David
Universidad de La Salle - Santafé de Bogotá