Template Sampling for Leveraging Domain Knowledge in Information Extraction

Abstract

We first describe a feature-rich discriminative Conditional Random Field (CRF) model for Information Extraction in the workshop announcements domain, which offers good baseline performance in the PASCAL shared task. We then propose a method for leveraging domain knowledge in Information Extraction tasks: after generating sample labellings from a trained sequence classifier, we score candidate document labellings as one-value-per-field templates according to their feasibility in the domain. Our relational models evaluate these templates according to our intuitions about agreement in the domain: workshop acronyms should resemble their names, and workshop dates should occur after paper submission dates. These methods yield a 5% F-score improvement in fields retrieved when sampling labellings from a Maximum-Entropy Markov Model; however, we do not observe an improvement over the CRF model. We discuss reasons for this, including the problem of recovering all field instances from a best template, and propose future work on adapting such a model to the CRF, which is the better standalone system.
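
The template-scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`acronym_agreement`, `date_agreement`, `score_template`, `best_template`), the additive combination of sampler log-probability and feasibility, and the `weight` parameter are all assumptions made for illustration. Each sampled labelling is reduced to a one-value-per-field template, and the two domain intuitions named in the abstract are turned into simple scores.

```python
from datetime import date

# Hypothetical sketch: every name and the scoring formula are illustrative
# assumptions, not taken from the paper.

def acronym_agreement(name, acronym):
    """Fraction of acronym letters that match, in order, the initials of the name."""
    initials = [w[0].upper() for w in name.split() if w[0].isalpha()]
    letters = [c for c in acronym.upper() if c.isalpha()]
    if not letters:
        return 0.0
    matched, pos = 0, 0
    for c in letters:
        if c in initials[pos:]:
            pos = initials.index(c, pos) + 1  # keep matches in left-to-right order
            matched += 1
    return matched / len(letters)

def date_agreement(submission_date, workshop_date):
    """1.0 if the workshop takes place after the submission deadline, else 0.0."""
    return 1.0 if submission_date < workshop_date else 0.0

def score_template(template, weight=1.0):
    """Combine the sampler's log-probability with the domain feasibility scores."""
    feasibility = (
        acronym_agreement(template["name"], template["acronym"])
        + date_agreement(template["submission_date"], template["workshop_date"])
    )
    return template["log_prob"] + weight * feasibility

def best_template(templates, weight=1.0):
    """Return the sampled template with the highest combined score."""
    return max(templates, key=lambda t: score_template(t, weight))

# Two made-up templates standing in for labellings sampled from a sequence model.
samples = [
    {"name": "Workshop on Machine Learning", "acronym": "ICML",
     "submission_date": date(2005, 6, 20), "workshop_date": date(2005, 3, 1),
     "log_prob": -1.5},
    {"name": "Workshop on Machine Learning", "acronym": "WML",
     "submission_date": date(2005, 3, 1), "workshop_date": date(2005, 6, 20),
     "log_prob": -2.0},
]
best = best_template(samples)
```

In this toy data the sequence model alone prefers the first template (higher `log_prob`), but the second wins once the acronym and date checks are added, which is the kind of reranking the abstract describes.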

Analytics

Added to PP
2010-12-22
