Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank

Abstract

This paper presents empirical studies and closely corresponding theoretical models of the performance of a chart parser exhaustively parsing the Penn Treebank with the Treebank’s own CFG grammar. We show how performance is dramatically affected by rule representation and tree transformations, but little by top-down vs. bottom-up strategies. We discuss grammatical saturation, including analysis of the strongly connected components of the phrasal nonterminals in the Treebank, and model how, as sentence length increases, the effective grammar rule size increases as regions of the grammar are unlocked, yielding super-cubic observed time behavior in some configurations.

Download options

PhilArchive



    Upload a copy of this work     Papers currently archived: 72,891

External links

Setup an account with your affiliations in order to access resources via your University's proxy server

Through your library

  • Only published works are available at libraries.

Analytics

Added to PP
2010-12-22

Downloads
15 (#704,697)

6 months
1 (#386,040)

Historical graph of downloads
How can I increase my downloads?

Author's Profile

Daniel Klein
Harvard University

References found in this work

No references found.

Add more references