Accurate Unlexicalized Parsing


We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar. Indeed, its performance of 86.36% (LP/LR F1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state of the art. This result has potential uses beyond establishing a strong lower bound on the maximum possible accuracy of unlexicalized models: an unlexicalized PCFG is much more compact, easier to replicate, and easier to interpret than more complex lexical models, and the parsing algorithms are simpler, more widely understood, of lower asymptotic complexity, and easier to optimize.
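To make "state split" concrete, here is a minimal sketch of one well-known kind of split, parent annotation, in which each phrasal category is relabeled with its parent's category (so a subject NP under S becomes NP^S and an object NP under VP becomes NP^VP). In a vanilla treebank grammar both share a single NP → PRP rule probability; after the split they get separate ones. This is an illustration only, not the authors' code: the nested-tuple tree format and the helper names `parent_annotate` and `productions` are hypothetical, and the full grammar described above combines many splits beyond this one.

```python
from collections import Counter
from typing import List, Tuple, Union

# Hypothetical tree format: a word is a str; a node is (label, [children]).
Tree = Union[str, Tuple[str, List["Tree"]]]


def is_preterminal(tree: Tree) -> bool:
    """A preterminal (POS tag) node has exactly one child, which is a word."""
    return (not isinstance(tree, str)
            and len(tree[1]) == 1
            and isinstance(tree[1][0], str))


def parent_annotate(tree: Tree, parent: str = "") -> Tree:
    """Split phrasal categories by parent label, e.g. NP under S -> NP^S.

    Words and preterminals are left unchanged in this sketch; the root
    has no parent and keeps its original label.
    """
    if isinstance(tree, str) or is_preterminal(tree):
        return tree
    label, children = tree
    new_label = f"{label}^{parent}" if parent else label
    # Pass the *unannotated* label down, so children see "S", not "S^...".
    return (new_label, [parent_annotate(child, label) for child in children])


def productions(tree: Tree, counts: Counter = None) -> Counter:
    """Count the grammar rules (lhs, rhs) used in a tree."""
    if counts is None:
        counts = Counter()
    if isinstance(tree, str):
        return counts
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        productions(child, counts)
    return counts


if __name__ == "__main__":
    tree = ("S", [("NP", [("PRP", ["He"])]),
                  ("VP", [("VBD", ["saw"]), ("NP", [("PRP", ["her"])])])])
    annotated = parent_annotate(tree)
    print(annotated)
    # The vanilla grammar has one rule NP -> PRP used twice; the split
    # grammar has two distinct rules, NP^S -> PRP and NP^VP -> PRP.
    print(productions(tree)[("NP", ("PRP",))])
    print(productions(annotated)[("NP^S", ("PRP",))],
          productions(annotated)[("NP^VP", ("PRP",))])
```

Estimating rule probabilities from the annotated trees (relative frequencies of each lhs's rules) then yields a grammar in which subject and object NPs can expand differently, which is the kind of independence assumption the splits are meant to break.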


Author's Profile

Daniel Klein
Harvard University
