A Generative Constituent-Context Model for Improved Grammar Induction

Abstract

We present a generative distributional model for the unsupervised induction of natural language syntax which explicitly models constituent yields and contexts. Parameter search with EM produces higher quality analyses than previously exhibited by unsupervised systems, giving the best published unsupervised parsing results on the ATIS corpus. Experiments on Penn treebank sentences of comparable length show an even higher F1 of 71% on nontrivial brackets. We compare distributionally induced and actual part-of-speech tags as input data, and examine extensions to the basic model. We discuss errors made by the system, compare the system to previous models, and discuss upper bounds, lower bounds, and stability for this task.
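The abstract reports F1 on "nontrivial brackets", the standard unsupervised-parsing convention of scoring unlabeled spans while excluding single-word spans and the full-sentence span, which any bracketing gets for free. As a sketch of that metric (not the paper's own evaluation code; the span representation as `(start, end)` pairs is an assumption for illustration):

```python
# Sketch: unlabeled bracket F1 restricted to nontrivial brackets,
# i.e. spans longer than one word and shorter than the whole sentence.
# Spans are (start, end) index pairs over a sentence of sent_len words.

def nontrivial(brackets, sent_len):
    """Drop trivial spans: single words and the full-sentence span."""
    return {(i, j) for (i, j) in brackets if 1 < j - i < sent_len}

def bracket_f1(predicted, gold, sent_len):
    """Harmonic mean of precision and recall over nontrivial spans."""
    p = nontrivial(predicted, sent_len)
    g = nontrivial(gold, sent_len)
    if not p or not g:
        return 0.0
    correct = len(p & g)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)

# Toy 5-word example: (0, 5) is trivial and ignored; two of the three
# remaining predicted spans match the gold spans.
gold = {(0, 5), (0, 2), (2, 5), (3, 5)}
pred = {(0, 5), (0, 2), (2, 4), (3, 5)}
print(bracket_f1(pred, gold, 5))  # precision = recall = 2/3
```

Because precision and recall are computed only over the nontrivial spans, a trivial right-branching or flat analysis cannot inflate the score with brackets every tree shares.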


Analytics

Added to PP: 2010-12-22


Author's Profile

Daniel Klein
Harvard University


Citations of this work

Grammar Induction by Unification of Type-Logical Lexicons. Sean A. Fulop - 2010 - Journal of Logic, Language and Information 19 (3): 353-381.
Constructions at Work or at Rest? Rens Bod - 2009 - Cognitive Linguistics 20 (1).
