Natural language grammar induction using a constituent-context model

Abstract

This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG models. In contrast, we employ a simpler probabilistic model over trees based directly on constituent identity and linear context, and use an EM-like iterative procedure to induce structure. This method produces much higher quality analyses, giving the best published results on the ATIS dataset.
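
To make the approach described above concrete, below is a minimal, illustrative sketch of a constituent-context style model trained with an EM-like procedure. It is not the authors' implementation: it assumes the corpus consists of short part-of-speech sequences, enumerates all binary bracketings explicitly instead of using a dynamic program, and applies simple add-alpha smoothing; the function names (`binary_trees`, `features`, `em`) are hypothetical.

```python
# Sketch of a constituent-context style EM learner (simplified, not the
# paper's implementation). Each binary bracketing of a sentence is scored
# by the product, over all spans, of P(yield | bracketed) * P(context | bracketed).
from collections import defaultdict

def binary_trees(lo, hi):
    """Yield every binary bracketing of the span [lo, hi) as a set of spans."""
    if hi - lo == 1:
        yield {(lo, hi)}
        return
    for mid in range(lo + 1, hi):
        for left in binary_trees(lo, mid):
            for right in binary_trees(mid, hi):
                yield left | right | {(lo, hi)}

def features(sent, lo, hi):
    """Constituent identity (yield) and linear context of a span."""
    span_yield = tuple(sent[lo:hi])
    context = (sent[lo - 1] if lo > 0 else "<s>",
               sent[hi] if hi < len(sent) else "</s>")
    return span_yield, context

def em(corpus, iters=10, alpha=1e-2):
    # Multinomials over yields and contexts, conditioned on whether the
    # span is a constituent (True) or a distituent (False).
    p_yield = defaultdict(lambda: 1.0)   # uniform scores on iteration 1
    p_ctx = defaultdict(lambda: 1.0)
    for _ in range(iters):
        c_yield = defaultdict(lambda: alpha)   # expected counts, smoothed
        c_ctx = defaultdict(lambda: alpha)
        for sent in corpus:
            n = len(sent)
            spans = [(i, j) for i in range(n) for j in range(i + 1, n + 1)]
            trees = list(binary_trees(0, n))
            # E-step: score each bracketing by the product of its span scores.
            scores = []
            for tree in trees:
                s = 1.0
                for span in spans:
                    y, c = features(sent, *span)
                    b = span in tree
                    s *= p_yield[(y, b)] * p_ctx[(c, b)]
                scores.append(s)
            z = sum(scores) or 1.0
            # Accumulate expected counts of (feature, bracketed?) pairs.
            for tree, s in zip(trees, scores):
                w = s / z
                for span in spans:
                    y, c = features(sent, *span)
                    b = span in tree
                    c_yield[(y, b)] += w
                    c_ctx[(c, b)] += w
        # M-step: renormalize each multinomial within its bracketing class.
        for counts, probs in ((c_yield, p_yield), (c_ctx, p_ctx)):
            totals = defaultdict(float)
            for (_, b), v in counts.items():
                totals[b] += v
            probs.clear()
            for (f, b), v in counts.items():
                probs[(f, b)] = v / totals[b]
    return p_yield, p_ctx

if __name__ == "__main__":
    # Tiny toy corpus of POS-tag sequences, for illustration only.
    corpus = [["DT", "NN", "VBD", "DT", "NN"], ["PRP", "VBZ", "JJ"]]
    p_yield, p_ctx = em(corpus, iters=5)
```

A full implementation would replace the explicit enumeration with a dynamic program over spans, since the number of binary bracketings grows with the Catalan numbers, but the objective being optimized is the same: multinomials over span yields and linear contexts, re-estimated from expected bracketing counts.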
