Learning Diphone-Based Segmentation

Cognitive Science 35 (1):119-155 (2011)


This paper reconsiders the diphone-based word segmentation model of Cairns, Shillcock, Chater, and Levy (1997) and Hockema (2006), previously thought to be unlearnable. A statistically principled learning model is developed using Bayes’ theorem and reasonable assumptions about infants’ implicit knowledge. The model's ability to recover phrase-medial word boundaries is tested using phonetic corpora derived from spontaneous interactions with children and adults. The learning models (unsupervised and semi-supervised) are shown to exhibit several crucial properties. First, only a small amount of language exposure is required to reach the model’s ceiling performance, equivalent to between 1 day and 1 month of caregiver input. Second, the models are robust to variation in both the free parameter and the input representation. Finally, both the learning and baseline models exhibit undersegmentation, which is argued to have significant ramifications for speech processing as a whole.
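To make the idea of diphone-based segmentation concrete, here is a minimal sketch of how Bayes' theorem can score a word boundary between two adjacent phones. It is a supervised toy that counts boundaries in already-segmented phrases, not the paper's unsupervised or semi-supervised learner. The function names, the one-character-per-phone encoding, and the 0.5 decision threshold are all assumptions made for illustration.

```python
from collections import Counter


def train(phrases):
    """Count diphones in segmented phrases (hypothetical toy trainer).

    phrases: list of phrases; each phrase is a list of words; each word is a
    string with one character per phone (an assumed encoding).
    """
    spanning = Counter()  # diphones that straddle a word boundary
    total = Counter()     # every phrase-internal diphone
    for words in phrases:
        phones = "".join(words)
        cuts, pos = set(), 0
        for w in words[:-1]:
            pos += len(w)
            cuts.add(pos)  # a boundary falls just before phones[pos]
        for i in range(1, len(phones)):
            d = phones[i - 1:i + 1]
            total[d] += 1
            if i in cuts:
                spanning[d] += 1
    return spanning, total


def boundary_posterior(diphone, spanning, total):
    """p(boundary | xy) = p(xy | boundary) * p(boundary) / p(xy)."""
    n = sum(total.values())
    n_b = sum(spanning.values())
    if n == 0 or n_b == 0:
        return 0.0
    prior = n_b / n
    if total[diphone] == 0:
        return prior  # unseen diphone: fall back on the prior (an assumption)
    likelihood = spanning[diphone] / n_b
    evidence = total[diphone] / n
    return likelihood * prior / evidence


def segment(phones, spanning, total, threshold=0.5):
    """Insert a boundary wherever the posterior exceeds the threshold."""
    words, start = [], 0
    for i in range(1, len(phones)):
        if boundary_posterior(phones[i - 1:i + 1], spanning, total) > threshold:
            words.append(phones[start:i])
            start = i
    words.append(phones[start:])
    return words
```

With counts taken from segmented data, the posterior reduces algebraically to the fraction of a diphone's occurrences that span a boundary. Writing it in Bayes form keeps the prior and likelihood as separate terms, which are the quantities a learner without access to boundaries would have to estimate some other way.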




Similar books and articles

Which came first: Infants learning language or motherese? Heather Bortfeld - 2004 - Behavioral and Brain Sciences 27 (4):505-506.
Human Semi-Supervised Learning. Bryan R. Gibson, Timothy T. Rogers & Xiaojin Zhu - 2013 - Topics in Cognitive Science 5 (1):132-172.
Bayesian model learning based on predictive entropy. Jukka Corander & Pekka Marttinen - 2006 - Journal of Logic, Language and Information 15 (1-2):5-20.

