Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora

Abstract

A significant portion of the world’s text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA’s latent topics and user tags. This allows Labeled LDA to directly learn word-tag correspondences. We demonstrate Labeled LDA’s improved expressiveness over traditional LDA with visualizations of a corpus of tagged web pages from del.icio.us. Labeled LDA outperforms SVMs by more than 3 to 1 when extracting tag-specific document snippets. As a multi-label text classifier, our model is competitive with a discriminative baseline on a variety of datasets.
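
The one-to-one correspondence between topics and tags described above can be illustrated with a small sketch. The abstract does not state the inference procedure, so the code below assumes a collapsed Gibbs sampler with symmetric priors; the function name `labeled_lda_gibbs`, its parameters (`alpha`, `beta`, `n_iters`, `seed`), and the toy corpus in the usage note are hypothetical and chosen for illustration only. The key point it shows is that each word may only be assigned to one of the tags attached to its own document.

```python
import random
from collections import defaultdict


def labeled_lda_gibbs(docs, doc_labels, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler in which every topic is a user tag.

    docs       : list of token lists
    doc_labels : list of tag lists, one non-empty list per document
    Returns (assign, label_word): per-token tag assignments and tag-word counts.
    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    label_word = defaultdict(lambda: defaultdict(int))  # count of word w under tag k
    label_total = defaultdict(int)                      # total tokens under tag k
    doc_label = [defaultdict(int) for _ in docs]        # tokens of doc d under tag k
    assign = []

    # Random initialisation, restricted to each document's own tags.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.choice(doc_labels[d])
            z_d.append(z)
            label_word[z][w] += 1
            label_total[z] += 1
            doc_label[d][z] += 1
        assign.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assign[d][i]
                label_word[z][w] -= 1
                label_total[z] -= 1
                doc_label[d][z] -= 1

                # Score only the tags this document actually carries.
                weights = [
                    (doc_label[d][k] + alpha)
                    * (label_word[k][w] + beta)
                    / (label_total[k] + vocab_size * beta)
                    for k in doc_labels[d]
                ]
                z = rng.choices(doc_labels[d], weights=weights)[0]

                # Record the new assignment.
                assign[d][i] = z
                label_word[z][w] += 1
                label_total[z] += 1
                doc_label[d][z] += 1

    return assign, label_word
```

Calling it on a hypothetical three-document corpus such as `docs = [["python", "code", "code"], ["python", "code", "recipe", "bake"], ["bake", "recipe", "oven"]]` with `doc_labels = [["programming"], ["programming", "cooking"], ["cooking"]]` returns one tag per token. In the second document, which carries both tags, each word is credited to whichever tag the sampler selects, which is the word-level credit attribution the abstract describes.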

Similar books and articles

Human Semi-Supervised Learning. Bryan R. Gibson, Timothy T. Rogers & Xiaojin Zhu - 2013 - Topics in Cognitive Science 5 (1):132-172.
Stigma and the Politics of Biomedical Models of Mental Illness. Angela K. Thachuk - 2011 - International Journal of Feminist Approaches to Bioethics 4 (1):140-163.
Tree models and (labeled) categorial grammar. Yde Venema - 1996 - Journal of Logic, Language and Information 5 (3-4):253-277.
So-labeled neo-fregeanism. Mark Crimmins - 1993 - Philosophical Studies 69 (2-3):265-279.
Logic of transition systems. Johan Van Benthem & Jan Bergstra - 1994 - Journal of Logic, Language and Information 3 (4):247-283.
Term-labeled categorial type systems. Richard T. Oehrle - 1994 - Linguistics and Philosophy 17 (6):633-678.

Analytics

Added to PP
2010-12-22
