Stochastic Time‐Series Analyses Highlight the Day‐To‐Day Dynamics of Lexical Frequencies

Cognitive Science 46 (12):e13215 (2022)

Abstract

Standard models in quantitative linguistics assume that word usage follows a fixed frequency distribution, often Zipf's law or a close relative. This view, however, captures neither the near-daily shifts in topics of conversation nor the short-term dynamics of language change. To study the dynamics of human language use, we present a corpus of day-to-day word-frequency variation, built by scraping online news sources every 20 min for more than 2 years. We construct a simple time-varying model in which a latent state is observed through word-frequency counts. We use Bayesian techniques to infer this model's parameters for 20,000 words, which lets us reduce complex word-frequency trajectories to a low-dimensional set of usage parameters. Analyzing these inferred parameters, we quantify the relative mobility and drift of words on a day-to-day basis while accounting for sampling error. The results provide evidence against the “rich-get-richer” models of word use that have previously been proposed to explain statistical patterns in language.
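The abstract does not give the model's equations, so the following is only a rough illustration of why sampling error matters when measuring day-to-day mobility. It assumes a hypothetical model in which a word's latent log-frequency `x_t` performs a Gaussian random walk with drift `mu` and volatility `sigma`, and its daily count is a Poisson draw with mean `tokens_per_day * exp(x_t)`. In place of the authors' Bayesian inference, it swaps in a simple method-of-moments estimator that subtracts the approximate Poisson sampling variance from the variance of daily log-frequency changes. The function names and parameters (`simulate_counts`, `estimate_drift_volatility`, `tokens_per_day`) are hypothetical.

```python
import numpy as np


def simulate_counts(n_days, mu, sigma, x0, tokens_per_day, rng):
    """Simulate one word under a HYPOTHETICAL latent random-walk model.

    Latent log-frequency: x_t = x_{t-1} + mu + sigma * z_t, with z_t ~ N(0, 1).
    Observed daily count: c_t ~ Poisson(tokens_per_day * exp(x_t)).
    """
    steps = mu + sigma * rng.standard_normal(n_days)
    x = x0 + np.cumsum(steps)
    counts = rng.poisson(tokens_per_day * np.exp(x))
    return x, counts


def estimate_drift_volatility(counts, tokens_per_day):
    """Method-of-moments drift and volatility of daily log-frequency.

    Not a Bayesian fit: y_t = log(c_t / N_t) equals x_t plus sampling noise
    whose variance is roughly 1 / c_t for Poisson counts (delta method), so
    the variance of y_t - y_{t-1} is about sigma**2 + 1/c_t + 1/c_{t-1}.
    """
    counts = np.asarray(counts, dtype=float)
    if np.any(counts <= 0):
        raise ValueError("zero counts: log-frequency is undefined")
    y = np.log(counts / tokens_per_day)  # observed log-frequency
    dy = np.diff(y)                      # day-to-day changes
    drift = dy.mean()
    # Average sampling variance contributed to each daily difference.
    sampling_var = np.mean(1.0 / counts[1:] + 1.0 / counts[:-1])
    # Subtract it; clip at zero because the corrected estimate can go negative.
    sigma_sq = max(dy.var(ddof=1) - sampling_var, 0.0)
    return drift, np.sqrt(sigma_sq)


rng = np.random.default_rng(0)
x, counts = simulate_counts(
    n_days=700, mu=0.0, sigma=0.05,
    x0=np.log(5000 / 1e6), tokens_per_day=1e6, rng=rng,
)
drift, sigma_hat = estimate_drift_volatility(counts, tokens_per_day=1e6)
print(f"drift={drift:.4f}, sigma_hat={sigma_hat:.4f}")
```

The correction matters most for rare words: for a word seen about 10 times a day, Poisson noise alone adds roughly 1/10 + 1/10 = 0.2 to the variance of daily log changes, far more than a latent volatility of 0.05 (variance 0.0025) would contribute, so ignoring sampling error would make rare words look highly mobile.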

Added to PhilPapers: 2022-12-21