Abstract
The vocabulary used to describe odors in natural English is not well understood, because prior studies of odor descriptions have often relied on preselected descriptors and odor ratings. Here, we present a data-driven approach that automatically identifies English odor descriptors based on their degree of olfactory association and derives their semantic organization from their distributions in natural texts, using a distributional-semantic language model. We identify 243 descriptors that are much more strongly associated with olfaction than English words in general. We then derive the semantic organization of these olfactory descriptors and find that it is captured by four clusters, which we name Offensive, Malodorous, Fragrant, and Edible. The semantic space derived from our model primarily differentiates descriptors in terms of pleasantness and edibility, the dimensions along which our four clusters are positioned, and it is similar to a space derived from perceptual data. The semantic organization of odor vocabulary can thus be mapped using natural language data (e.g., online text), without the limitations of odor-perceptual data and preselected descriptors. Our method may therefore facilitate research on olfaction, a sensory system known to often elude verbal description.