Why do humans make music? Theories of the evolution of musicality have focused mainly on the value of music for specific adaptive contexts such as mate selection, parental care, coalition signaling, and group cohesion. Synthesizing and extending previous proposals, we argue that social bonding is an overarching function that unifies all of these theories, and that musicality enabled social bonding at larger scales than grooming and other bonding mechanisms available in ancestral primate societies. We combine cross-disciplinary evidence from archeology, anthropology, biology, musicology, psychology, and neuroscience into a unified framework that accounts for the biological and cultural evolution of music. We argue that the evolution of musicality involves gene–culture coevolution, through which proto-musical behaviors that initially arose and spread as cultural inventions had feedback effects on biological evolution because of their impact on social bonding. We emphasize the deep links between production, perception, prediction, and social reward arising from repetition, synchronization, and harmonization of rhythms and pitches, and summarize empirical evidence for these links at the levels of brain networks, physiological mechanisms, and behaviors across cultures and across species. Finally, we address potential criticisms and make testable predictions for future research, including neurobiological bases of musicality and relationships between human music, language, animal song, and other domains. The music and social bonding hypothesis provides the most comprehensive theory to date of the biological and cultural evolution of music.
For many years the evolution of language has been seen as a disreputable topic, mired in fanciful “just so stories” about language origins. However, in the last decade a new synthesis of modern linguistics, cognitive neuroscience and neo-Darwinian evolutionary theory has begun to make important contributions to our understanding of the biology and evolution of language. I review some of this recent progress, focusing on the value of the comparative method, which uses data from animal species to draw inferences about language evolution. Discussing speech first, I show how data concerning a wide variety of species, from monkeys to birds, can increase our understanding of the anatomical and neural mechanisms underlying human spoken language, and how bird and whale song provide insights into the ultimate evolutionary function of language. I discuss the “descended larynx” of humans, a peculiar adaptation for speech that has received much attention in the past, which despite earlier claims is not uniquely human. I then turn to the neural mechanisms underlying spoken language, pointing out the difficulties animals apparently experience in perceiving hierarchical structure in sounds, and stressing the importance of vocal imitation in the evolution of a spoken language. Turning to ultimate function, I suggest that communication among kin (especially between parents and offspring) played a crucial but neglected role in driving language evolution. Finally, I briefly discuss phylogeny, considering hypotheses that offer plausible routes to human language from a non-linguistic chimp-like ancestor. I conclude that comparative data from living animals will be key to developing a richer, more interdisciplinary understanding of our most distinctively human trait: language.
In this commentary on Berwick and Chomsky's “Why Only Us,” I discuss three key points. I first offer a brief critique of their scholarship, notably their often unjustified dismissal of previous thinking about language evolution. But my main focus concerns two arguments central to the book's thesis: the irrelevance of externalization to language evolution and the discontinuity between human conceptual representations and those of other animals. I argue against both stances, using cognitive data from nonhuman species to show that externalization is not irrelevant to understanding the biology of language, and that many human conceptual structures have clear animal homologs.
Historical language change (“glossogeny”), like evolution itself, is a fact; and its implications for the biological evolution of the human capacity for language acquisition (“phylogeny”) have been ably explored by many contemporary theorists. However, Christiansen & Chater's (C&C's) revolutionary call for a replacement of phylogenetic models with glossogenetic cultural models is based on an inadequate understanding of either. The solution to their “logical problem of language evolution” lies before their eyes, but they mistakenly reject it due to a supposed “circularity trap.” Gene/culture co-evolution poses a series of difficult theoretical and empirical problems that will be resolved by subtle thinking, adequate models, and careful cross-disciplinary research, not by oversimplified manifestos.
Accepting Bullot & Reber's (B&R's) criteria for art appreciation would confine the study of aesthetics to those works for which historical information is available, mainly post-[...] high art. On their account, “correct” artistic understanding is limited to experts with detailed knowledge or education in art, which implies a narrowly elitist conception of aesthetics. Scientific aesthetics must be broadly inclusive.
We compare and contrast the 60 commentaries by 109 authors on the pair of target articles by Mehr et al. and ourselves. The commentators largely reject Mehr et al.'s fundamental definition of music and their attempts to refute our social bonding hypothesis, byproduct hypotheses, and sexual selection hypotheses for the evolution of musicality. Instead, the commentators generally support our more inclusive proposal that social bonding and credible signaling mechanisms complement one another in explaining cooperation within and competition between groups in a coevolutionary framework. We discuss the proposed criticisms and extensions, with a focus on moving beyond adaptation/byproduct dichotomies and toward testing of cross-species, cross-cultural, and other empirical predictions.
Hierarchical structures are rapidly and flexibly built up in the domains of human language and music. These domains require a tree-building capacity – “dendrophilia” – to dynamically infer hierarchical structures from sensory input, based on subunits stored in a lexicon. This dynamic process involves a crucial class of abstracta overlooked in the target article.
I concur with Merker and colleagues' critiques, suggesting that hypotheses about the evolutionary function of consciousness can help address them. Brains are parallel systems that function to compute possible actions and predict outcomes. I hypothesize that a core function of consciousness per se is the global feedback of information about those actions actually executed, supporting local learning via neuronal updating.
Explaining the transition from a signed to a spoken protolanguage is a major problem for all gestural theories. I suggest that Arbib's improved “beyond the mirror” hypothesis still leaves this core problem unsolved, and that Darwin's model of musical protolanguage provides a more compelling solution. Second, although I support Arbib's analytic theory of language origin, his claim that this transition is purely cultural seems unlikely, given its early, robust development in children.
Sussman and colleagues provide no evidence supporting their claim that the human vocal production system is specialized to produce locus equations with high correlations and linearity. We propose the alternative null hypothesis that these features result from physical and physiological factors common to all mammalian vocal tracts and we recommend caution in assuming that human speech production mechanisms are unique.
Millikan's account of substance concepts fails to do away with features. Her approach simply moves the suite of relevant features into an encapsulated module. The crux of the problem for scientists studying human infants and nonhuman animals is to determine how individuals reidentify objects and events in the world.
A prerequisite for spoken language learning is segmenting continuous speech into words. Amongst many possible cues to identify word boundaries, listeners can use both transitional probabilities between syllables and various prosodic cues. However, the relative importance of these cues remains unclear, and previous experiments have not directly compared the effects of contrasting multiple prosodic cues. We used artificial language learning experiments, where native German-speaking participants extracted meaningless trisyllabic “words” from a continuous speech stream, to evaluate these factors. We compared a baseline condition to five test conditions, in which word-final syllables were either followed by a pause, lengthened, shortened, changed to a lower pitch, or changed to a higher pitch. To evaluate robustness and generality we used three tasks varying in difficulty. Overall, pauses and final lengthening were perceived as converging with the statistical cues and facilitated speech segmentation, with pauses helping most. Final-syllable shortening hindered baseline speech segmentation, indicating that when cues conflict, prosodic cues can override statistical cues. Surprisingly, pitch cues had little effect, suggesting that duration may be more relevant for speech segmentation than pitch in our study context. We discuss our findings with regard to the contribution to speech segmentation of language-universal boundary cues vs. language-specific stress patterns.
Vocal music and spoken language both have important roles in human communication, but it is unclear why these two different modes of vocal communication exist. Although similar, speech and song differ in certain design features. One interesting difference is in the pitch intonation contour, which consists of discrete tones in song, vs. gliding intonation contours in speech. Here, we investigated whether vocal phrases consisting of discrete pitches or gliding pitches are remembered better, conducting three studies implementing auditory same-different tasks at three levels of difficulty. We tested two hypotheses: that discrete pitch contours aid auditory memory, independent of musical experience, or that the higher everyday experience perceiving and producing speech makes speech intonation easier to remember. We used closely matched stimuli, controlling for rhythm and timbre, and we included a stimulus intermediate between song-like and speech-like pitch contours. We also assessed participants' musicality to evaluate experience-dependent effects. We found that song-like vocal phrases are remembered better than speech-like vocal phrases, and that intermediate vocal phrases evoked a similar advantage to song-like vocal phrases. Participants with more musical experience were better at remembering all three types of vocal phrases. The precise roles of absolute and relative pitch perception and the influence of top-down vs. bottom-up processing should be clarified in future studies. However, our results suggest that one potential reason for the emergence of discrete pitch, a feature that characterises music across cultures, might be that it enhances auditory memory.