The mesostriatal dopamine system is prominently implicated in model-free reinforcement learning, with fMRI BOLD signals in ventral striatum notably covarying with model-free prediction errors. However, latent learning and devaluation studies show that behavior also exhibits hallmarks of model-based planning, and the interaction between model-based and model-free values, prediction errors, and preferences is underexplored. We designed a multistep decision task in which model-based and model-free influences on human choice behavior could be distinguished. By showing that choices reflected both influences, we could then test the purity of the ventral striatal BOLD signal as a model-free report. Contrary to expectations, the signal reflected both model-free and model-based predictions in proportions matching those that best explained choice behavior. These results challenge the notion of a separate model-free learner and suggest a more integrated computational architecture for high-level human decision-making.
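To make the model-based/model-free distinction concrete, here is a minimal, hypothetical sketch of a hybrid learner in a two-step task: model-based values are computed by planning over a transition model, model-free values are cached by temporal-difference updates, and a single weight mixes the two before a softmax choice. This is not the paper's fitted model; the parameter names (w, alpha, beta) and the toy numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of a hybrid model-based / model-free learner for a
# two-step task; parameter names (w, alpha, beta) are illustrative, not the
# paper's fitted quantities.

def model_based_values(transitions, q_stage2):
    # Plan first-stage values over a known transition model:
    # Q_MB(a) = sum_s P(s | a) * max_a' Q_stage2(s, a')
    return transitions @ q_stage2.max(axis=1)

def td_update(q_mf, action, reward, alpha=0.2):
    # Model-free (temporal-difference) update of a cached first-stage value.
    q_mf[action] += alpha * (reward - q_mf[action])
    return q_mf

def choice_probabilities(q_mf, q_mb, w=0.5, beta=5.0):
    # Softmax over the w-weighted mixture of model-based and model-free values.
    q = w * q_mb + (1.0 - w) * q_mf
    expq = np.exp(beta * (q - q.max()))
    return expq / expq.sum()

# Toy setup: two first-stage actions leading probabilistically to two
# second-stage states, each with two actions of its own.
transitions = np.array([[0.7, 0.3],
                        [0.3, 0.7]])
q_stage2 = np.array([[0.6, 0.2],
                     [0.1, 0.8]])
q_mf = np.zeros(2)
q_mb = model_based_values(transitions, q_stage2)
print(choice_probabilities(q_mf, q_mb, w=0.5))
```

Fitting w to choices, and then asking whether neural prediction-error signals reflect the same mixture, is the kind of comparison the abstract describes.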
We routinely observe others’ choices and use them to guide our own. Whose choices influence us more, and why? Prior work has focused on the effect of perceived similarity between two individuals, such as the degree of overlap in past choices or explicitly recognizable group affiliations. In the real world, however, any dyadic relationship is part of a more complex social structure involving multiple social groups that are not directly observable. Here we suggest that human learners go beyond dyadic similarities in choice behaviors or explicit group memberships; they infer the structure of social influence by grouping individuals based on their choices, and they use these inferred groups to decide whose choices to follow. We propose a computational model that formalizes this idea, and we test the model's predictions in a series of behavioral experiments. In Experiment 1, we reproduce a well-established finding that people's choices are more likely to be influenced by someone whose past choices resemble their own, as predicted by our model as well as by dyadic similarity models. In Experiments 2–5, we test a set of unique predictions of our model by looking at cases where the degree of choice overlap between individuals is equated, but their choices indicate a latent group structure. We then apply our model to prior empirical results on infants’ understanding of others’ preferences, presenting an alternative account of developmental changes. Finally, we discuss how our model relates to classical findings in the social influence literature and the theoretical implications of our model. Taken together, our findings demonstrate that structure learning is a powerful framework for explaining the influence of social information on decision making in a variety of contexts.
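The core computational idea, inferring a latent grouping of individuals from their choices and then asking how likely another person is to share the learner's group, can be sketched as follows. This is a simplified assumption-laden illustration (uniform prior over partitions, Beta-Bernoulli choice likelihood), not the authors' exact model; the function names and toy data are hypothetical.

```python
import numpy as np
from math import lgamma

# Illustrative "structure learning" sketch: score every partition of the agents
# by how well it explains their binary choices, then read off the posterior
# probability that a target agent shares a group with the learner.

def partitions(items):
    """Enumerate all set partitions of a list of agents."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        yield [[first]] + smaller                      # first agent alone
        for i, block in enumerate(smaller):            # or joined to a block
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]

def block_log_marginal(choices, block):
    """Beta(1,1)-Bernoulli marginal likelihood of one group's choices, assuming
    agents in a group share an option-preference parameter per item."""
    logp = 0.0
    for item in range(choices.shape[1]):
        k = sum(choices[a, item] for a in block)       # agents choosing option 1
        n = len(block)
        logp += lgamma(1 + k) + lgamma(1 + n - k) - lgamma(2 + n)
    return logp

def same_group_probability(choices, learner, target):
    """Posterior probability that `learner` and `target` fall in one group."""
    agents = list(range(choices.shape[0]))
    log_posts, together = [], []
    for part in partitions(agents):
        log_posts.append(sum(block_log_marginal(choices, b) for b in part))
        together.append(any(learner in b and target in b for b in part))
    post = np.exp(np.array(log_posts) - max(log_posts))
    post /= post.sum()
    return float(post[np.array(together)].sum())

# Toy data: 4 agents x 5 items; agents 0-1 and agents 2-3 choose alike.
choices = np.array([[1, 1, 0, 1, 0],
                    [1, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1],
                    [0, 0, 1, 0, 1]])
print(same_group_probability(choices, learner=0, target=1))  # high
print(same_group_probability(choices, learner=0, target=2))  # low
```

A learner could then weight each informant's recommendation by this same-group probability rather than by raw dyadic choice overlap, which is what distinguishes the group-structure account from dyadic similarity models.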
In reinforcement learning, a decision maker searching for the most rewarding option is often faced with the question: What is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: How can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and we describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of reinforcement learning in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional reinforcement learning algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty.
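One way to see how hierarchical Bayesian inference can assign a value to a never-tried option is the conjugate Gaussian sketch below: the mean payoffs of experienced options inform a posterior over a group-level mean, and the novel option's predicted value is centred on that group-level mean. The Gaussian conjugacy, variance parameters, and function names here are illustrative assumptions, not the paper's specific model.

```python
import numpy as np

# Minimal hierarchical-generalization sketch (assumed Gaussian conjugate setting):
# option values share a group-level mean, so a novel option inherits a prediction
# from what was learned about other options in the same environment.

def group_posterior(observed_means, prior_mean=0.0, prior_var=1.0, obs_var=1.0):
    """Posterior over the group-level mean, treating each experienced option's
    sample mean as one Gaussian observation with variance obs_var."""
    n = len(observed_means)
    precision = 1.0 / prior_var + n / obs_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + np.sum(observed_means) / obs_var)
    return post_mean, post_var

def novel_option_prediction(observed_means, between_var=1.0, **kwargs):
    """Predictive mean and variance for an option that has never been tried:
    its value is drawn around the inferred group-level mean."""
    post_mean, post_var = group_posterior(observed_means, **kwargs)
    return post_mean, post_var + between_var

# Toy usage: three experienced options with high payoffs make a novel option
# in the same environment look promising before it is ever sampled.
experienced = np.array([0.8, 0.7, 0.9])
print(novel_option_prediction(experienced))
```

Under this kind of scheme, a novel option's predicted value is optimistic whenever the environment's other options have paid off well, which is one way to connect structured generalization to dopaminergic novelty responses.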