Abstract
Multiarm bandit problems have been used to model the selection of competing scientific theories by boundedly rational agents. In this paper, I define a variable-arm bandit problem, which allows the set of scientific theories to vary over time. I show that Roth-Erev reinforcement learning, which solves multiarm bandit problems in the limit, cannot solve this problem in a reasonable time. However, social learning via preferential attachment, combined with individual reinforcement learning that discounts the past, can.
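As a rough illustration of the baseline learning rule discussed above (not the paper's actual model), basic Roth-Erev reinforcement learning on a fixed two-arm Bernoulli bandit can be sketched as follows. The payoff probabilities, initial propensity, and round count here are hypothetical choices for the sketch:

```python
import random

def roth_erev(payoff_probs, rounds=10000, initial_propensity=1.0, seed=0):
    """Basic Roth-Erev reinforcement learning on a Bernoulli bandit.

    Each arm i carries a propensity q[i]. An arm is chosen with
    probability proportional to its propensity, and the realized payoff
    is added to the chosen arm's propensity, reinforcing it.
    """
    rng = random.Random(seed)
    q = [initial_propensity] * len(payoff_probs)
    for _ in range(rounds):
        # Choose an arm with probability proportional to its propensity.
        arm = rng.choices(range(len(q)), weights=q)[0]
        # Bernoulli payoff: 1 with the arm's success probability, else 0.
        payoff = 1.0 if rng.random() < payoff_probs[arm] else 0.0
        q[arm] += payoff  # Reinforce the chosen arm by its payoff.
    return q

# With one clearly better arm, reinforcement accumulates on it over time.
propensities = roth_erev([0.2, 0.8])
```

A discounted variant, of the kind the abstract alludes to, would additionally shrink all propensities each round (e.g. multiply `q` by a factor below 1), letting the learner track a changing set of arms at the cost of the limit-convergence guarantee.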