Abstract
Auditory scene analysis is critical for complex auditory processing. We study auditory segregation from a neural network perspective and develop a framework for primitive auditory scene analysis. The architecture is a laterally coupled two-dimensional network of relaxation oscillators with a global inhibitor. One dimension represents time and the other represents frequency. We show that this architecture, together with systematic delay lines, can group auditory features into a stream in real time by phase synchrony and segregate different streams by desynchronization. The network reproduces a set of psychological phenomena in primitive auditory scene analysis, including dependency on frequency proximity and on the rate of presentation, sequential capturing, and competition among different perceptual organizations. We offer a neurocomputational theory, the shifting synchronization theory, to explain how auditory segregation might be achieved in the brain and to account for the psychological phenomenon of stream segregation. Possible extensions of the model are discussed.
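As an illustration of the building block named above, the following is a minimal sketch of a single relaxation oscillator. The abstract does not give equations, so the specific form used here (a cubic fast variable and a sigmoidal slow variable, in the style of Terman-Wang oscillators), the function name `simulate`, and all parameter values are assumptions chosen for illustration. Lateral coupling, the global inhibitor, and the delay lines are deliberately left out. The sketch only shows the property the architecture relies on: a unit that receives stimulation oscillates, while an unstimulated unit stays silent.

```python
import math


def simulate(stimulus, steps=30000, dt=0.01, eps=0.02, gamma=6.0, beta=0.1):
    """Euler-integrate one relaxation oscillator and return the trace of x.

    Assumed (hypothetical) dynamics, not taken from the abstract:
        dx/dt = 3x - x^3 + 2 - y + stimulus         (fast, excitatory variable)
        dy/dt = eps * (gamma * (1 + tanh(x / beta)) - y)   (slow, recovery variable)
    """
    x, y = -1.5, 1.0  # arbitrary initial state near the silent branch
    trace = []
    for _ in range(steps):
        dx = 3.0 * x - x ** 3 + 2.0 - y + stimulus
        dy = eps * (gamma * (1.0 + math.tanh(x / beta)) - y)
        x += dt * dx
        y += dt * dy
        trace.append(x)
    return trace


# A positive stimulus yields a limit cycle that alternates between a silent
# phase (x near -2) and an active phase (x near +2).
stimulated = simulate(stimulus=0.8)

# A negative stimulus leaves the unit at a stable fixed point on the silent branch.
unstimulated = simulate(stimulus=-0.5)

print("stimulated range:  ", min(stimulated), max(stimulated))
print("unstimulated range:", min(unstimulated), max(unstimulated))
```

In a full network of the kind described, many such units would be arranged on the time-frequency grid, with excitatory lateral connections pulling neighboring active units into phase and the global inhibitor pushing separate groups out of phase. Those mechanisms are not modeled in this sketch.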